This article originally appeared on archiware.com. Reproduced here with permission.
An Archive is many things to many people. From reference to re-purposing, from monetization to service for clients, there are many perspectives on what an Archive can be and do. Since the perspectives vary so much, a closer look can reveal the potential hidden in this ancient term.
Archivum was the name the Romans gave to buildings that stored scrolls which were no longer needed for daily affairs. Although we no longer use scrolls, the mechanism hasn't changed much. The files we put in an Archive today are those that are no longer needed for daily production. This means that archiving is actually a data migration: files are moved to the Archive and deleted from the source. This is the main distinction from a Backup, which duplicates what is still being worked on. There can be exceptions to this rule, but more on that later.
Start by planning for the long-term
Since the Archive is a long-term project, you need to carefully consider who is involved now and in the future. A set of questions helps to discover all relevant actors and stakeholders.
All parties and perspectives need to be gathered and their input documented. The more perspectives you gather from the people involved, the better the support for the Archive project will be, and the more useful the Archive itself becomes. Think ahead and speculate about future tasks, changes in the workflow and new people entering the company. What needs to be put in the Archive now in terms of metadata, and what needs to be documented so that it will be useful in the future?
The most efficient way to discover important factors for a system that is not yet in place is to write use cases. Use cases are stories where you describe in detail who is doing what with a system to fulfill what task.
Find all those involved in running and using the Archive later. Who are the stakeholders? What are the preconditions?
A use case could look like this:
A person (describe role or position) is assigned a task where he needs (describe assets, files). He has the following information available (describe). He turns to the Archive to search for files that fit the assigned task. He searches for (describe). He browses the entries in the Archive catalog and decides which ones to restore (describe how). He/someone else triggers the restore process. Some time later the files are restored to (where) and used by (whom).
Try to come up with several of these scenarios and complete them with details. It makes sense to sit together with your colleagues to think about future scenarios and details. This will give you plenty of insight into what you need to install in terms of metadata and workflow to create an Archive that serves the company best.
Helpful hints how to write a use case can be found here:
Looking at the history of storage media, there is an inverse relation between the density of a storage medium and its lifespan. While stone (engraving) retains information for a very long time, its information density is very low. Hard disk storage, at the other end of the spectrum, has an incredible storage density but a very short lifespan, and it depends on other technology being available to read it. LTO tape has a lifespan of several decades, but measured on a historical scale this is still short. Therefore, migration is an integral part of any digital Archive. LTO makes this easier since drives can read media from previous generations: up to LTO-7, each generation reads tapes from two generations back (e.g. LTO-7 can read LTO-5 tapes), while LTO-8 and LTO-9 drives read only one generation back. There is a re-purchase guarantee of 10 years that creates a great deal of flexibility for migrations. Right now and for the foreseeable future, LTO tape seems to be the most proven Archive medium. No other storage technology today can rival its combination of density, durability, read/write performance and price per TB. Also, the sheer volume of the market and its global use in big industries like finance, insurance, broadcast and science make it very likely to continue in its role. It remains to be seen if cold disk storage or other technologies develop enough to attract relevant market share.
A recent report shows how critical the situation is, especially for analog media like video, film and audio that needs to be digitized and archived to survive and be accessible.
As of now, the main decision for Archive storage remains between disk and tape storage. Depending on the amount of data, the usage pattern, security requirements and available budget, both technologies have their respective benefits.
Metadata – the key to unlock the Archive
Metadata plays a crucial part since it is the key to the Archive. Years after files have been archived, nobody might remember any specifics such as file names.
So the only option is to search for the right keyword, description or parameter – or in other words: metadata. There are two kinds of metadata: descriptive and technical.
Descriptive metadata needs human input like who is visible in a scene, what product was filmed, what location was used etc.
Technical metadata can be the type of camera used, lens, resolution, codec etc. Automatic generation of metadata (in cameras or during ingest) is advancing fast, one example being the shot detection in FCP X. These types of automatically created metadata can be very helpful. Additional descriptive metadata is – in most cases – a must.
In some cases, a third kind of metadata might be needed, such as administrative metadata describing rights for use and distribution, for example.
A metadata schema is the set of technical and descriptive metadata that is used for an Archive and enables fast retrieval of files. It needs careful consideration and planning for the future. The combination of terms might be unique because it has to serve the requirements of your workflow or organization. The use cases that you put together should point towards the necessary items that need to be included in your specific schema.
One important aspect of metadata is consistency. Consistent tagging of archived files adds tremendous value to the Archive. Ideally, anyone involved in searching assets later should be able to easily find and restore them. P5 Archive offers extensible metadata fields and dropdown menus to put such a metadata schema to work.
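Consistency can be checked mechanically before a file is archived. The following is a minimal sketch in Python, not a P5 Archive feature or API: the field names, the allowed values and the function `validate_metadata` are all hypothetical and only illustrate how a schema with required fields and dropdown-style vocabularies might be enforced.

```python
from typing import Dict, List

# Hypothetical schema: in practice, derive required fields and allowed
# values from your own use cases.
REQUIRED_FIELDS = ("project", "location", "codec")
ALLOWED_VALUES = {
    "location": {"studio", "outdoor", "client site"},
    "codec": {"ProRes 422", "H.264", "DNxHD"},
}


def validate_metadata(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record is consistent."""
    problems = []
    # Every required field must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if not record.get(name, "").strip():
            problems.append(f"missing required field: {name}")
    # Fields with a controlled vocabulary must use one of its exact values.
    for name, allowed in ALLOWED_VALUES.items():
        value = record.get(name)
        if value and value not in allowed:
            problems.append(f"{name}: '{value}' is not an allowed value")
    return problems
```

A record tagged with "Studio" instead of "studio" would be reported here. That is the point of a controlled vocabulary: small spelling variations are exactly what makes files hard to find years later.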
Once you know the who, what and how, you can start looking at the details of your Archive implementation.
There are three archive modes. Most people use manual archiving. A person (usually the admin) decides when to archive what and triggers the job.
Alternatively, there is automatic or watch folder archiving. A watch folder is combined with a schedule and data is picked up accordingly. Depending on the workflow and load, that schedule can be anything from several times a day to once a month.
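The pick-up step of a watch folder can be sketched as follows. This is an illustration in Python under stated assumptions, not how P5 Archive implements it: the function name `collect_settled_files` is made up, and so is the rule that a file must be unmodified for a minimum time before it is picked up (a common precaution against archiving files that are still being copied).

```python
import time
from pathlib import Path
from typing import List, Optional


def collect_settled_files(watch_folder: Path, min_age_seconds: float,
                          now: Optional[float] = None) -> List[Path]:
    """Return files below watch_folder not modified for min_age_seconds."""
    now = time.time() if now is None else now
    settled = []
    # Walk the folder recursively; sorted() keeps the result deterministic.
    for path in sorted(watch_folder.rglob("*")):
        if path.is_file() and now - path.stat().st_mtime >= min_age_seconds:
            settled.append(path)
    return settled
```

A scheduler (cron, for example) would call this function at the chosen interval and hand the resulting list to the archive job.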
Both modes can be combined with a filter. The filter is effectively a set of rules. Criteria are specified to filter files either for inclusion into the archive job or for exclusion (like temp, word, render files, mp3s etc.).
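As an illustration of such a rule set, here is a short Python sketch using shell-style wildcard patterns. The function `select_for_archive` and the patterns are assumptions made for this example; they do not reflect the filter syntax of P5 Archive.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_for_archive(paths: Iterable[str], include: List[str],
                       exclude: List[str]) -> List[str]:
    """Keep paths that match any include pattern and no exclude pattern."""
    selected = []
    for path in paths:
        name = path.lower()  # compare case-insensitively on every platform
        if not any(fnmatchcase(name, pattern) for pattern in include):
            continue
        if any(fnmatchcase(name, pattern) for pattern in exclude):
            continue
        selected.append(path)
    return selected


candidates = ["shots/a001.mov", "shots/a001.tmp", "music/track.mp3",
              "docs/notes.docx", "Shots/B002.MOV"]
# Include everything, then exclude temp files, mp3s and Word documents.
print(select_for_archive(candidates, ["*"],
                         ["*.tmp", "*.mp3", "*.doc", "*.docx"]))
```

Exclusion takes precedence over inclusion in this sketch, so a file that matches both lists is left out of the archive job.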
P5 Archive offers many options to configure an Archive that meets specific needs. There are options for access, display, storage and many more.
A number of features help to configure archive storage. Disk and tape storage can be used independently of each other. The storage pool is the organizational entity for the actual storage volumes, each of which is a member of a pool. The illustrations below show where to find which feature.
Considerations for the configuration
Multiple mechanisms allow separation of access for groups of users:
- Separate login areas so that each group of users can only see their respective files.
- Separate archive indexes so that each group of users can only search their own catalogue or index (and never see the catalogue content of the other groups). This comes with the trade-off of having to search multiple indexes if the origin of a file is not known.
- Separate media pools: one media pool for each of the categories (e.g. departments) to keep files physically separated.
An archive plan is required for each different setup, since it defines the index and pool being used.
For each new archive database, at least one corresponding login area has to be configured (Access to Indexes -> New…)
The storage manager helps to perform storage tasks and build efficient views of the different libraries.
The Archive Plan
The archive plan is the core instrument to configure your Archive(s). It connects to the storage pool, archive index and filters (if used). This is also where the user groups who have access are specified. There are options for volume use, scanning, verify and deleting after archiving. The important features for preview generation and metadata import are also part of the archive plan.
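To make the relationships concrete, the following Python sketch models an archive plan as a small data structure. It is a conceptual illustration only: the class `ArchivePlan`, its field names and the helper `indexes_visible_to` are hypothetical and are not the configuration format of P5 Archive.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class ArchivePlan:
    """Hypothetical model: one plan ties together pool, index and access."""
    name: str
    storage_pool: str
    archive_index: str
    user_groups: FrozenSet[str]
    exclude_patterns: Tuple[str, ...] = ()
    verify_after_write: bool = True
    delete_source_after_archive: bool = False


def indexes_visible_to(group: str, plans: Iterable[ArchivePlan]) -> Set[str]:
    """Return the archive indexes a user group can search."""
    return {plan.archive_index for plan in plans if group in plan.user_groups}


plans = [
    ArchivePlan("Marketing", "pool-marketing", "index-marketing",
                frozenset({"marketing"})),
    ArchivePlan("Finance", "pool-finance", "index-finance",
                frozenset({"finance", "management"}),
                delete_source_after_archive=True),
]
print(indexes_visible_to("marketing", plans))
```

With one plan per department, each group sees only its own index and its files end up in a separate pool, which mirrors the separation mechanisms listed above.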
The Archive in Color
P5 Archive offers the benefit of customizable thumbnails and proxies. Depending on your point of view and the use of the Archive, this can appear unnecessary or absolutely vital. Archives of financial, business or scientific data might not need any form of preview. In media production, video and broadcast, on the other hand, previews can be the most important key to the Archive because they are the way material is identified.
Either way, the decision needs to be taken at the very beginning of the implementation and influences the size of the archive index considerably.
Disk and Tape Scalability and Price
LTO tape has the advantage of scaling very easily, since new empty tapes can simply be added to a library. There are also expandable libraries on offer, making it easy to grow. Higher speed and simultaneous operations can be achieved by adding drives. P5 supports drive parallelization, so that throughput grows almost in proportion to the number of drives. Expanding disk RAID storage is more expensive, as new RAID systems need to be put in place. When certain limits are reached, the infrastructure needs to be expanded, and that can add tremendous effort and cost.
Here’s a simplified cost/capacity diagram:
The investment in a tape library and archiving software saves more money in the long run than it costs. Archiving (=migrating) from expensive production storage to inexpensive tape actually pays off – as is clearly visible. The more the storage capacity is expanded, the more money can be saved by using tape.
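The reasoning can be reproduced with a simple linear cost model. The Python sketch below is only an illustration: the function names are made up, and all prices in the usage example are placeholder figures chosen to show the arithmetic, not real market prices. The model also ignores the step costs that occur when disk infrastructure has to be expanded.

```python
def total_cost(capacity_tb: float, fixed_cost: float, cost_per_tb: float) -> float:
    """Fixed investment plus a per-terabyte media cost."""
    return fixed_cost + capacity_tb * cost_per_tb


def break_even_tb(tape_fixed: float, tape_per_tb: float,
                  disk_fixed: float, disk_per_tb: float) -> float:
    """Capacity above which the tape setup is cheaper than disk."""
    if disk_per_tb <= tape_per_tb:
        raise ValueError("tape never becomes cheaper with these inputs")
    return max(0.0, (tape_fixed - disk_fixed) / (disk_per_tb - tape_per_tb))


# Placeholder figures: tape needs a library and software up front (5000)
# but costs 10 per TB; disk has no fixed cost here but costs 100 per TB.
print(break_even_tb(5000, 10, 0, 100))   # capacity where both cost the same
print(total_cost(200, 0, 100))           # disk cost at 200 TB
print(total_cost(200, 5000, 10))         # tape cost at 200 TB
```

With these placeholder inputs the two lines cross at roughly 56 TB. Beyond that point every additional terabyte widens the gap in favor of tape.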
As an added bonus, tape can easily be stored off-site for maximum security. P5 offers tape cloning when at least two identical drives are available.
This specific type of Archive is relevant especially in media production (and in science, when capturing large amounts of data in a very short time span). When capturing in 4K or higher resolution, the amount of data captured might outgrow the size of the available (high-performance) disk storage. Constantly expanding this disk storage doesn't make sense, either economically or from a workflow perspective. The original files are not needed for editing, since this mostly happens with lower-resolution proxy files. Only at the very end, at the conform stage, are the original files needed. Therefore, archiving at the very beginning of the workflow saves investment in storage and makes excellent use of LTO tape.
There are good reasons to build integrations with P5 Archive. Other software and hardware vendors like to avoid the effort of building archive software themselves: the mechanisms and features are too complex, and so is the landscape of storage products that need to be supported. This results in an ever-growing number of third parties offering integration with P5 Archive. In the case of MAM systems, Archive and Restore are triggered from within the MAM interface, while P5 Archive works in the background as the storage backend.
There are scenarios where data has to be archived and held for a specified time only. This might be due to compliance regulations, the duration of a project or the requirements of a client. For all those cases, P5 Archive offers a feature to specify a retention time on the volumes in a storage pool. After this time limit is reached, volumes will be re-used and overwritten if need be. Alternatively, all volumes can be labeled (=formatted) manually to erase all files on them and re-use them.
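The retention logic itself is simple date arithmetic. The following Python sketch shows the idea; the function names are hypothetical, and it is an assumption of this example that the retention period counts in days from the date a volume was last written. How P5 Archive tracks this internally is not covered here.

```python
from datetime import date, timedelta


def reusable_on(last_written: date, retention_days: int) -> date:
    """First day on which a volume may be overwritten."""
    return last_written + timedelta(days=retention_days)


def is_reusable(last_written: date, retention_days: int, today: date) -> bool:
    """True once the retention period of the volume has passed."""
    return today >= reusable_on(last_written, retention_days)


# A volume last written on 1 March 2023 with a 90-day retention time.
print(reusable_on(date(2023, 3, 1), 90))
print(is_reusable(date(2023, 3, 1), 90, today=date(2023, 4, 1)))
```

Whatever the mechanism, it is worth recording the retention period in the Archive documentation, because an overwritten volume cannot be recovered.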
Even in enterprise environments, there often is a need for a local archive. It can be a department or a specific category of project or requirements that the enterprise archive cannot or should not fulfill. P5 Archive is ideally suited for such cases, as it requires no formal training and can start on a very small scale.
Special considerations and scenarios
For legal compliance, it might be necessary to store business and financial data in a form that cannot be modified. If LTO tape hardware is already in place for archiving, this can easily be achieved by using WORM LTO tapes. These are certified cartridges that can only be written once. While the data may be read multiple times, it cannot be changed or modified in any way. Since the volume of financial and business data is much lower than that of video and media, using a few WORM cartridges every month or quarter is the most economical and easy-to-use solution for this specific requirement.
LTFS export for universal compatibility
In some workflows, large amounts of data have to be transported to partners or clients. LTO tape is better suited for this and more robust than hard drives. While LTFS is not yet a solid archive format, it is a valid solution for this data transport scenario. Various vendors offer free LTFS drivers that allow LTFS tapes to be mounted. If such a driver is in place, P5 Archive can mount and discover the tape, read from it or restore to it. For smaller amounts of data, RDX by Tandberg Data might be an even better and more flexible solution, since the readers are inexpensive and can be connected to any Mac or PC.
Every technology has a timespan in which it is particularly useful and makes sense. Disk storage generally has to be replaced or migrated more often than tape storage. Any tape archive that starts out with the current version of LTO (currently LTO-7) can be in use unchanged for 7 to 10 years. The following generation, LTO-8, can still read the LTO-7 tapes written today; LTO-9 drives read only LTO-8 and LTO-9 media, so LTO-7 tapes have to be migrated before the move to LTO-9 drives. Additionally, there is a re-purchase guarantee of 10 years in place, so identical drives can be acquired many years later. It is wise to build a migration plan and put the necessary steps in the calendar even if they are still several years away.
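For a migration plan it helps to know which drive generations can still read a given tape. The Python sketch below encodes the read compatibility of LTO-1 through LTO-9 as the drives shipped: two generations back up to LTO-7, one generation back for LTO-8 and LTO-9. The function names are made up for this example, and generations beyond LTO-9 are deliberately rejected rather than guessed.

```python
from typing import List


def readable_tape_generations(drive_generation: int) -> List[int]:
    """Tape generations that an LTO drive of the given generation can read."""
    if not 1 <= drive_generation <= 9:
        raise ValueError("this sketch only covers LTO-1 through LTO-9")
    # Two generations back through LTO-7, one generation back from LTO-8 on.
    generations_back = 2 if drive_generation <= 7 else 1
    oldest = max(1, drive_generation - generations_back)
    return list(range(oldest, drive_generation + 1))


def drives_that_read(tape_generation: int, newest_drive: int = 9) -> List[int]:
    """Drive generations (up to newest_drive) that can read a given tape."""
    return [drive for drive in range(tape_generation, newest_drive + 1)
            if tape_generation in readable_tape_generations(drive)]


print(readable_tape_generations(7))
print(drives_that_read(7))
```

The last drive generation returned for a tape marks the deadline: the migration to newer media has to be completed while such drives are still available and supported.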