Media companies today, more than ever, are looking for effective strategies to Archive the valuable content and assets they have created or acquired. Customers constantly ask me what the current trends for archiving are and what other companies are doing to manage their increasing storage requirements. Archiving is not a new or emerging idea, but for as long as I can remember it has been in a state of flux, particularly in the last 5 years with rapid technology breakthroughs.
In this blog, we'll look at the current state and emerging trends around Archiving in Media with a focus on the latest Object Storage Archiving trends.
Archiving in Media Today
Tape - LTO
Most Media Companies have or had Tape as a medium for Archiving. Tape offers trusted long term retention (>30 years) at the most cost effective price point per GB/TB. Today we see LTO 7 Tape drives and Tape Libraries offering greater than ever capacities, with LTO 7 offering 6TB of uncompressed capacity and up to 15TB when compression is effective. Tape also offers benefits that are still required for commercial compliance but that other Archiving solutions do not provide. One of these benefits is “vaulting”, where multiple tapes can be written at once, permitting offsite vault copies that can be kept for compliance and additional data security.
Tape is still the “standard” for Archiving in Media as its high-density, low-cost entry point makes it ideal for small to medium sized companies and satisfies the data protection and retention requirements of most Media companies.
So what's the problem with Tape?
Tape archiving, despite all the advancements in technology, still requires a level of effort and maintenance to ensure continuous error free operation. Tape Libraries need to be monitored and stocked with new blank media and often require a dedicated Archive user/manager overseeing tape requests. Low cost or under spec'd Tape Libraries often have fewer Tape Drives than required for continuous/interruption free Archiving. In this situation, Drives that are in use remain busy until their current tasks are complete, which prevents other operations, such as restoring, from commencing. Even though current LTO drives can read/write at close to 300MB/s, Archiving tasks can take days or even weeks to complete because each Tape Drive works through its tasks sequentially and concurrency is limited by how many other drives are available.
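To put rough numbers on that, here is a minimal sketch of the arithmetic. The function names, the 100TB job size and the efficiency factor are illustrative assumptions of mine; only the roughly 300MB/s drive speed and the 6TB native LTO 7 capacity come from the figures above.

```python
import math


def archive_hours(total_tb: float, drives: int,
                  mb_per_s: float = 300.0, efficiency: float = 1.0) -> float:
    """Hours needed to write total_tb when `drives` tape drives write in parallel.

    efficiency is an assumed fudge factor (0..1) for mounts, seeks and
    drives being borrowed for restores; 1.0 means ideal streaming.
    """
    if drives < 1:
        raise ValueError("at least one drive is needed")
    total_mb = total_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    aggregate_mb_per_s = drives * mb_per_s * efficiency
    return total_mb / aggregate_mb_per_s / 3600


def tapes_needed(total_tb: float, native_tb_per_tape: float = 6.0) -> int:
    """Number of tapes required, ignoring compression."""
    return math.ceil(total_tb / native_tb_per_tape)


if __name__ == "__main__":
    for drives in (1, 2, 4):
        hours = archive_hours(100, drives, efficiency=0.5)
        print(f"{drives} drive(s): {hours:.0f} hours for 100 TB")
    print(f"tapes: {tapes_needed(100)}")
```

Under these assumptions a single drive streaming flat out needs about 93 hours for 100TB, and at half that effective throughput the same job takes close to 8 days before a single restore request has been serviced.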
Lower Cost Disk Tier
You may be forgiven for looking towards Tiered Storage, utilising lower cost disk storage tiers with ever increasing hard disk capacities (8 – 10 TB per HD today), as a way to move beyond Tape Libraries and Tape Media that need to be managed. It's true that with high capacity disks a high density/high capacity disk tier is highly desirable, but it's often challenging to implement in existing environments. The high capacity tier itself is not the challenge, but rather the Automated Tiering or Hierarchical Storage Management (HSM) and appropriate safeguards such as checksum verification and logging.
The workflow becomes complex and costly when no automation is present, and worse when failures in non-automated tiering occur. Vendors with SAN offerings often have HSM/Automated Tier options for storage which can be included or added to an existing SAN at additional costs. Tier 1 solutions that are not SAN based or that do not offer HSM face the challenge of finding suitable software applications that can manage data with automated policies and provide logging and data verification.
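As a rough illustration of what such software has to do, here is a minimal sketch of an age-based tiering policy with checksum verification and logging. It is not how any particular HSM product works; the function names, the directory layout and the "not modified for N days" policy are all assumptions made for the example, and it assumes the archive directory sits outside the source directory.

```python
import hashlib
import logging
import shutil
import time
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tiering")


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Checksum a file in chunks so large media files never sit fully in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tier_down(source_dir: Path, archive_dir: Path, min_age_days: float) -> list:
    """Move files untouched for min_age_days to archive_dir, verifying each copy."""
    cutoff = time.time() - min_age_days * 86400
    moved = []
    # sorted() snapshots the file list before anything is moved or deleted.
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        if src.stat().st_mtime > cutoff:
            continue  # modified too recently, stays on the primary tier
        dst = archive_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        expected = sha256_of(src)
        shutil.copy2(src, dst)
        if sha256_of(dst) != expected:
            log.error("checksum mismatch, keeping original: %s", src)
            dst.unlink()
            continue
        src.unlink()  # only delete the source once the copy is verified
        log.info("tiered %s -> %s sha256=%s", src, dst, expected)
        moved.append(dst)
    return moved
```

The important design choice is the order of operations: the source is only removed after the copy on the lower tier has been read back and its checksum matches, and every move leaves a log line that can be audited later.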
HSM aside, Lower Cost Disk Tiers still require a level of maintenance to ensure continuous data protection. Just like Tier 1 Storage, Lower Cost Disk Tiers use disk RAID Sets (RAID6 today) that have strict requirements for data integrity. Failed drives must be replaced/repaired as soon as possible to avoid catastrophic data corruption/loss. Silent data corruption, a topic on its own, is still a primary concern for long-term data retention on any disk-based Storage Tiers. Silent corruption is particularly worrying for Media companies that use external removable hard drives on shelves for Archiving and Backup. It is most likely to occur when a Hard Drive is not attached to a host system that could potentially rectify any corruptions with checksums and re-mapping, and the corruption is not often realised until months/years later when the Hard Drive and its contents need to be accessed.
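For drives that do sit on a shelf, one common safeguard is to record a checksum manifest when the drive is written and re-check it whenever the drive is next attached. The sketch below shows the idea only; the function names are my own, and it detects corruption rather than repairing it, so a second copy is still needed to recover.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while True:
            chunk = handle.read(1024 * 1024)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or no longer match."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or file_digest(target) != expected:
            problems.append(rel)
    return problems
```

The manifest should be kept somewhere other than the drive it describes (for example serialised with json.dumps into your asset database), otherwise the record of what the data should look like can be lost along with the data.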
Object Storage – (Cloud Public or Private)
Everyone is talking Cloud, yet many of us still don't have a complete understanding of what that means with respect to Archiving. How do we use the Cloud to Archive, what are the challenges, if any, and how does it really work?
We'll begin by looking at what Cloud or Object Storage is and how it compares to other disk and filesystem based storage (Block Storage) solutions.
Object Storage came about from a need to manage very large unstructured data sets. Think of the internet and all the bits of information you can find such as text, images, videos, music and how many billions of files need to be stored.
Traditional Block storage with Filesystems, i.e. Primary Tier 1 (SAN, NAS or DAS Disk Storage), has limitations with respect to scalability. Block storage and Filesystems address individual blocks of data, with any given file spread across equal sized disk blocks. The larger the file(s), the more data blocks need to be used and addressed. The data blocks' addresses and the files that they are associated with are kept in records referred to as Meta Data or File Allocation Tables (FAT). These Meta Data records become increasingly larger and more complex and are effectively limited by their design and implementation, i.e. limited by the RAM needed to hold data block addresses or by metadata database size. The result is a finite limit to how many files you can address, and overall how far you can scale.
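A toy model makes the scaling problem easier to see. This is not how any real filesystem is implemented; the class name, the 4096-byte block size and the in-memory table are assumptions for illustration only.

```python
import math

BLOCK_SIZE = 4096  # bytes; an assumed block size for illustration


class ToyAllocationTable:
    """Maps each file name to the list of block addresses holding its data."""

    def __init__(self, total_blocks: int):
        self.free = list(range(total_blocks))  # addresses not yet in use
        self.table = {}                        # file name -> block addresses

    def write(self, name: str, size_bytes: int) -> list:
        needed = math.ceil(size_bytes / BLOCK_SIZE)
        if needed > len(self.free):
            raise OSError("no free blocks left")
        blocks, self.free = self.free[:needed], self.free[needed:]
        self.table[name] = blocks
        return blocks

    def entries(self) -> int:
        """Total block addresses the table has to track."""
        return sum(len(blocks) for blocks in self.table.values())


if __name__ == "__main__":
    fat = ToyAllocationTable(total_blocks=1_000_000)
    fat.write("clip.mov", 1_000_000_000)  # a 1 GB file
    print(fat.entries())                  # block addresses for just one file
```

With these assumed numbers a single 1GB file already needs 244,141 block addresses, so the table grows with every byte stored, not just with every file.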
Since the FAT or Meta Data for the filesystem is typically kept with the Disk based Storage it limits the sharing capabilities to a local area network or to a local host machine. In addition, with Meta Data being kept in a local location it must be protected and free from corruption at all times which poses a catastrophic point of failure to the storage system's data integrity.
Object Storage looks to remove the limitations present in traditional filesystem block storage solutions by not using a FAT or Meta Data to address data blocks. Object Storage looks to simplify storage scalability and accessibility by using a layer in front of block storage to manage collections of objects rather than the data blocks themselves, thus removing the need for FATs/Meta Data.
Read and write access to the Objects is performed through one of many protocols or Application Programming Interfaces (APIs) offered by Vendors. Common protocols are REST, S3, SOAP and a range of Vendor specific APIs.
Typical protocols such as REST are designed to provide a simple set of functions. The 3 main functions or commands are GET, PUT and DELETE. An application simply needs to PUT an object (data/file) into an Object Store and it will receive a Global UID or Identifier (GUID), which it must then use to retrieve (GET) or delete (DELETE) that object.
The GUID needs to be stored, similarly to the way a FAT or Meta Data is stored for Data Blocks, but that store can be designed from the ground up for massive scale and protection and can also be maintained per application or tenant as opposed to for the entire Filesystem.
Since GUIDs are managed by the Applications rather than a central Filesystem repository/table, the Applications are free to implement additional features such as geo-spreading and sharing, additional data protection, multi-application and multi-tenancy. It's worth noting many Object Storage Vendors provide such functionality as “standard” or as their value-add.
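The put, get and delete pattern described above can be sketched with a small in-memory stand-in. This is not any vendor's API; the class, its method names and the use of a random UUID as the identifier are assumptions for illustration.

```python
import uuid


class ToyObjectStore:
    """In-memory stand-in for an object store: opaque data in, identifier out."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        object_id = str(uuid.uuid4())  # the store, not the caller, picks the ID
        self._objects[object_id] = data
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]

    def delete(self, object_id: str) -> None:
        del self._objects[object_id]


if __name__ == "__main__":
    store = ToyObjectStore()
    # The application keeps its own catalogue of asset name -> identifier.
    catalogue = {}
    catalogue["promo_v2.mov"] = store.put(b"...video bytes...")
    print(store.get(catalogue["promo_v2.mov"]))
    store.delete(catalogue.pop("promo_v2.mov"))
```

Notice that the store knows nothing about file names or folders. The catalogue lives with the application, which is what lets each application or tenant keep, protect and scale its own index independently.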
Object Storage offers another important benefit over traditional Block Storage solutions through enhanced durability and data protection. Unlike traditional Tier 1 and 2 Disk Solutions, Object Storage solutions don't rely on understanding how the file is stored on the disk blocks but rather on where the data is stored, say on what disk or in what geo-location. This requires a different type of data protection. In an overly simplified explanation, the data needs to be replicated, or at least parts of it replicated, over several locations locally or geo-spread. The object is spread across disks or geo-located disks to allow a form of forward error correction known as Erasure Coding to rebuild the object from a sufficient subset of those locations. This permits whole disk failures without requiring healing in the traditional sense, in the way that RAID Disk sets do. The Data is effectively available across several locations, and losing a disk would only mean losing capacity. Adding or expanding capacity is a matter of adding more storage locations, such as more disks or geo-located sets of disks, to the Object Storage. This is often described as self-healing, where minimal or no maintenance is required, up to a point.
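To show the principle of rebuilding an object from surviving pieces, here is a deliberately simplified sketch using a single XOR parity fragment. Production object stores typically use stronger codes (Reed-Solomon style) that survive several simultaneous losses; the function names and the single-parity scheme here are my own assumptions, chosen because the maths fits in a few lines.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-length fragments plus one XOR parity fragment."""
    frag_len = -(-len(data) // k)               # ceiling division
    padded = data.ljust(frag_len * k, b"\0")    # pad so fragments are equal
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = bytes(frag_len)                    # all zeros to start
    for fragment in fragments:
        parity = xor_bytes(parity, fragment)
    return fragments + [parity]


def rebuild(fragments: List[Optional[bytes]], original_len: int) -> bytes:
    """Recover the data when at most one fragment (data or parity) is lost."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost fragment")
    fragments = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        repaired = bytes(len(present[0]))
        for fragment in present:
            # XOR of all survivors equals the lost fragment
            repaired = xor_bytes(repaired, fragment)
        fragments[missing[0]] = repaired
    return b"".join(fragments[:-1])[:original_len]


if __name__ == "__main__":
    data = b"archive me safely"
    pieces = split_with_parity(data, k=4)   # imagine each piece on its own disk
    pieces[2] = None                        # one disk fails
    print(rebuild(pieces, len(data)))
```

Each fragment would live on a different disk or site. Losing one of them costs capacity, not data, because the missing piece can be recomputed from the others.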
Examples of Object Storage Archives
Quantum's Lattus M with Storage Manager
For further information on Quantum's Object Storage solutions :-
Lattus Scale-Out Architecture
C5 and C10 Controller Nodes
- Encodes data into objects and disperses them to the storage nodes
- Provides data access via HTTP REST (S3)
- 1U chassis
S20 and S30 Storage Nodes
- Extensible data/content storage
- Low power & cooling requirements
- 12 drives per node, with 48TB (S20) or 72TB (S30) raw capacity per node
- Self-healing: checks for data integrity and repairs bit errors
- High density: Up to 72TB in 1U chassis
A10 NAS Access Node
- CIFS/SMB and NFS filesystem access
- 400 million files per A10 Access Node
- In-memory and on-disk data caching for improved performance
- 2U chassis
Object Matrix with MatrixStore
- Non-proprietary rack mounted storage servers coupled with trusted MatrixStore object storage
- Disaster Recovery and Business Continuity built in
- Scalable performance (10 GigE or 1 GigE connect nodes)
- Open API for integration with 3rd party applications
- Integrates with Avid Interplay, Grass Valley STRATUS, Vidispine, GLOOKAST, Signiant, Aspera, EditShare, Cantemo & many others
- Future proof. Migrates data to new technology platforms
- HSM functionality pushing content to LTO & Optical media
- Access via SMB, FTP, Local Drive, S3 Connect, DropSpot and many more
Object Storage and Archiving software
Many Archive and Backup Vendors such as Archiware and Atempo have support for Object Storage with S3 or REST interfaces. It is also possible to use S3 connector software such as WingFS or TNTDrive to interface with most Object Storage, giving NAS-like connectivity into the Object Store and thus allowing just about any Archive/Backup software to be used.
To find out more about how your facility can shift from tape archiving to Object Storage contact Digistor to organise a consultation.
Patrick Trivuncevic Digistor, Solutions Architect
As Solutions Architect and Senior Systems Engineer, Patrick is instrumental in delivering pre- and post-sales services including initial consulting and scope of works, roll-out, commissioning, training and then on-going support services to ensure customers are productive and can maximise their creativity, unhindered by technical issues or limitations. In a career spanning eighteen years, Patrick has excelled in pre- and post-sales support of specialised technology solutions in High Performance Computing, Research and Education, Cloud, Content and Media.