Ninety percent of the world’s data was generated between 2011 and 2013, and 2.5 quintillion bytes of data are created daily, thanks to developments in driverless cars, big data, IoT, and compute power being pushed to the edge. One of the most challenging aspects is data storage. According to IDC, hyperscale environments will drive the global object storage market at a projected compound annual growth rate (CAGR) of 30 percent, with revenue reaching USD 19.8 billion by 2020.
Data storage in general continues to be a concern across industries. According to independent market research conducted by SUSE/Loudhouse, which surveyed over 1,000 senior IT leaders in over 11 countries and across industries, the three major challenges that worry over 70 percent of global companies are high costs, performance, and growth complexity and fragmentation (https://www.suse.com/newsroom/post/2017/study-70-percent-of-companies-worldwide-increasingly-worried-about-cost-performance-and-complexity-of-traditional-data-storage/). Today, most organizations can identify with one or more of the four types of data storage: disk-based, flash, hybrid (disk and flash), and software-defined storage (SDS).
There are two types of traditional data storage: file and block.
- File-based storage organizes pieces of data in a hierarchical architecture, similar to a file cabinet containing paper files or a computer filing structure with folders (e.g., NAS).
- Block-based storage splits data into separate blocks, each stored under a unique address; the system reassembles the blocks when the data is queried (e.g., SAN).
Both of these storage types support in-place updates, have fixed system attributes for metadata, offer high performance and simplified access and management, and are difficult to extend beyond the data center.
There is a newer type of storage, typically referred to as object storage. According to Gartner, object storage refers to system components, including hardware and software, that store data “objects” using custom metadata. Object data is dynamically addressable and best suited for cloud storage; it is accessed using REST and SOAP over HTTP or through APIs, and it is scalable and easily distributed. Object storage is well suited for analytics and, in particular, for coping with unstructured data growth and the complexities of storage. IBM has recently announced a new on-premises, compliance-enabled cloud object storage system and offers an IBM Redbooks publication on Cloud Object Storage as a Service.
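To make the idea of an addressable object carrying custom metadata more concrete, here is a minimal in-memory sketch. It is not any vendor's API: the class `ObjectStore`, its methods `put`, `get`, `head`, and `find`, and the UUID-based ID scheme are all hypothetical names chosen for illustration. A real object store would expose similar operations over HTTP.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy object store (hypothetical): a flat namespace keyed by object ID."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        """Store data with caller-supplied metadata and return its new ID."""
        object_id = uuid.uuid4().hex  # assumed ID scheme, for illustration only
        # System metadata is recorded alongside the custom metadata.
        metadata = dict(
            metadata,
            size=len(data),
            sha256=hashlib.sha256(data).hexdigest(),
        )
        self._objects[object_id] = StoredObject(data, metadata)
        return object_id

    def get(self, object_id: str) -> bytes:
        """Return the object's payload."""
        return self._objects[object_id].data

    def head(self, object_id: str) -> dict:
        """Return a copy of the metadata without touching the payload."""
        return dict(self._objects[object_id].metadata)

    def find(self, **criteria) -> list:
        """Return the IDs of objects whose metadata matches every criterion."""
        return [
            oid
            for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = ObjectStore()
oid = store.put(b"scan", patient="123", kind="xray")
matches = store.find(kind="xray")  # lookup by metadata, not by folder path
```

Unlike a file hierarchy, there are no directories here: objects are located by ID or by querying their metadata, which is one reason the model distributes easily.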
The real issue with storage systems is that they are not manageable and not easily moveable, as in archiving. Yet storage itself is not the real problem; the problem lies with metadata and the file system. Data security is another concern. Vendors offer different schemes, but one that is gaining greater acceptance is All-or-Nothing Transform (AONT) encryption: object encryption using per-object generated keys to prevent disclosure if any node or hard drive becomes compromised. Other types of solutions include containers built on Kubernetes and elastic storage technology. For large data files, however, many organizations use erasure coding, which divides objects into pieces and then calculates multiple parities so that lost pieces can be rebuilt.
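The principle behind erasure coding can be sketched with the simplest possible case. The example below is a deliberate simplification: it computes a single XOR parity piece, so it can repair only one lost piece, whereas production systems calculate multiple parities (typically with Reed-Solomon style codes) to survive several failures. The function names `encode` and `recover` are illustrative, not taken from any product.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int):
    """Split data into k equal-size pieces (zero-padded) plus one XOR parity."""
    piece_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(piece_len * k, b"\0")
    pieces = [padded[i * piece_len:(i + 1) * piece_len] for i in range(k)]
    parity = reduce(xor_bytes, pieces)
    return pieces, parity


def recover(pieces: list, parity: bytes) -> list:
    """Rebuild the one missing piece (marked None) from survivors and parity."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity can repair only one lost piece")
    pieces = list(pieces)
    if missing:
        survivors = [p for p in pieces if p is not None]
        # XOR of the parity with all surviving pieces yields the lost piece.
        pieces[missing[0]] = reduce(xor_bytes, survivors, parity)
    return pieces


data = b"object storage payload"
pieces, parity = encode(data, k=4)
pieces[2] = None  # simulate a failed drive or node
restored = b"".join(recover(pieces, parity))[:len(data)]  # strip padding
```

Each piece and the parity would normally live on a different node or drive, so the loss of any one of them leaves the object fully recoverable.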
Security becomes a paramount concern for compliance-driven industries such as finance, banking, health, and long-term historical archiving, to name a few. When data must be non-rewriteable and non-erasable, the US Securities and Exchange Commission (SEC) makes the requirement very clear in 17 C.F.R. § 240.17a-4(f), which regulates exchange members, brokers, and dealers; for some organizations, penalties may include fines, revocation, and suspension of licenses. It specifies, “...that electronic records are capable of being accurately reproduced for later reference by maintaining the records in an unalterable form.” For example, a broker/dealer must ensure that records are not deleted before the regulatory retention period has lapsed, and the system must allow records to be retained beyond the retention periods specified in Commission rules. The retention period itself can be calculated in two ways:
- Time-based retention periods use the storage date to calculate the retention expiration date
- Event-based retention periods begin with an indefinite retention time; once the triggering event occurs, the final retention expiration date is calculated from the event-trigger date
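The two retention calculations above can be sketched as follows. This is an illustration under stated assumptions, not a compliance implementation: the function names are hypothetical, retention is expressed in whole years, and the durations used are placeholders rather than values taken from any rule.

```python
from datetime import date
from typing import Optional


def add_years(start: date, years: int) -> date:
    """Add whole years, clamping Feb 29 to Feb 28 in non-leap years."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def time_based_expiration(storage_date: date, retention_years: int) -> date:
    """Time-based: the clock starts on the date the record was stored."""
    return add_years(storage_date, retention_years)


def event_based_expiration(
    event_date: Optional[date], retention_years: int
) -> Optional[date]:
    """Event-based: retain indefinitely (None) until the trigger event occurs."""
    if event_date is None:
        return None
    return add_years(event_date, retention_years)


def may_delete(expiration: Optional[date], today: date) -> bool:
    """Deletion is allowed only after a known expiration date has passed."""
    return expiration is not None and today > expiration


# Placeholder durations, for illustration only.
stored = time_based_expiration(date(2017, 3, 15), retention_years=6)
pending = event_based_expiration(None, retention_years=6)
```

Returning `None` for an event-based record whose trigger has not yet occurred keeps the record undeletable by default, which matches the requirement that the system never removes a record before its retention period has lapsed.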
The following rules apply to specific cases regarding data retention; combined, these two rules require the preservation of records in an easily accessible manner.
- Rule 17a-3 covers which records must be made and kept current
- Rule 17a-4 covers how long these records must be preserved and in what manner
Additional financial regulations include Financial Industry Regulatory Authority (FINRA) Rule 4511(c), which explicitly defers to the format and media requirements of SEC Rule 17a-4 for the books and records it requires.
For electronic protected health information (ePHI), the Health Insurance Portability and Accountability Act of 1996 (HIPAA) leaves retention periods to the states: each state has its own requirements for how long covered entities must retain ePHI, including after the entity closes its doors.
The U.S. Department of Health and Human Services (HHS) states, “The HIPAA Privacy Rule does not include medical record retention requirements. Rather, State laws generally govern how long medical records are to be retained.” However, HIPAA requires that covered entities apply administrative, technical, and physical safeguards to protect the privacy of medical records including protected health information (PHI).