RAID & Data Security Primer

RAID and data security have several things in common. Most fundamentally, both pertain to the storage, access and use of data. This primer does not go over the specifics of RAID levels, arrays or setup, nor data value analysis, backup best practices or regulatory compliance. It is intended only to cover the basic concepts of data security and RAID, and how RAID can be beneficial or detrimental to data security and backups.

RAID is a technology for overcoming one, or a combination, of several storage limitations:

__data security__ (mirroring, the closest thing to a backup that RAID offers)
__storage capacity__ (JBOD – Just a Bunch Of Disks: a single partition spanning several drives and combining the full capacity of each)
__I/O speeds__ (striping, which spreads reads and writes across multiple drives so that throughput scales with the number of drives until the bus or controller becomes the bottleneck; it also has the happy side effect of increasing partition capacity)
__data integrity__ (parity, at the block or bit level: the array stores parity information computed from the data, typically with XOR, which allows inconsistencies to be detected and the contents of a lost drive to be rebuilt from the remaining drives. Parity consumes the equivalent of one drive’s capacity, or two with dual parity, spread across the drives, but the usable capacity is still most of the drives’ combined capacity)
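
To make striping and parity concrete, here is a minimal Python sketch, not any real controller's implementation. It stripes data across all but one drive and stores an XOR parity block for each stripe on the last drive (a RAID 4-style layout; RAID 5 rotates the parity across the drives instead). Drives are modeled as in-memory lists of byte blocks, and the function names `xor_blocks`, `stripe_with_parity` and `rebuild_drive` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data: bytes, n_drives: int, block_size: int) -> list[list[bytes]]:
    """Stripe data across n_drives - 1 data drives and put one XOR parity
    block per stripe on the last drive (hypothetical RAID 4-style layout)."""
    if n_drives < 3 or block_size < 1:
        raise ValueError("need at least 3 drives and a positive block size")
    data_drives = n_drives - 1
    stripe_bytes = data_drives * block_size
    # Pad with zero bytes so every stripe is full.
    padded = data + b"\x00" * (-len(data) % stripe_bytes)
    drives: list[list[bytes]] = [[] for _ in range(n_drives)]
    for start in range(0, len(padded), stripe_bytes):
        stripe = [padded[start + i * block_size : start + (i + 1) * block_size]
                  for i in range(data_drives)]
        for i, block in enumerate(stripe):
            drives[i].append(block)
        drives[-1].append(xor_blocks(stripe))  # parity for this stripe
    return drives


def rebuild_drive(drives: list[list[bytes]], lost: int) -> list[bytes]:
    """Recreate the lost drive's blocks by XOR-ing the surviving blocks of each stripe."""
    survivors = [d for i, d in enumerate(drives) if i != lost]
    return [xor_blocks(list(stripe)) for stripe in zip(*survivors)]


if __name__ == "__main__":
    drives = stripe_with_parity(b"RAID parity demo!", n_drives=4, block_size=4)
    lost = 1  # pretend drive 1 failed
    print("rebuilt drive matches original:", rebuild_drive(drives, lost) == drives[lost])
```

Because XOR-ing every block in a stripe, parity included, yields zero, any single missing block equals the XOR of the others. The same property is why single parity survives the loss of only one drive at a time.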

RAID arrays can be used for archival and near-line backups. These drives are not in continual use; they are intended to receive daily or weekly updates of the data and to be accessed only when required. The data on these drives should also be copied elsewhere for offline/offsite backups. Although the drives in a backup RAID array are used far less heavily, they are still vulnerable to failure.

Combining RAID levels (nested RAID) brings together two or more of the above advantages, but it increases cost (higher minimum disk counts, more capable controllers, higher electrical usage) and complexity.
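
The capacity side of that cost can be sketched with simple arithmetic. The Python below is a minimal sketch assuming identical drives, two-way mirrors for RAID 10, and the conventional minimum drive counts for each level (some software implementations accept other counts). `MIN_DRIVES`, `usable_capacity` and `guaranteed_failures` are hypothetical names.

```python
# Conventional minimum drive counts (an assumption; some implementations differ).
MIN_DRIVES = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def usable_capacity(level: str, n: int, size_tb: float) -> float:
    """Usable capacity, in TB, of n identical drives of size_tb TB each."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported level: {level}")
    if n < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == "0":
        return n * size_tb          # striping: all capacity, no redundancy
    if level == "1":
        return size_tb              # mirroring: every drive holds the same data
    if level == "5":
        return (n - 1) * size_tb    # one drive's worth of parity
    if level == "6":
        return (n - 2) * size_tb    # two drives' worth of parity
    # RAID 10: a stripe across two-way mirror pairs
    if n % 2:
        raise ValueError("RAID 10 with two-way mirrors needs an even number of drives")
    return n // 2 * size_tb


def guaranteed_failures(level: str, n: int) -> int:
    """Drive failures the array survives no matter which drives fail."""
    usable_capacity(level, n, 1.0)  # reuse the drive-count validation
    # RAID 10 can lose more drives if they sit in different mirror pairs,
    # but losing both drives of one pair destroys the array.
    return {"0": 0, "1": n - 1, "5": 1, "6": 2, "10": 1}[level]
```

For four 4 TB drives, `usable_capacity("5", 4, 4.0)` gives 12.0 TB while `usable_capacity("10", 4, 4.0)` gives 8.0 TB, and both guarantee survival of only one drive failure. The nested level pays for combining striping and mirroring with half of the raw capacity.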

Using RAID arrays as your sole backup solution is a credible strategy if the arrays are dedicated solely to backups (and not also used for general data storage, SQL databases, etc.).

Data value should determine the level of backup required. A RAID array may be the final backup if the data’s value allows for it to disappear completely in the event of a failure, or if you are willing to gamble that the backups can be recreated from the working data before that data is itself corrupted or lost.

Data security and backups, on the other hand, have their own considerations: how valuable the data is, how quickly it needs to be accessed, how secure it needs to be from theft and unauthorized access, and how well its integrity must be protected against corruption.

Data can be sorted into the following categories, listed roughly from least to most valuable:
__Waste__ – data which isn’t wanted, needed, or otherwise usable, such as spam emails.
__Temporary__ – data which has current and immediate value but becomes waste in short order, such as cache files, cookies, streaming videos/music, etc.
__Associated__ – data whose value depends on some other data, program or situation, and which may or may not be temporary, such as game configs, save files and worksheets for projects
__Recreatable__ – data which can easily be recreated, though doing so takes some time and/or resources. This may include music collections (where the source is an online store or owned CDs), certain config files, memes, quick sketch work, etc.
__Important__ – data which could be recreated, but only at high time, resource or financial cost; this is where backups stop being merely beneficial and become required for uninterrupted workflow
__Necessary__ – data which must never be lost or corrupted, because losing it would halt production and work until the data is restored or recreated, such as system files and source code for important projects (looking at you, CD Projekt Red)
__Production__ – more important than Necessary; losing or corrupting it can have a severe time, resource and financial impact, a considerably higher impact if there are no online/nearline backups to restore from, and it is almost impossible to recover from if no backups exist at all
__Customer__ – data which belongs to customers and clients, and whose loss or exposure can shut down operations through liability and lawsuits
__Protected__ – data which must retain a high level of security, integrity, value and availability, such as Gov/Mil/HIPAA data

The complexity of protecting data rises on an exponential curve as you move down this list.
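
If you want to apply these categories in a script or inventory tool, one option is to encode them. This Python sketch assumes the list above is ordered from least to most valuable and encodes it as an `IntEnum`; `DataCategory` and `backups_required` are hypothetical names. The only rule it encodes is the one stated for Important data, where backups become required rather than merely beneficial.

```python
from enum import IntEnum


class DataCategory(IntEnum):
    """The primer's data categories, assumed ordered from least to most valuable."""
    WASTE = 1
    TEMPORARY = 2
    ASSOCIATED = 3
    RECREATABLE = 4
    IMPORTANT = 5
    NECESSARY = 6
    PRODUCTION = 7
    CUSTOMER = 8
    PROTECTED = 9


def backups_required(category: DataCategory) -> bool:
    """Backups stop being merely beneficial and become required from Important upward."""
    return category >= DataCategory.IMPORTANT


if __name__ == "__main__":
    for category in DataCategory:
        status = "required" if backups_required(category) else "not required"
        print(f"{category.name.title():<12} backups {status}")
```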
