RAID
RAID - Redundant Array of Inexpensive Disks.
Disk failures can be disastrous. RAID is a system that distributes or replicates data across multiple disks. RAID not only helps avoid data loss but also minimizes the downtime associated with hardware failures (often to zero) and potentially increases performance.
RAID can do two basic things:
- First, it can improve performance by “striping” data across multiple drives, thus allowing several drives to work simultaneously to supply or absorb a single data stream.
- Second, it can replicate data across multiple drives, decreasing the risk associated with a single failed disk.
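As a rough illustration of striping, the sketch below maps a logical block number to a drive and a position on that drive in simple round-robin fashion. The function name `stripe_location` and the one-block stripe unit are assumptions made for this example; real arrays usually stripe in larger chunks.

```python
def stripe_location(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block index on that disk)."""
    if num_disks < 1:
        raise ValueError("need at least one disk")
    if logical_block < 0:
        raise ValueError("logical block must be non-negative")
    # Consecutive blocks land on consecutive disks, wrapping around.
    return logical_block % num_disks, logical_block // num_disks


# Eight consecutive logical blocks on a hypothetical 4-disk stripe set.
layout = [stripe_location(b, 4) for b in range(8)]
print(layout)
```

Because neighbouring blocks sit on different drives, a long sequential read or write keeps all drives busy at once.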
Replication Forms:
- Mirroring, in which data blocks are reproduced bit for bit on several different drives.
- Mirroring is faster but consumes more disk space.
- Parity schemes, in which one or more drives contain an error-correcting checksum of the blocks on the remaining data drives.
- Parity schemes are more disk space-efficient but have lower performance.
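To make the parity idea concrete, here is a minimal sketch that assumes the checksum is a byte-wise XOR of the data blocks (the usual choice for single-parity schemes). The helper `xor_blocks` and the sample data are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]   # blocks on three hypothetical data drives
parity = xor_blocks(data)            # block stored on the parity drive

# Simulate losing the second drive and rebuilding its block from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Mirroring would store three full copies instead. Parity needs only one extra block here, but every write has to update the parity as well.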
Implementation
RAID can be implemented by dedicated hardware that presents a group of hard disks to the OS as a single composite drive.
It can also be implemented in software, with the operating system reading and writing multiple disks according to the rules of RAID.
Software vs. Hardware RAID
The most significant bottleneck in a RAID implementation is the disks themselves. So you can’t assume that a hardware-based implementation of RAID will be faster than a software (OS-based) implementation.
Hardware RAID has been predominant in the past for two main reasons:
- Lack of software alternatives (no direct OS support for RAID).
- Hardware’s ability to buffer writes in some form of nonvolatile memory. This makes writes appear to complete instantaneously, and it also protects against a potential corruption issue called the “RAID 5 write hole”.
Many of the common “RAID cards” sold for PCs have no non-volatile memory at all; they are really just glorified SATA interfaces with some RAID software onboard. (Non-volatile memory is a type of computer memory that retains saved data even if the power is turned off.)
- RAID implementations on PC motherboards fall into this category. You’re really much better off using the RAID features in Linux/OpenSolaris.
- With software RAID, the kernel manages the RAID environment, removing the possibility of a RAID controller failure. A RAID controller failure can cause the data across all disks to be destroyed or corrupted.
- Hardware (controller-based) RAID is more expensive. It can be fast, but it costs more money.
- Software RAID doesn’t cost a thing, BUT if the OS goes bad or runs out of RAM or other resources, EVERYTHING goes down with it.
RAID Levels
RAID is traditionally described in terms of “levels” that specify the exact details of the parallelism and redundancy implemented by an array. “Higher” levels are not necessarily better. The levels are simply different configurations for different use cases and needs.
Linear mode, aka JBOD (Just a Bunch Of Disks), is not even a real RAID level, and yet every RAID controller seems to implement it.
- JBOD concatenates the block addresses of multiple drives to create a single, larger virtual drive.
- It provides no data redundancy or performance benefit.
- JBOD functionality is best achieved through a logical volume manager (LVM) rather than a RAID controller.
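The address concatenation JBOD performs can be sketched as follows. The function name `jbod_location` and the disk sizes are hypothetical; sizes are counted in blocks.

```python
def jbod_location(logical_block: int, disk_sizes: list[int]) -> tuple[int, int]:
    """Map a logical block to (disk index, block on that disk) for concatenated disks."""
    if logical_block < 0:
        raise IndexError("logical block must be non-negative")
    offset = logical_block
    for disk, size in enumerate(disk_sizes):
        if offset < size:
            return disk, offset
        offset -= size  # skip past this disk and keep looking
    raise IndexError("logical block is beyond the end of the virtual drive")


sizes = [100, 50, 200]            # hypothetical disk sizes, in blocks
print(jbod_location(0, sizes))    # (0, 0)
print(jbod_location(120, sizes))  # (1, 20)
print(jbod_location(150, sizes))  # (2, 0)
```

Unlike striping, consecutive blocks stay on the same disk until it is full, which is why there is no performance benefit.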
Below are the standard RAID levels.
RAID 0
Strictly used to increase performance by striping data across disks.
- Requires a minimum of two drives of equal size. Instead of stacking them end-to-end, it stripes data alternately among the disks in the pool.
- Sequential reads and writes are therefore spread among several disks, decreasing write and access times.
- Note that RAID 0 has reliability characteristics that are significantly inferior to separate disks. A two-drive array has roughly double the annual failure rate of a single drive, and so on.
- NO REDUNDANCY WHATSOEVER. If one drive fails all data is lost. (no mirror, no parity)
- Good for creating large logical hard drives.
- Faster performance than a single drive (as blocks are striped).
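The reliability point above can be checked with a little arithmetic. This sketch assumes drives fail independently and uses a made-up 2% annual failure rate per drive; the function name is hypothetical too.

```python
def raid0_annual_failure_rate(drive_afr: float, num_drives: int) -> float:
    """Probability that at least one of num_drives fails within a year.

    In RAID 0 a single drive failure loses the whole array, so this is
    also the array's annual failure rate (assuming independent failures).
    """
    if not 0.0 <= drive_afr <= 1.0:
        raise ValueError("drive_afr must be a probability between 0 and 1")
    return 1.0 - (1.0 - drive_afr) ** num_drives


for n in (1, 2, 4):
    print(n, round(raid0_annual_failure_rate(0.02, n), 4))
```

For small per-drive rates the result is close to the drive count times the single-drive rate, which is where the "roughly double" rule of thumb comes from.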
[PIC]
RAID 1
AKA mirroring.
- Minimum of two disks.
- Writes are duplicated to two or more drives simultaneously.
- Writes are slightly slower than they would be on a single drive.
- Read speeds are comparable to RAID 0 because reads can be farmed out among the several duplicate disk drives.
- Slower than RAID 0 but excellent redundancy. (as blocks are mirrored)
- If you have two 250 GB drives in a RAID 1 setup, your total storage is really only 250 GB.
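A toy model of the behaviour described above, with writes going to every disk and reads spread among the copies. The `Mirror` class is purely illustrative: it stores blocks in in-memory dictionaries and is not how any real RAID driver is written.

```python
class Mirror:
    """Toy RAID 1: every write goes to all healthy disks, reads rotate among them."""

    def __init__(self, num_disks: int = 2):
        if num_disks < 2:
            raise ValueError("a mirror needs at least two disks")
        self.disks = [dict() for _ in range(num_disks)]  # block number -> data
        self.failed = set()                              # indexes of failed disks
        self._next = 0                                   # round-robin read counter

    def write(self, block: int, data: bytes) -> None:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block] = data

    def read(self, block: int) -> bytes:
        healthy = [i for i in range(len(self.disks)) if i not in self.failed]
        if not healthy:
            raise IOError("all mirrors have failed")
        i = healthy[self._next % len(healthy)]
        self._next += 1
        return self.disks[i][block]


m = Mirror(2)
m.write(0, b"hello")
m.failed.add(0)      # simulate losing one disk
print(m.read(0))     # the surviving copy still serves the read
```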
[pic]
RAID 1+0 (AKA RAID 10) & RAID 0+1
RAID 10, also known as RAID 1+0, is a RAID configuration that combines disk mirroring and disk striping to protect data.
- Requires a minimum of four disks and stripes data across mirrored pairs.
- As long as one disk in each mirrored pair is functional, data can be retrieved. If two disks in the same mirrored pair fail, all data will be lost because there is no parity in the striped sets.
- Logically, RAID 1+0 and RAID 0+1 are combinations of RAID 0 and RAID 1, but many controllers and software implementations provide direct support for them.
- The goal of both modes is to simultaneously obtain the performance of RAID 0 and the redundancy of RAID 1.
- Excellent redundancy (as blocks are mirrored)
- Excellent performance (as blocks are striped)
- If you can afford it, this is the BEST option for any mission-critical applications (especially databases).
RAID 10 = stripe of mirrors
RAID 0+1 = mirror of stripes
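The difference between the two layouts shows up in which combinations of failed disks they survive. The sketch below assumes a hypothetical numbering: in RAID 10, disks 2k and 2k+1 form mirrored pair k; in RAID 0+1, the first half of the disks is one stripe set and the second half is its mirror.

```python
def raid10_survives(failed: set[int], num_pairs: int) -> bool:
    """Stripe of mirrors: data survives if no mirrored pair has lost both disks."""
    return all(not ({2 * k, 2 * k + 1} <= failed) for k in range(num_pairs))


def raid01_survives(failed: set[int], disks_per_stripe: int) -> bool:
    """Mirror of two stripes: data survives if at least one stripe set is fully intact."""
    n = disks_per_stripe
    first_ok = not any(d in failed for d in range(n))
    second_ok = not any(d in failed for d in range(n, 2 * n))
    return first_ok or second_ok


# Four disks, two of them failed.
print(raid10_survives({0, 2}, num_pairs=2))         # one disk lost from each pair
print(raid01_survives({0, 2}, disks_per_stripe=2))  # one disk lost from each stripe set
```

With this numbering, losing disks 0 and 2 leaves the stripe of mirrors readable but breaks both stripe sets of the mirror of stripes.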
[pic]
RAID 5
RAID 5 stripes both data and distributed parity information, adding redundancy while simultaneously improving read performance.
- Minimum of 3 disks.
- If there are N drives in an array, the capacity of N-1 of them can store data (because one drive’s worth of space is used for parity).
- So if you had three drives of 250 GB, you really only have 500 GB of storage instead of 750 GB, because the equivalent of one of the three drives is used for storing parity/redundancy (N-1!!!). The parity is spread across all the drives rather than kept on a single one.
- Good for creating large logical hard drives.
- Faster performance than single hard drive. (as blocks are striped)
- If one drive fails data is still usable. (distributed parity)
- Most cost-effective option providing both performance and redundancy. Use this for a DB that is heavily read-oriented. Write operations will be slow.
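"Distributed parity" means the parity block moves to a different drive on each stripe. The sketch below prints one possible rotation; the function name is made up, and real implementations offer several layout variants, so treat the exact order as an assumption.

```python
def raid5_stripe_layout(stripe: int, num_disks: int) -> list[str]:
    """Return one label per disk for a stripe: 'P' for parity, 'D<n>' for data block n."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    # Parity starts on the last disk and moves one disk to the left per stripe.
    parity_disk = (num_disks - 1 - stripe) % num_disks
    labels = []
    data_index = stripe * (num_disks - 1)  # each stripe holds N-1 data blocks
    for disk in range(num_disks):
        if disk == parity_disk:
            labels.append("P")
        else:
            labels.append(f"D{data_index}")
            data_index += 1
    return labels


for s in range(3):
    print(raid5_stripe_layout(s, 3))
```

Every stripe holds N-1 data blocks and one parity block, which matches the N-1 capacity rule, and no single drive becomes a parity bottleneck.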
[pic]
RAID 6
(Not so standard)
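RAID 6 is similar to RAID 5, except it stores two independent parity blocks per stripe (two drives’ worth of parity), distributed across the drives.
- Meaning, a RAID 6 array can survive up to two failed drives without losing data.
- RAID 6 needs a minimum of 4 disks. (Remember, RAID 5 needs a minimum of 3.)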
RAID levels 2, 3, and 4 are defined but are rarely deployed. LVMs usually include both striping (RAID 0) and mirroring (RAID 1) features.
Size vs. Volume in RAID
The size of your RAID array is at least as big as its volume.
- The size of the array is the total amount of all space allocated for the hard drive array.
- Volume of the array is the amount of usable space. So if you had two 500 GB drives in a RAID 1 (mirroring) array, your total volume is 500 GB.
The space that can be used on any one hard drive can only be as large as the smallest hard drive in the array.
- So if you have three drives in a RAID 5 setup (750 GB, 250 GB, and 100 GB), you can only use 100 GB from each of the 250 GB and 750 GB drives. Remember, information is striped across, so you can stripe across 100 GB on all drives.
[pic]
- So if you had two 500 GB drives in a RAID 1 (mirrored) array, your total size of the array is 1 TB.
- In RAID 0 (striping), if you had four 500 GB drives, your total size AND volume will be 2 TB, because in RAID 0 all drives are the same size and data is striped across all of them (all the disks are used).
- In a RAID 5 (stripe and parity) setup, if you had four 500 GB hard drives, your total size of the array is 2 TB. But your total volume is minus one hard drive, because one drive’s worth of space is used for parity/redundancy, so your total usable space/volume is 1.5 TB.
- In RAID 6 (double parity), if you had six 500 GB drives, your total size of the array is 3 TB. But your total volume/usable space within the array is 2 TB.
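The size and volume figures above can be reproduced with a small calculator. This is a sketch under stated assumptions: the function name `array_capacity` is made up, "size" is taken to be the space the array actually occupies on its members (smallest drive times number of drives), RAID 10 is assumed to use two-way mirrors, and the minimum drive counts are the commonly quoted ones.

```python
def array_capacity(level: str, drive_sizes_gb: list[int]) -> tuple[int, int]:
    """Return (size, volume) in GB: raw space used by the array vs. usable space."""
    n = len(drive_sizes_gb)
    minimums = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimums:
        raise ValueError(f"unsupported RAID level: {level}")
    if n < minimums[level]:
        raise ValueError(f"RAID {level} needs at least {minimums[level]} drives")
    if level == "10" and n % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    per_drive = min(drive_sizes_gb)  # every member is limited to the smallest drive
    size = per_drive * n
    # Number of drives' worth of space left for data at each level.
    usable_drives = {"0": n, "1": 1, "5": n - 1, "6": n - 2, "10": n // 2}[level]
    return size, per_drive * usable_drives


print(array_capacity("1", [500, 500]))       # (1000, 500)
print(array_capacity("0", [500] * 4))        # (2000, 2000)
print(array_capacity("5", [500] * 4))        # (2000, 1500)
print(array_capacity("6", [500] * 6))        # (3000, 2000)
print(array_capacity("5", [750, 250, 100]))  # (300, 200)
```

The last call shows the smallest-drive rule: only 100 GB of each drive is used, so the 750 GB and 250 GB drives are mostly wasted.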
References: [+] https://www.thegeekstuff.com/2010/08/raid-levels-tutorial/