All about RAID arrays from hard drives (HDD). RAID array

25.09.2019 Windows phone

Almost everyone knows the proverb "Until the thunder breaks out, the peasant does not cross himself." It is true to life: until a problem touches the user closely, he will not even think about it. The power supply dies and takes a couple of devices with it - and the user rushes to look for articles on a tasty and healthy diet for the PC. The processor burns out or starts failing from overheating - and a couple of links to sprawling forum threads on CPU cooling appear in "Favorites".

With hard drives it is the same story: as soon as another drive leaves our mortal world, clicking goodbye with its heads, the PC owner starts fussing about improving the drive's living conditions. But even the most sophisticated cooler cannot guarantee a disk a long and happy life. Many factors affect the life of a drive: a manufacturing defect, an accidental kick to the case (especially if the case stands on the floor), dust that has made it through the filters, high-voltage noise from the power supply... There is only one way out - backing up your data, and if you need a backup on the fly, it is time to build a RAID array, since today almost every motherboard has some kind of RAID controller.

At this point we will stop and make a brief digression into the history and theory of RAID arrays. The abbreviation RAID stands for Redundant Array of Independent Disks. Previously the word "inexpensive" was used instead of "independent", but over time that definition lost its relevance: almost all disk drives have become inexpensive.

The history of RAID began in 1987, when the article "A Case for Redundant Arrays of Inexpensive Disks (RAID)" appeared, signed by comrades Patterson, Gibson and Katz. The paper described a technology for combining several ordinary disks into an array to get a faster and more reliable drive. The authors also told readers about several types of arrays - from RAID-1 to RAID-5. Later a zero-level RAID array was added to the arrays described almost twenty years ago, and it gained popularity. So what are all these RAID-x? What is their essence? Why are they called redundant? This is what we will try to figure out.

In very simple terms, RAID is a thing that lets the operating system not know how many disks are installed in the computer. Combining hard drives into a RAID array is exactly the opposite of splitting a single space into logical disks: we form one logical drive out of several physical ones. To do this we need either the appropriate software (we will not even discuss this option - it is not worth it), or a RAID controller built into the motherboard, or a separate one inserted into a PCI or PCI Express slot. It is the controller that combines the disks into an array, and the operating system no longer works with the HDDs but with the controller, which tells it nothing it does not need to know. There are a great many options for combining several disks into one, or more precisely, about ten.

What are RAIDs?

The simplest of them is JBOD (Just a Bunch of Disks). Two hard drives are glued into one in series: information is written first to one disk and then to the other, without being broken into pieces and blocks. From two 200 GB drives we make one 400 GB drive, which works at almost the same speed as each of the two drives, in reality slightly lower.

JBOD is a special case of a zero-level array, RAID-0. Arrays of this level have another name - stripe; the full name is Striped Disk Array without Fault Tolerance. This option also combines n disks into one with n times the volume, but the disks are connected not sequentially but in parallel, and information is written to them in blocks (the block size is set by the user when the RAID array is created).

That is, if the sequence of digits 123456 needs to be written to two drives in a RAID-0 array, the controller will divide this chain into two parts - 123 and 456 - and write the first to one disk and the second to the other. Each disk can transfer data at, let's say, 50 MB/s, so the total speed of two disks read in parallel is 100 MB/s. Thus the speed of working with data should increase n times (in reality, of course, the gain is smaller, since nobody has abolished the overhead of seeking data and transferring it over the bus). But this gain comes at a price: if even one disk fails, the information on the entire array is lost.
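To make the difference between gluing disks in series (JBOD) and striping concrete, here is a minimal Python sketch. The function names, the use of strings as "disks" and the block size of three characters are assumptions made purely for illustration; a real controller works with fixed-size sectors.

```python
def split_blocks(data, block_size):
    """Cut the data into consecutive blocks of block_size characters."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def write_stripe(data, n_disks, block_size):
    """RAID-0: blocks are handed to the disks in turn (round-robin)."""
    disks = [[] for _ in range(n_disks)]
    for index, block in enumerate(split_blocks(data, block_size)):
        disks[index % n_disks].append(block)
    return disks


def read_stripe(disks):
    """Reassemble the data by taking one block from each disk in turn."""
    rows = max(len(disk) for disk in disks)
    out = []
    for row in range(rows):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return "".join(out)


def write_jbod(data, disk_capacity, n_disks):
    """JBOD: fill the first disk completely, then move on to the next one."""
    if len(data) > disk_capacity * n_disks:
        raise ValueError("data does not fit on the array")
    return [data[i * disk_capacity:(i + 1) * disk_capacity]
            for i in range(n_disks)]


if __name__ == "__main__":
    print(write_stripe("123456", n_disks=2, block_size=3))
    print(write_stripe("123456789012", n_disks=2, block_size=3))
    print(write_jbod("123456", disk_capacity=4, n_disks=2))
```

With the six-digit example from the text each disk receives exactly one block; with a longer sequence the blocks alternate between the disks, which is what lets both drives work at once.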

Level 0 RAID. The data is divided into blocks and scattered across disks. There is no parity or redundancy.

That is, there is no parity and no redundancy at all. This array can be considered a RAID only conditionally; nevertheless, it is very popular. Few people think about reliability, because you can't measure it with benchmarks, but everyone understands the language of megabytes per second. This is neither bad nor good, it is simply how things are. Below we will talk about how to have your cake and eat it, that is, keep both speed and reliability.

By the way, an additional minus of the stripe array is its intolerance. We do not mean that it cannot stand certain kinds of food or, say, its owners - it could not care less about those - but it really does not tolerate being moved: transferring the array somewhere else is a real problem. Even if you carry both disks and the controller drivers to a friend, there is no guarantee that they will be recognized as one array and that you will be able to use the data. Moreover, there have been cases when simply connecting stripe disks (without writing anything!) to a "non-native" controller (different from the one on which the array was created) corrupted the data in the array. We don't know how relevant this problem is now, with modern controllers, but we still advise you to be careful.

Level 1 RAID array with four drives. The disks are divided into pairs, the drives within the pair store the same data.

The first truly "redundant" array (and the first RAID to come into existence) is RAID-1. Its second name - mirror - explains the principle of operation: all disks allocated to the array are divided into pairs, and information is written to both disks of a pair at once. As a result, each disk in the array has an exact copy. Such a system increases not only the reliability of data storage but also the read speed (you can read from two hard drives at once), although the write speed remains the same as that of a single drive.

As you might guess, the volume of such an array will be equal to half the sum of the volumes of all hard drives included in it. The downside of this solution is that you need twice as many hard drives. But on the other hand, the reliability of this array is actually not even equal to double the reliability of a single disk, but much higher than this value. The failure of two hard drives within ... well, let's say, a day is unlikely, if, for example, the power supply did not intervene in the matter. At the same time, any sane person, seeing that one disk in a pair is out of order, will immediately replace it, and even if the second disk gives up immediately after that, the information will not go anywhere.
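The claim that a mirror is far more than twice as reliable can be illustrated with a back-of-the-envelope calculation. This is only a sketch under a loud assumption: the two disks fail independently, each with some probability p during the time it takes to notice and replace a dead drive. The value of p below is hypothetical, and a misbehaving power supply, as noted above, breaks the independence assumption.

```python
def mirror_loss_probability(p):
    """Data on a mirrored pair is lost only if both disks fail in the window."""
    return p * p


def improvement_factor(p):
    """How many times less likely data loss is compared with a single disk."""
    return p / mirror_loss_probability(p)


if __name__ == "__main__":
    p = 0.01  # hypothetical chance that one disk dies within the window
    print("single disk:", p)
    print("mirror:", mirror_loss_probability(p))
    print("improvement factor:", improvement_factor(p))
```

Under these assumptions the gain is roughly 1/p rather than a factor of two, which is the point the text makes.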

As you can see, both RAID-0 and RAID-1 have their drawbacks. How do you get rid of them? If you have at least four hard drives, you can combine the two levels. Most often RAID-1 pairs are combined into a RAID-0 array; this stripe of mirrors is usually called RAID-10 (or RAID 1+0). Sometimes it is done the other way round, and a RAID-1 array is built from several RAID-0 arrays (RAID 0+1); this mirror of stripes has the same capacity, but it survives fewer combinations of failed disks and takes longer to recover when one disk fails.

The reliability of such a configuration of four hard drives is equal to the reliability of a RAID-1 array, and the speed is actually the same as that of RAID-0 (in reality, it will most likely be slightly lower due to the limited capabilities of the controller). At the same time, the simultaneous failure of two disks does not always mean a complete loss of information: this will happen only if the disks containing the same data break, which is unlikely. That is, if four disks are divided into pairs 1-2 and 3-4 and the pairs are combined into a RAID-0 array, then only the simultaneous failure of disks 1 and 2 or 3 and 4 will lead to data loss, while in the event of the untimely death of the first and the third, second and fourth, first and fourth or second and third hard drives, the data will remain safe and sound.
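The enumeration of two-disk failures above is easy to check mechanically. The following Python sketch lists every pair of failed disks for the four-disk example; the function names and the representation of mirror pairs as tuples are assumptions made for illustration.

```python
from itertools import combinations


def survives(failed, mirror_pairs):
    """A stripe of mirrors survives while every pair keeps at least one disk."""
    return all(not set(pair) <= set(failed) for pair in mirror_pairs)


def two_disk_failures(mirror_pairs):
    """Return (fatal, harmless) lists of two-disk failure combinations."""
    disks = sorted(disk for pair in mirror_pairs for disk in pair)
    fatal, harmless = [], []
    for failed in combinations(disks, 2):
        if survives(failed, mirror_pairs):
            harmless.append(failed)
        else:
            fatal.append(failed)
    return fatal, harmless


if __name__ == "__main__":
    fatal, harmless = two_disk_failures([(1, 2), (3, 4)])
    print("fatal:", fatal)
    print("harmless:", harmless)
```

Of the six possible pairs only two are fatal, which matches the list in the text.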

However, the main disadvantage of RAID-10 is the high cost of the disks. The price of four (at minimum!) hard drives cannot be called small, especially when the capacity of only two of them is actually available to us (as we have already said, few people think about reliability and what it costs). The large (100%) redundancy of data storage makes itself felt. All this has led to the recent popularity of an array variant called RAID-5. At least three disks are required to implement it. In addition to the information itself, the controller also stores parity blocks on the array drives.

We will not go into the details of the parity algorithm; we will only say that if the information on one of the disks is lost, it can be restored from the parity data and the surviving data on the other disks. In total the parity blocks take up the capacity of one physical disk, and they are distributed evenly over all the hard drives of the system, so that when any disk is lost its contents can be recovered using the parity blocks located on the other disks of the array. Information is divided into large blocks and written to the disks in turn, that is, according to the 12-34-56 principle in the case of a three-disk array.
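For the curious, the recovery idea fits in a few lines. The sketch below assumes the parity block is the byte-wise XOR of the data blocks, which is the usual choice for RAID-5; the function names are invented for illustration, and the rotation of parity between disks is left out.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recompute the block at lost_index from all the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = make_stripe([b"12", b"34"])  # two data blocks + parity
    print("stripe:", stripe)
    print("disk 0 rebuilt:", rebuild(stripe, 0))
    print("disk 1 rebuilt:", rebuild(stripe, 1))
```

Because XOR is its own inverse, the same function restores a lost data block or a lost parity block; which one is missing makes no difference.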

Accordingly, the total volume of such an array is the volume of all the disks minus the capacity of one of them. Data recovery, of course, does not happen instantly, but such a system offers high performance and a margin of safety at minimal cost (a 1000 GB array requires six 200 GB disks). However, the performance of such an array will still be lower than that of a stripe system: with each write operation the controller also needs to update the parity block.
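The capacity rules for the levels discussed so far can be gathered into one small helper. This is a sketch that assumes all disks in the array are the same size; the function name and the level labels are made up for the example.

```python
def usable_capacity(level, n_disks, disk_gb):
    """Usable capacity in GB for an array of identical disks."""
    if level in ("jbod", "raid0"):
        return n_disks * disk_gb
    if level in ("raid1", "raid10"):
        if n_disks % 2:
            raise ValueError("mirrored levels need an even number of disks")
        return n_disks * disk_gb // 2  # half of the disks hold copies
    if level == "raid5":
        if n_disks < 3:
            raise ValueError("RAID-5 needs at least three disks")
        return (n_disks - 1) * disk_gb  # one disk's worth goes to parity
    raise ValueError("unknown level: " + level)


if __name__ == "__main__":
    for level, n in (("raid0", 2), ("raid1", 2), ("raid10", 4), ("raid5", 6)):
        print(level, n, "x 200 GB ->", usable_capacity(level, n, 200), "GB")
```

The last line of the loop reproduces the figure from the text: six 200 GB disks in RAID-5 give 1000 GB.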

RAID-0, RAID-1 and RAID 0+1, sometimes also RAID-5 - these levels usually exhaust the capabilities of desktop RAID controllers. Higher levels are available only to complex systems based on SCSI hard drives. However, happy owners of SATA controllers with Matrix RAID support (such controllers are built into Intel's ICH6R and ICH7R southbridges) can enjoy the advantages of RAID-0 and RAID-1 arrays with only two drives, and those who have a board with ICH7R can combine RAID-5 and RAID-0 if they have four identical drives.

How is this implemented in practice? Let's analyze the simpler case with RAID-0 and RAID-1. Let's say you bought two 400 GB hard drives. You split each drive into a 100 GB and a 300 GB logical drive. After that, using the Intel Application Accelerator RAID Option ROM utility built into the BIOS, you combine the 100 GB partitions into a stripe array (RAID-0) and the 300 GB partitions into a mirror array (RAID-1). Now the fast 200 GB disk can hold, say, games, video material and other data that need a fast disk subsystem and are not very important (that is, data you will not much regret losing), while the mirrored 300 GB disk gets your working documents, mail archive, utility software and other vital files. When one disk fails, you lose what was on the stripe array, but the data you placed on the second logical disk is duplicated on the remaining drive.

Combining the RAID-5 and RAID-0 levels implies that part of the volume of four disks is reserved for a fast stripe array, and the other part (let it be 300 GB on each disk) is used for data blocks and parity blocks. That is, you get one super-fast 400 GB disk (4 x 100 GB) and one reliable but slower 900 GB array (4 x 300 GB minus 300 GB for parity).
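Both Matrix RAID layouts follow the same arithmetic, which the sketch below spells out. The function name and its parameters are assumptions for illustration; in particular, the four-disk example assumes 400 GB drives, which the text implies (100 GB + 300 GB per disk) but does not state.

```python
def matrix_raid(n_disks, disk_gb, fast_gb_per_disk, safe_level):
    """Return (stripe_gb, safe_gb) for a disk set split into two arrays."""
    safe_gb_per_disk = disk_gb - fast_gb_per_disk
    if fast_gb_per_disk <= 0 or safe_gb_per_disk <= 0:
        raise ValueError("both parts of each disk must be non-empty")
    stripe_gb = n_disks * fast_gb_per_disk  # RAID-0 part: everything is usable
    if safe_level == "raid1":
        if n_disks != 2:
            raise ValueError("this sketch mirrors exactly two disks")
        safe_gb = safe_gb_per_disk  # the second copy adds no capacity
    elif safe_level == "raid5":
        if n_disks < 3:
            raise ValueError("RAID-5 needs at least three disks")
        safe_gb = (n_disks - 1) * safe_gb_per_disk  # one share goes to parity
    else:
        raise ValueError("unknown level: " + safe_level)
    return stripe_gb, safe_gb


if __name__ == "__main__":
    print(matrix_raid(2, 400, 100, "raid1"))
    print(matrix_raid(4, 400, 100, "raid5"))
```

The two calls correspond to the two-disk RAID-0 + RAID-1 case and the four-disk RAID-0 + RAID-5 case described above.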

As you can see, this technology is extremely promising, and it would be nice if other chipset and controller manufacturers support it. It is very tempting to have arrays of different levels on two disks, fast and reliable.

Here, perhaps, are all types of RAID arrays that are used in home systems. However, in life you may come across RAID-2, 3, 4, 6 and 7. So let's still see what these levels are.

RAID-2. In an array of this type the disks are divided into two groups - for data and for error correction codes - and if the data is stored on n disks, then n-1 disks are needed to store the correction codes. Data is written to the data disks in the same way as in RAID-0: it is split into small blocks according to the number of disks intended for storing information. The remaining disks store error correction codes, computed with the Hamming method, from which the information can be restored if a hard drive fails. The Hamming method has long been used in ECC memory: it corrects single-bit errors on the fly, and if two bits are transmitted incorrectly, this is at least detected with the help of parity checks. However, nobody wanted to maintain a bulky structure of almost double the number of disks for the sake of this, and this type of array did not become widespread.

The structure of a RAID-3 array is as follows: in an array of n disks, data is divided into blocks of 1 byte and distributed over n-1 disks, and one more disk is used to store parity blocks. In RAID-2 there were n-1 disks for this purpose, but most of the information on them was used only for on-the-fly error correction; for simple recovery after a disk failure much less is needed, and a single dedicated hard drive is enough.
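To show what a Hamming code does, here is a sketch of the classic Hamming(7,4) variant: four data bits are protected by three check bits. Treating each bit position as a separate disk (four data disks, three check disks) is an assumption made for illustration and matches the n-1 figure above only for n = 4; the function names are invented.

```python
def hamming74_encode(data_bits):
    """Encode four data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Fix at most one flipped bit; return (data_bits, error_position)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # 0 means no error was detected
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    damaged = list(word)
    damaged[4] ^= 1  # corrupt the bit at position 5
    print("codeword:", word)
    print("decoded:", hamming74_decode(damaged))
```

The three check results, read as a binary number, point directly at the position of the damaged bit, which is why the correction can be done on the fly.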

RAID level 3 with separate parity drive. There is no backup, but the data can be restored.

Accordingly, the differences between RAID-3 and RAID-2 are obvious: the impossibility of error correction on the fly and less redundancy. The advantages are as follows: the speed of reading and writing data is high, and very few disks are required to create an array, only three. But an array of this type is only good for single-tasking work with large files, as there are speed problems with frequent requests for small data.

An array of the fifth level differs from RAID-3 in that the parity blocks are evenly distributed across all disks in the array.

RAID-4 is similar to RAID-3, but differs in that data is split into blocks rather than bytes. This "defeated" the problem of the low transfer rate for small amounts of data. Writes, however, are slow, because the parity for each block is generated during the write and goes to a single disk. Arrays of this type are used very rarely.

RAID-6 is the same RAID-5, but now two independent parity blocks are computed for every set of data blocks and spread across the disks of the array. Thus, even if two disks fail, the information can still be recovered. Of course, the increase in reliability has reduced the useful volume of the disks and raised their minimum number: now, if there are n disks in the array, the total amount available for data equals the volume of one disk multiplied by n-2. The need to calculate two checksums at once determines the second drawback, inherited by RAID-6 from RAID-5 - low write speed.

RAID-7 is a registered trademark of Storage Computer Corporation. The array structure is as follows: data is stored on n-1 disks, one disk is used to store parity blocks. But a few important details have been added to eliminate the main drawback of arrays of this type: a data cache and a fast controller that handles requests. This made it possible to reduce the number of disk accesses to calculate the data checksum. As a result, it was possible to significantly increase the speed of data processing (in some places by five or more times).

A RAID-10 array: two RAID-1 arrays combined into a RAID-0. Reliable, fast, expensive.

New disadvantages have also been added: the very high cost of implementing such an array, the complexity of its maintenance, the need for an uninterruptible power supply to prevent data loss in the cache memory during power outages. You are unlikely to meet an array of this type, and if you suddenly see it somewhere, write to us, we will also look at it with pleasure.

Creating an array

I hope you have already settled on the type of array. If your board has a RAID controller, you will need nothing except the required number of disks and the drivers for this very controller. By the way, keep in mind that it only makes sense to combine disks of the same size into an array, and preferably of the same model. The controller may refuse to work with disks of different sizes, and most likely you will be able to use only the part of the larger disk that equals the capacity of the smaller one. In addition, the speed of a stripe array will be determined by the speed of its slowest disk. And my advice to you: do not try to make the RAID array bootable. It is possible, but if anything goes wrong in the system you will have a hard time, since getting it working again will be very difficult. Besides, it is dangerous to put several operating systems on such an array: almost all boot managers overwrite information in the service areas of the hard drive and, accordingly, corrupt the array. It is better to choose a different scheme: one disk is bootable, and the rest are combined into an array.

Matrix RAID in action. Part of the disk space is used by the RAID-0 array, the rest of the space is taken by the RAID-1 array.

Every RAID array starts with the BIOS of the RAID controller. Sometimes (only in the case of integrated controllers, and even then not always) it is built into the main BIOS of the motherboard, sometimes it is located separately and activated after passing the self-test, but in any case, you need to go there. It is in the BIOS that the necessary array parameters are set, as well as the sizes of data blocks, the hard drives used, and so on. After you determine all this, it will be enough to save the settings, exit the BIOS and return to the operating system.

There you definitely need to install the controller drivers (as a rule, a floppy disk with them comes with the motherboard or with the controller itself, but they may also be on a disc with other drivers and utility software), reboot, and that's it: the array is ready to go. You can split it into logical disks, format it and fill it with data. Just remember that RAID is not a panacea. It will save you from data loss when a hard drive dies and minimize the consequences of such an outcome, but it will not save you from power surges in the mains or from the failure of a low-quality power supply that kills both drives at once, with no respect for the fact that they belong to an array.

Neglecting the quality of the power supply and the temperature conditions of the disks can significantly shorten the life of an HDD; it does happen that all the disks in an array fail and all the data is irretrievably lost. In particular, modern hard drives (especially IBM and Hitachi) are very sensitive to the +12 V line and do not like even the slightest change in voltage on it, so before purchasing all the equipment needed to build an array, you should check the relevant voltages and, if necessary, add a new power supply to the shopping list.

Powering the hard drives, like any other components, from a second power supply looks simple at first glance, but such a scheme has a lot of pitfalls, and you need to think a hundred times before taking such a step. With cooling everything is simpler: you just need to make sure that all hard drives get airflow and that they are not placed close to one another. Simple rules, but unfortunately not everyone follows them. And it is not uncommon for both disks in an array to die at the same time.

In addition, RAID does not replace the need for regular data backups. Mirroring is mirroring, but if you accidentally corrupt or erase files, a second disk won't help you at all. So make a backup whenever you can. This rule applies regardless of the presence of RAID arrays inside the PC.

So are you RAIDy? Yes? Fine! Only in the pursuit of volume and speed, do not forget another proverb: "Make a fool pray to God, he will hurt his forehead." Strong disks and reliable controllers to you!

Noisy RAID Cost Benefits

RAID is good even without regard to money. But let's calculate the price of the simplest 400 GB stripe array. Two Seagate Barracuda SATA 7200.8 drives, 200 GB each, will set you back about $230. RAID controllers are built into most motherboards, meaning we get them for free.

At the same time, a 400 GB drive of the same model costs $280. The difference is $50, and with this money you can buy a powerful power supply, which you will undoubtedly need. Not to mention that the performance of the composite "disk", at a lower price, will be almost twice that of a single hard drive.

Let's now calculate, focusing on a total amount of 250 GB. There are no 125 GB hard drives, so let's take two 120 GB hard drives. The price of each disk is $90, the price of one 250 GB hard drive is $130. Well, with such volumes, you have to pay for performance. And if you take a 300-gigabyte array? Two 160 GB disks - about $200, one 300 GB disk - $170... Again, not that. It turns out that RAID is beneficial only when using very large disks.
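The three comparisons above boil down to price per gigabyte. The sketch below redoes the arithmetic with the prices quoted in the text; the function name and the tuple layout are assumptions for illustration.

```python
def compare(disk_gb, pair_price, single_gb, single_price):
    """Return ($/GB of a two-disk stripe, $/GB of one disk, stripe is cheaper)."""
    stripe = pair_price / (2 * disk_gb)
    single = single_price / single_gb
    return stripe, single, stripe < single


if __name__ == "__main__":
    cases = [
        (200, 230, 400, 280),  # two 200 GB disks vs one 400 GB disk
        (120, 180, 250, 130),  # two 120 GB disks ($90 each) vs one 250 GB disk
        (160, 200, 300, 170),  # two 160 GB disks vs one 300 GB disk
    ]
    for disk_gb, pair_price, single_gb, single_price in cases:
        stripe, single, cheaper = compare(disk_gb, pair_price, single_gb, single_price)
        print(f"2 x {disk_gb} GB: {stripe:.3f} $/GB, "
              f"1 x {single_gb} GB: {single:.3f} $/GB, stripe cheaper: {cheaper}")
```

Only the first case comes out in favor of the stripe, which agrees with the conclusion that RAID pays off on price only with the largest disks.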

Today we will look at what a RAID array is and what role these arrays play in the life of hard drives - yes, yes, hard drives.

Hard drives play a rather important role in a computer: we boot the system from them and store a lot of information on them.

Time passes, and any hard drive can fail, for all sorts of reasons that we will not go into today.

I hope many of you have heard of so-called RAID arrays, which not only speed up hard drives but also, should something happen, save important data from disappearing, perhaps forever.

These arrays are numbered, and the number tells them apart: each one performs a different function. For example, there are RAID 0, 1, 2, 3, 4, 5 and so on. These are the arrays we will talk about today, and later I will write an article on how to use some of them.

What is a RAID array?

RAID is a technology that lets you combine several devices - in our case hard drives - into something like a bundle. This increases the reliability of data storage and the read/write speed, or at least one of the two.

So whether you speed up your disks or simply protect your information is up to you. More precisely, it depends on the RAID configuration you choose; the configurations are identified by numbers: 0, 1, 2, 3...

RAID is a very useful feature, and I recommend it to everyone. For example, configuration 0 gives you a faster disk subsystem, and the hard drive is, after all, just about the slowest device in the computer.

If you ask why, I think everything is clear here. Processors become more powerful every year: they get higher clock frequencies, more cores and much more. The same goes for other components. Hard drives, meanwhile, have so far grown only in capacity, while the spindle speed has stayed at the same 7200 rpm; of course, there are also rarer models. So far the situation is being saved by so-called solid-state drives (SSDs), which speed up the system several times.

Let's say you went to build RAID 1, in this case you will receive a high guarantee of protecting your data, since they will be duplicated on another device (disk) and, if one hard drive fails, all information will remain on the other.

As you can see from the examples, raids are very important and useful, they should be used.

So, physically a RAID array is a bundle of two hard drives connected to the motherboard; there can also be three or four. The motherboard, by the way, must support creating RAID arrays. The hard drives are connected in the usual way, and the array itself is created at the software level.

Once the array has been created in software, little changes to the eye: you only work in the BIOS, and everything else stays as it was, that is, in My Computer you will see the same connected drives.

It does not take much to create an array: a motherboard with RAID support and two identical hard drives (this is important). They should match not only in capacity but also in cache size, interface and so on; preferably the manufacturer should be the same too. Now we turn on the computer, enter the BIOS, find the SATA Configuration parameter there and set it to RAID. After the computer restarts, a screen with information about the disks and arrays should appear. There we press CTRL+I to start configuring the array, that is, adding disks to it or removing them. Then the setup begins.

How many kinds of RAID are there? Several, namely RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5 and RAID 6. I will talk in detail about only two of them.

RAID 0 - lets you create a disk array to increase read/write speed.
RAID 1 - lets you create mirrored disk arrays for data protection.

RAID 0, what is it?

A RAID 0 array, also called striping, uses 2 to 4 hard drives, rarely more. Working together, they increase performance: the data is divided into blocks, which are then written to several disks at once.

Performance increases because one block of data is written to one disk, the next block to another disk, and so on. I think it is clear that 4 disks increase performance more than two. Safety, however, suffers across the whole array: if one of the disks fails, in most cases all the information is lost for good.

The point is that in a RAID 0 array information is spread over all the disks, that is, the bytes of a single file sit on several disks. So if one disk fails, part of every such file is lost, and recovery is impossible.

It follows that you need to make regular backups to external media.

RAID 1, what is it?

A RAID 1 array is also called mirroring. As for the disadvantage, in RAID 1 the capacity of one of the hard drives is effectively unavailable to you, because it is used to duplicate the first drive. In RAID 0 this space is available.

The benefits, as you might have guessed, are that the array provides high data reliability, meaning that if one drive fails, all the data remains on the other. Failure of two disks at once is unlikely. Such an array is often used on servers, but this does not prevent it from being used on ordinary computers.

If you choose RAID 1, be aware that performance will be lower than with RAID 0; but if your data matters to you, use this approach.

RAID 2-6, what is it?

Now I will briefly describe the remaining arrays, for general education so to speak, because they are not as popular as the first two.

RAID 2 is an array that uses the Hamming code (I did not look into what kind of code that is). The principle of operation is roughly the same as in RAID 0: information is also divided into blocks and written to the disks in turn. The remaining disks store error correction codes, with which the data can be restored if one of the disks fails.

True, it is better to use at least 4 disks for this array, which is quite expensive, and as it turned out, with so many disks the performance gain is rather debatable.

RAID 3, 4, 5, 6 - I will not write about these arrays here, since the necessary information is already on Wikipedia; if you want to learn about them, read it there.

Which RAID array to choose?

Let's say you often install various programs and games and copy a lot of music or movies; then RAID 0 is recommended for you. Be careful when choosing hard drives: they must be very reliable so that you do not lose information. Be sure to back up your data.

Do you have important information that must stay safe and sound? Then RAID 1 comes to the rescue. Here, too, the characteristics of the hard drives you choose should be identical.

Conclusion

So we have gone over information on RAID arrays that is new for some and old for others. I hope it will be useful to you. Soon I will write about how to create these arrays.

RAID is an abbreviation that stands for Redundant Array of Independent Disks - “a fault-tolerant array of independent disks” (previously, the word Inexpensive was sometimes used instead of Independent). The concept of a structure consisting of several disks grouped together to provide fault tolerance was born in 1987 in the seminal work of Patterson, Gibson, and Katz.

Native RAID Types

RAID-0
If we think that RAID is "fault tolerance" (Redundant ...), then RAID-0 is "zero fault tolerance", its absence. The RAID-0 structure is a "striped disk array". Data blocks are written one by one to all disks included in the array, in order. This improves performance, ideally by as much as the number of disks in the array, since writes are parallelized across multiple devices.
However, reliability is reduced by the same factor, since data will be lost if any of the drives in the array fails.
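The statement that reliability falls by about the same factor can be checked with a short calculation. The sketch assumes that the disks fail independently and that each has the same failure probability over some period; the value 0.03 is hypothetical and the function name is invented.

```python
def stripe_failure_probability(p_disk, n_disks):
    """A stripe is lost if at least one of its disks fails."""
    return 1 - (1 - p_disk) ** n_disks


if __name__ == "__main__":
    p = 0.03  # hypothetical chance that one disk fails during the period
    for n in (1, 2, 4, 8):
        print(n, "disks:", round(stripe_failure_probability(p, n), 4))
```

For small p the result is close to n times p, so "reduced by the same factor" is a good approximation rather than an exact law.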

RAID-1
This is the so-called "mirror". Write operations are performed on two disks in parallel. The reliability of such an array is higher than that of a single drive, but the performance increase is insignificant (or not at all).

RAID-10
An attempt to combine the advantages of the two types of RAID and rid them of their inherent disadvantages. If we take a RAID-0 group with its increased performance and give each of its disks (or the entire array) “mirror” disks to protect the data from loss on failure, we get a fault-tolerant array whose performance is increased by striping.
It is one of the most popular RAID types in the wild today.
The downside: we pay for all these advantages with half the total capacity of the disks in the array.

RAID-2
Remained entirely theoretical. This is an array in which data is encoded with an error-correcting Hamming code, which makes it possible to recover individual faulty fragments due to its redundancy. By the way, various modifications of the Hamming code, as well as its successors, are used in the process of reading data from the magnetic heads of hard drives and optical CD / DVD readers.

RAID 3 and 4
A “creative development” of the idea of protecting data with a redundant code. The Hamming code is indispensable for a “constantly unreliable” stream saturated with continuous, poorly predictable errors, such as a noisy terrestrial communication channel. With hard disks, however, the main problem is not read errors (we assume that a working hard disk returns data in the form in which we wrote it), but the failure of an entire disk.
For such conditions you can take a striped scheme (RAID-0) and, to protect against the failure of one of the disks, supplement the recorded information with redundancy that allows data to be recovered if part of it is lost, allocating an additional disk for this.
If any of the data disks is lost, we can recover the data stored on it by simple mathematical operations on the redundancy data; if the disk with the redundancy data fails, we still have the data itself, read from the RAID-0 disk array.
RAID-3 and RAID-4 differ in that in the first case individual bytes are interleaved, and in the second - groups of bytes, “blocks”.
The main disadvantage of these two schemes is the extremely low speed of writing to the array, since every write operation updates the “checksum”, the redundancy block for the written information. Obviously, despite the striped structure, the write performance of a RAID-3 or RAID-4 array is limited by the performance of a single disk, the one that holds the “redundancy block”.

RAID-5
An attempt to circumvent this limitation gave rise to the next type of RAID, which is currently the most widely used, along with RAID-10. If writing the “redundancy block” to a single disk limits the entire array, let's spread it across the disks of the array as well, with no disk dedicated to this information, so that redundancy updates are distributed across all disks. That is, as with RAID-3 and RAID-4, storing N disks' worth of information takes N + 1 disks, but unlike types 3 and 4, the extra disk also stores data mixed with redundancy data, just like the other N.
Disadvantages? Of course there are some. The problem of slow writes was solved only partly: writing to a RAID-5 array is still slower than writing to a RAID-10 array. But RAID-5 is more “cost effective”. With RAID-10 we pay exactly half of the disks for fault tolerance; with RAID-5 it is just one disk.

However, the write speed decreases as the number of disks in the array grows (unlike RAID-0, where it only increases). This is because when writing a data block, the array has to recalculate the redundancy block, for which it reads the remaining “horizontal” blocks of the same row and recalculates the redundancy block from their data. That is, for one write operation, an array of 8 disks (7 data disks + 1 additional) performs 6 reads into the cache (the remaining data blocks from the other disks), calculates the redundancy block from these blocks, and makes 2 writes (the data block being written and the overwritten redundancy block). In modern systems caching takes some of the sting out of this, but lengthening a RAID-5 group still brings, along with a proportional increase in read speed, a corresponding decrease in write speed.
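The arithmetic from the 8-disk example can be written out as a small sketch. It models only the scheme described in the text (re-reading the rest of the row and recomputing the redundancy block); the function name is hypothetical, and real controllers may count operations differently.

```python
def full_recalc_write_cost(total_disks):
    """I/O needed for one block write when the redundancy block is
    recomputed from all other data blocks in the same row."""
    data_disks = total_disks - 1   # one disk's worth of capacity holds redundancy
    reads = data_disks - 1         # every data block in the row except the one being written
    writes = 2                     # the new data block plus the new redundancy block
    return reads, writes

for n in (3, 4, 8, 16):
    reads, writes = full_recalc_write_cost(n)
    print(f"{n:2d} disks: {reads} reads + {writes} writes")
```

For 8 disks this gives the 6 reads and 2 writes mentioned above, and the number of reads keeps growing with every disk added to the group.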
The situation with the decrease in performance when writing to RAID-5 sometimes gives rise to curious extremism, for example, http://www.baarf.com/ ;)

However, since RAID-5 is the most efficient RAID structure in terms of disk consumption per megabyte, it is widely used where write speed reduction is not a decisive parameter, for example, for long-term data storage or for data that is predominantly read.
Separately, it should be mentioned that expanding a RAID-5 disk array by adding an additional disk causes a complete recalculation of the entire RAID, which can take hours, and in some cases days, during which the performance of the array drops catastrophically.

RAID-6
Further development of the RAID-5 idea. If we calculate additional redundancy according to a law other than that used in RAID-5, then we will be able to maintain data access if two disks in the array fail.
The price for this is an additional disk for the data of the second “redundancy block”. That is, to store data equal to the volume of N disks, we will need N + 2 disks. The “mathematics” of calculating redundancy blocks becomes more complicated, which reduces write speed even further compared to RAID-5, but reliability increases. In some cases it even exceeds the reliability of RAID-10. It is easy to see that RAID-10 can also withstand the failure of two disks in an array, but only if those disks belong to different “mirrors”; if both disks of the same mirrored pair fail, the data is lost. And the likelihood of exactly that situation cannot be discounted.
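To put a number on "cannot be discounted", here is a rough sketch that enumerates every possible pair of failed disks in a RAID-10 array. It assumes, purely for illustration, that disks 2k and 2k+1 form mirror k and that every pair of disks is equally likely to fail; the function name is hypothetical.

```python
from itertools import combinations

def raid10_survival_after_two_failures(pairs):
    """Fraction of two-disk failures that a RAID-10 built from `pairs` mirrors survives."""
    disks = range(2 * pairs)                      # disks 2k and 2k+1 form mirror k (assumed layout)
    cases = list(combinations(disks, 2))          # every possible pair of failed disks
    fatal = [pair for pair in cases if pair[0] // 2 == pair[1] // 2]  # both halves of one mirror
    return 1 - len(fatal) / len(cases)

for pairs in (2, 4, 8):
    share = raid10_survival_after_two_failures(pairs)
    print(f"{2 * pairs:2d} disks: RAID-10 survives {share:.0%} of double failures, RAID-6 survives 100%")
```

With four disks, two of the six possible double failures hit the same mirror, so a third of them are fatal for RAID-10, while RAID-6 survives any two failed disks.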

The numbers of RAID types grow further through “hybridization”: this is how we get RAID-0+1, which became the RAID-10 already discussed, as well as all sorts of chimerical RAID-51 and so on.
Fortunately, they are not found in the wild and usually remain a “sleep of reason” (well, except for the RAID-10 already described above).

Hello. Today I got my hands on two brand new hard drives and thought for a long time about what I could do with them to help my readers. On reflection, I decided that I could hardly do better than a walkthrough of RAID 1 created by the operating system itself. So what is RAID 1?

RAID 1 is an array of two disks on which the information is duplicated. That is, you have two disks that are complete copies of each other. What is it for? First of all, to make information storage more reliable. Since the probability of both drives failing at the same time is small, if one drive fails you will still have a copy of all the information on the second one. On a RAID 1 array you can store any information, just like on a regular hard drive, which lets you stop worrying about an important project you have been working on for a very long time.

Today we will look at how a RAID array is created by Windows itself from two empty disks (I can confidently say that these instructions work on Windows 7, 8 and 8.1). If you are interested in creating a RAID array from a disk that already holds data, that is covered in a separate article on this topic.

And, in fact, the instruction for your acquaintance:

1) First, install the hard drives in the system unit and start the computer.

2) Open "Control Panel → System and Security → Administrative Tools → Computer Management → Storage Devices → Disk Management". When you open it for the first time, the utility will report that new disk devices have been installed and prompt you to choose a partition style for them. If you have a disk of 2.2 TB or more, choose GPT; if less, choose MBR.

3) At the bottom of the window we find one of our new hard drives and right-click on it. Select "Create Mirrored Volume":

4) The mirrored volume creation wizard will open. Click Next.

5) On this page, you need to add a disk that will duplicate the previously selected disk. Therefore, select the disk on the left side and click the "Add" button:

Click Next.

6) Select the letter that will be used to designate the new volume. I chose M (for Mirror). Click Next.

7) Set the file system, cluster size and volume name. I also recommend checking the box next to "Quick Formatting" so that everything is done in one go. Click Next again.

8) Check what we got, if everything is correct, click "Finish".
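As a side note on step 2, the 2.2 TB threshold can be checked with simple arithmetic. The sketch below assumes the usual explanation: MBR stores sector numbers in 32 bits and the disk uses 512-byte logical sectors. The constant and function names are hypothetical.

```python
SECTOR_BYTES = 512        # assumed logical sector size
MAX_SECTORS = 2 ** 32     # MBR partition entries keep sector counts in 32 bits

mbr_limit_bytes = SECTOR_BYTES * MAX_SECTORS

def partition_style(disk_bytes):
    """Suggest a partition style following the rule of thumb from step 2."""
    return "GPT" if disk_bytes > mbr_limit_bytes else "MBR"

print(mbr_limit_bytes / 10 ** 12)        # the limit in decimal terabytes
print(partition_style(3 * 10 ** 12))     # a 3 TB disk
print(partition_style(1 * 10 ** 12))     # a 1 TB disk
```

The limit comes out at just under 2.2 decimal terabytes, which is where the figure in step 2 comes from.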

RAID (Redundant Array of Independent Disks) is a redundant array of independent disks, i.e. several physical hard drives combined into one logical drive to solve a particular task. Most likely, you will use it for fault tolerance: if one of the disks fails, the system will continue to work. To the operating system, the array looks like a regular HDD. RAID arrays originated in the server segment, but are now widespread and are already used at home. The RAID is managed by a dedicated chip with its own logic, called a RAID controller. This is either a chip on the motherboard or a separate board.

Types of RAID arrays

Hardware: a dedicated chip controls the state of the array. The chip has its own CPU, and all calculations fall on it, freeing the server CPU from unnecessary load.

Software: a special program in the OS controls the state of the array. In this case an additional load is placed on the server CPU, since all calculations fall on it.

It is impossible to say unequivocally which type of raid is better. With a software raid we do not need to buy an expensive raid controller, which usually costs from $250 (one can be found for $70, but I wouldn't risk the data), but all the calculations fall on the server's CPU. A software implementation is well suited for raids 0 and 1: they are quite simple and do not require heavy calculations. Therefore, software raids are more often used in entry-level solutions. A hardware raid relies on a raid controller, which has its own processor for calculations and performs the I/O operations itself.

RAID levels

There are quite a few of them. The basic ones are 0, 1, 2, 3, 4, 5, 6 and 7, and the combined ones are 10, 30, 50, 53 ... We will focus on the most popular ones used in modern enterprise infrastructure. The letter D in the diagrams means Data, i.e. a data block.

RAID 0 (Striped Disk Array without Fault Tolerance)

Also known as a stripe. Two or more physical drives are merged into one logical drive in order to pool their space. That is, we take two disks of 500 GB each, combine them into RAID 0, and the system sees one HDD with a capacity of 1 TB. The information is distributed evenly across all disks of the raid in small blocks (stripes).

Pros – High performance, ease of implementation.

Cons - lack of fault tolerance. When using this raid, the reliability of the system is halved (if we use two disks). After all, if at least one disk fails, you lose all data.
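The even distribution of blocks in RAID 0 can be illustrated with a minimal sketch. It assumes the simplest round-robin placement of equally sized blocks; the function name `raid0_locate` is made up for the example.

```python
def raid0_locate(logical_block, disks):
    """Map a logical block number to (disk index, block offset on that disk)."""
    return logical_block % disks, logical_block // disks

# The first six logical blocks of a two-disk stripe.
layout = [raid0_locate(block, 2) for block in range(6)]
print(layout)
```

Consecutive blocks alternate between the disks, which is why both drives work in parallel on large transfers and also why losing either drive leaves holes throughout all of the data.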

RAID 1 (Mirroring & Duplexing)

Also known as a mirror. Two or more physical disks are combined into one logical disk in order to increase fault tolerance. Information is written to both disks of the array at once, and when one of them fails, the information is still stored on the other.

Pros - high read / write speed, ease of implementation.

Cons - high redundancy. In the case of using 2 disks, this is 100%.

RAID 1E

RAID 1E works like this: three physical disks are combined into an array, after which a logical volume is created. Data is distributed across the disks in blocks. A data chunk (strip) marked with ** is a copy of the preceding chunk marked with *. Each block of the mirror copy is written shifted by one disk.

The simplest fault-tolerant solution to implement is RAID 1 (mirroring), a mirror image of two disks. High data availability is guaranteed by having two full copies. This redundancy affects the cost of the array, since the usable capacity is half of the raw capacity. Because RAID 1 is built on two HDDs, it is clearly not enough for modern applications hungry for disk space. As a result, RAID 1 is usually limited to service volumes (OS, SWAP, LOG); it is used for user data only in low-budget solutions.

RAID 1E is a combination of disk striping from RAID 0 and mirroring from RAID 1. Simultaneously with writing a data area to one drive, a copy is created on the next disk in the array. The difference from RAID 1 is that the number of HDDs can be odd (minimum 3). As with RAID 1, the usable capacity is 50% of the array's total drive capacity. However, if the number of disks is even, it is preferable to use RAID 10, which, with the same capacity utilization, consists of two (or more) "mirrors". If one of the RAID 1E drives physically fails, the controller switches read and write requests to the remaining drives in the array.
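The "copy on the next disk" rule can be sketched as follows. The exact on-disk layout is an assumption made for illustration (rows of data blocks alternating with rows of their copies, each copy shifted by one disk); the function name is hypothetical, and a particular controller may arrange rows differently.

```python
def raid1e_locate(block, disks):
    """Return ((disk, row) of a data block, (disk, row) of its mirror copy)."""
    stripe = block // disks
    primary = (block % disks, 2 * stripe)            # data rows are the even rows
    mirror = ((block + 1) % disks, 2 * stripe + 1)   # the copy goes to the next disk, one row down
    return primary, mirror

for block in range(6):
    print(block, raid1e_locate(block, 3))
```

Because a block and its copy never share a disk, any single drive can fail without losing data, even with an odd number of drives.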

Advantages:

high data security;
good performance.

Disadvantages:

as with RAID 1, only 50% of the array's disk capacity is used.

RAID 2

In arrays of this type, disks are divided into two groups, one for data and one for error correction codes: if data is stored on 2^n - n - 1 disks, then n disks are needed to store the correction codes. Data is written to the corresponding disks in the same way as in RAID 0: it is divided into small blocks according to the number of disks intended for storing information. The remaining disks store error correction codes, from which the information can be recovered if a hard disk fails. The Hamming method has long been used in ECC memory and allows single errors to be corrected and double errors to be detected on the fly.

The disadvantage of a RAID 2 array is that it requires a structure of almost double the number of disks to function, so this type of array has not gained popularity.
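The disk counts behind this can be tabulated with a short sketch. It relies on a standard property of Hamming codes (n check positions cover 2^n - n - 1 data positions) and assumes one disk per position; the function name is hypothetical.

```python
def hamming_data_disks(code_disks):
    """Number of data disks that `code_disks` correction-code disks can protect."""
    return 2 ** code_disks - code_disks - 1

for n in range(2, 6):
    data = hamming_data_disks(n)
    print(f"{n} code disks protect {data} data disks ({n / (n + data):.0%} of all disks are overhead)")
```

For small arrays the overhead is heavy: four data disks need three more disks for codes, which matches the "almost double" remark above.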

RAID 3

In a RAID 3 array, data is broken into chunks smaller than a sector (split into bytes) or into blocks and distributed across the disks. One more disk is used to store parity blocks. RAID 2 used several disks for this purpose, but most of the information on those control disks served on-the-fly error correction, while most users are satisfied with simple recovery of information after a disk failure, and the information needed for that fits on one dedicated hard disk.

Differences between RAID 3 and RAID 2: the impossibility of error correction on the fly and less redundancy.

Advantages:

high speed of reading and writing data;
a minimum of only three disks is needed to create an array.

Disadvantages:

an array of this type is good only for single-tasking work with large files, since the access time to a separate sector, divided by disks, is equal to the maximum of the access intervals to the sectors of each of the disks. For small block sizes, the access time is much longer than the read time.
a large load on the control disk, and, as a result, its reliability drops significantly compared to disks that store data.

RAID 4

RAID 4 is similar to RAID 3, but differs in that data is broken into blocks rather than bytes. This partially overcomes the problem of slow transfers of small amounts of data. Writes are slow because parity for a block is generated during each write and written to a single disk. Among widespread storage systems, RAID-4 is used on NetApp storage devices (NetApp FAS), where its shortcomings have been successfully eliminated by operating the disks in a special group write mode determined by the internal WAFL file system used on these devices.

RAID 5 (Independent Data Disks with Distributed Parity Blocks)

The most popular type of raid array overall, thanks to its cost-effective use of storage media. Data blocks and checksums are written cyclically to all drives in the array. If one of the disks fails, performance drops noticeably, since additional work is needed to keep the array functioning. The raid itself has fairly good read / write speed, but is slightly inferior to RAID 1. You need at least three disks to organize RAID 5.

Pros - economical use of media, good read / write speed. The performance difference compared to RAID 1 is not as noticeable as the disk space savings. In the case of using three HDDs, the redundancy is only 33%.

Cons - complex data recovery and implementation.
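The cyclic placement of checksums can be pictured with a small sketch. The rotation direction used here (the checksum moves one disk to the left on each row) is just one possible convention, chosen for illustration; the function name is hypothetical.

```python
def raid5_row(row, disks):
    """Return what each disk stores in one stripe row: data labels or 'P' for the checksum."""
    parity_disk = (disks - 1 - row) % disks   # assumed rotation: one disk to the left per row
    labels = []
    next_block = row * (disks - 1)            # each row carries disks - 1 data blocks
    for disk in range(disks):
        if disk == parity_disk:
            labels.append("P")
        else:
            labels.append(f"D{next_block}")
            next_block += 1
    return labels

for row in range(3):
    print(raid5_row(row, 3))
```

With three disks, every drive holds the checksum block on every third row, so checksum writes are spread evenly instead of landing on one dedicated disk.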

RAID 5E

RAID 5E works like this. An array is assembled from four physical disks, and a logical disk is created in it. The distributed spare is free space. Data is distributed across the drives, creating blocks on the logical disk. The checksums are also distributed across the disks of the array and are written with a disk-to-disk shift, as in RAID 5. The spare space remains empty.

"Classic" RAID 5 has been considered the standard for fault tolerance in disk subsystems for many years. It uses data distribution (striping) across the HDDs of the array; for each portion (stripe) defined in it, checksums (parity) are calculated and written. Accordingly, write speed is reduced by the constant recalculation of checksums as new data arrives. To increase performance, checksum writes are distributed across all drives of the array, interleaved with data. Checksum storage consumes the capacity of one drive, so the usable capacity of RAID 5 is one disk less than the total number of disks in the array. RAID 5 requires a minimum of three (and a maximum of 16) hard drives, and its disk space utilization ranges from 67% to 94% depending on the number of disks. Obviously, this is more than RAID 1, which utilizes 50% of the available capacity.

The small overhead of RAID 5 redundancy is paid for with a rather complicated implementation and a lengthy data recovery process. The calculation of checksums and addresses is assigned to the hardware RAID controller, which places high demands on its processor, logic and cache memory. The performance of a RAID 5 array in its degraded state is extremely poor, and the recovery time is measured in hours. As a result, the problem of a degraded array is aggravated by the risk that another disk fails before the RAID has been rebuilt. That results in the destruction of the data volume.
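The 67% to 94% utilization range quoted above for 3 to 16 disks follows directly from giving up one disk's worth of capacity. A minimal sketch, assuming equally sized disks (the function name is hypothetical):

```python
def raid5_efficiency(disks):
    """Share of raw capacity left for data when one disk's worth holds checksums."""
    return (disks - 1) / disks

for n in (3, 4, 8, 16):
    print(f"{n:2d} disks: {raid5_efficiency(n):.0%} usable")
```

The more disks in the group, the smaller the share lost to checksums, which is exactly what makes long RAID 5 groups attractive despite the recovery risks described here.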

A common approach is to include a dedicated hot-spare disk in RAID 5 to reduce the time the array spends degraded before the failed disk is physically replaced. After one of the drives in the original array fails, the controller brings the spare drive into the array and begins rebuilding the RAID. It is important to note that until this first failure the spare drive sits idle: it may not take part in the operation of the array for years and may never be checked for surface errors. The same is true of the drive that later arrives as a warranty replacement for the failed one, is inserted into the disk cage and is assigned as the new spare. Discovering that it does not work can be a big surprise, and it happens at the most inopportune moment.

RAID 5E is RAID 5 with a permanent hot-spare disk included in the array, the capacity of which is added equally to each element of the array. RAID 5E requires a minimum of four HDDs. Like RAID 5, data and checksums are striped across the drives in the array. The useful capacity utilization of RAID 5E is slightly lower, but the performance is higher than that of RAID 5 with hot-spare.

The capacity of a RAID 5E logical volume is less than the total capacity by the capacity of two media (the capacity of one goes for checksums, the second one goes for hot-spare). But reading and writing to four physical RAID 5E devices is faster than operations with three physical RAID 5 drives with a classic hot-spare (while the fourth, hot-spare, does not take part in the work). The spare drive in RAID 5E is a full permanent member of the array. It cannot be assigned as a backup to two different arrays ("a servant of two masters" - as it is allowed in RAID 5).
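The capacity comparison can be made concrete with a small sketch. It assumes equally sized disks and counts capacity in whole disks; the function name and the dictionary of overheads are hypothetical helpers, not part of any controller's interface.

```python
def usable_disks(level, disks):
    """Number of disks' worth of capacity available for data (equal-size disks assumed)."""
    overhead = {"RAID 5": 1, "RAID 5E": 2}[level]   # checksums, plus the built-in spare for 5E
    return disks - overhead

# Four physical drives either way: RAID 5E across all four,
# or RAID 5 across three with the fourth kept as a classic hot spare.
raid5e = usable_disks("RAID 5E", 4)
raid5_with_spare = usable_disks("RAID 5", 3)
print(raid5e, raid5_with_spare)
```

The usable capacity is the same in both cases; the difference is that in RAID 5E all four drives serve reads and writes, while the classic hot spare sits idle.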

If one of the physical drives fails, the data from the failed drive is rebuilt. The array is compressed, and the distributed spare space becomes part of the array. The logical drive remains at RAID 5E. After the failed disk is replaced with a new one, the data of the logical drive is expanded back to the original distribution scheme across the HDDs. When a RAID 5E logical drive is used in failover cluster schemes, it will not perform its functions during data compression and decompression.

Advantages:

high data security;
usable capacity utilization is higher than RAID 1 or RAID 1E;
better performance than RAID 5.

Disadvantages:

performance is lower than RAID 1E;
cannot share a spare drive with other arrays.

RAID 5EE

Note: Not supported on all controllers.

RAID level-5EE is similar to a RAID-5E array, but with more efficient use of the spare drive and faster recovery time. Like RAID level-5E, this RAID level stripes data and checksums across all drives in the array. The RAID-5EE array offers improved protection and performance. As with RAID level-5E, the capacity of the logical volume is reduced by the capacity of two physical hard drives in the array (one for checksums, one for the spare). The spare drive is part of the RAID level-5EE array. However, unlike RAID level-5E, which uses undivided free space for the spare, in RAID level-5EE the spare space is interleaved with the checksum blocks. This allows data to be rebuilt faster in the event of a physical disk failure. With this configuration, the spare cannot be used with other arrays. If you need a spare drive for another array, you should have another spare hard drive. RAID level-5EE requires a minimum of four drives and, depending on the firmware level and their capacity, supports 8 to 16 drives. RAID level-5EE is firmware-specific.

Note: For RAID level-5EE, you can only use one logical volume per array.

Advantages:

100% data protection
Higher usable disk capacity than RAID-1 or RAID-1E
Greater performance than RAID-5
Faster RAID recovery than RAID-5E

Disadvantages:

Lower performance than RAID-1 or RAID-1E
Support for only one logical volume per array
Inability to share a spare drive with other arrays
Not supported by all controllers

RAID 6

RAID 6 is similar to RAID 5 but has a higher degree of reliability: the capacity of 2 disks is allocated for checksums, and the 2 sums are calculated using different algorithms. It requires a more powerful RAID controller. It remains operational after the simultaneous failure of two disks, providing protection against multiple failures. A minimum of 4 disks is required to organize an array. Typically, using RAID-6 causes about a 10-15% drop in disk group performance compared to RAID-5, which is caused by the larger amount of processing for the controller (the need to calculate a second checksum, as well as to read and rewrite more disk blocks on each block write).

RAID 7

RAID 7 is a registered trademark of Storage Computer Corporation and is not a separate RAID level. The array structure is as follows: data is stored on all disks but one, and the remaining disk is used to store parity blocks. Writes to the disks are cached in RAM, so the array requires a mandatory UPS; in the event of a power failure, data is corrupted.

RAID 10 or RAID 1+0 (Very High Reliability with High Performance)

A combination of a mirrored raid and a striped disk raid. In this type of raid, disks are combined in pairs into mirrored raids (RAID 1) and then all these mirrored pairs are combined into a striped array (RAID 0). Only an even number of disks can be combined into a raid, minimum - 4, maximum - 16. From RAID 1 we inherit reliability, from RAID 0 - speed.

Pros – high fault tolerance and performance

Cons - high cost

RAID 50 or RAID 5+0 (High I/O Rates & Data Transfer Performance)

Also known as RAID 5+0, this is a combination of RAID 5 and RAID 0. The array combines high performance and fault tolerance.

Pros - high fault tolerance, data transfer speed and query execution

Cons - high cost

RAID 60

RAID level 60 combines characteristics of levels 6 and 0. It combines the straight block-level striping of RAID 0 with the distributed double parity of RAID 6; namely, a RAID 0 array is striped across RAID 6 elements. A RAID 60 virtual disk can survive the loss of two hard drives in each of the RAID 6 sets without data loss. It is most efficient with data that needs high reliability, high request rates, high data transfer, and medium to large capacity. The minimum number of disks is 8.

Linear RAID

Linear RAID is a simple grouping of disks that creates a large virtual disk. In linear RAID, blocks are allocated first on one disk included in the array, then, if this one is full, on another, and so on. Such consolidation does not give a performance gain, since most likely the I / O operations will not be distributed between disks. Linear RAID also contains no redundancy and in fact increases the chance of failure - if just one drive fails, the entire array will fail. The capacity of the array is equal to the total capacity of all disks.
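The fill-one-disk-then-the-next behaviour can be sketched as a simple address translation. Disk sizes are given in blocks and are made up for the example; the function name is hypothetical.

```python
def linear_locate(logical_block, disk_sizes):
    """Map a logical block to (disk index, offset) when disks are filled one after another."""
    for disk, size in enumerate(disk_sizes):
        if logical_block < size:
            return disk, logical_block
        logical_block -= size          # skip past this disk and continue on the next one
    raise IndexError("block is beyond the end of the array")

sizes = [100, 50]                      # hypothetical disks of 100 and 50 blocks
print(linear_locate(99, sizes))        # last block of the first disk
print(linear_locate(100, sizes))       # first block of the second disk
```

Neighbouring blocks almost always live on the same disk, which is why sequential I/O gains nothing here, in contrast to striping. Unlike RAID 0, though, the disks do not have to be the same size.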

The main conclusion that can be drawn is that each level of the raid has its pros and cons.

More importantly, the conclusion is that a raid does not guarantee the integrity of your data. That is, if someone deletes the file or it is damaged by some process, the raid will not help us. Therefore, the raid does not exempt us from the need to make backups. But it helps when there are problems with disks at the physical level.
