Yes, that's exactly what the code does. Here the encoding/decoding math
matters less than the IO overhead. Upon a stripe update, it needs to
update the global parity as well (which is probably in another stripe).
This should result in terrible performance in random-write workloads, but
in sequential-write workloads this code may perform close to RAID5 and
slightly better than RAID6. The 2D codes (as you suggested) also suffer a
huge IO penalty, which is why they are barely employed even in fast memory
structures such as SRAM/DRAM.

Bests,
Mostafa

On Tue, Jan 30, 2018 at 6:44 PM, David Brown <david.brown@xxxxxxxxxxxx> wrote:
> On 30/01/18 12:30, mostafa kishani wrote:
>> David, what you pointed out about the employment of PMDS codes is
>> correct. We have no access to what happens in the SSD firmware (such as
>> the FTL). But why can't this code be implemented in the software layer
>> (similar to RAID5/6...)? I also thank you for pointing out very
>> interesting subjects.
>>
>
> I must admit that I haven't dug through the mathematical details of the
> paper. It looks to be at a level that I /could/ understand, but would
> need to put in quite a bit of time and effort. And the paper does not
> strike me as being particularly outstanding or special - there are many,
> many such papers published about new ideas in error detection and
> correction.
>
> While it is not clear to me exactly how these additional "global" parity
> blocks are intended to help correct errors in the paper, I can see a way
> to handle it.
>
>   d d d d d P
>   d d d d d P
>   d d d d d P
>   d d d S S P
>
> Where the "d" blocks are normal data blocks, "P" are raid-5 parity
> blocks (another column for raid-6 Q blocks could be added), and "S" are
> these "global" parity blocks.
>
> If a row has more errors than the normal parity block(s) can correct,
> then it is possible to use wider parity blocks to help. If you have one
> S that is defined in the same way as raid-6 Q parity, then it can be
> used to correct an extra error in a stripe. That relies on all the
> other stripes having at most P-correctable errors.
>
> The maths gets quite hairy. Two parity blocks are well-defined at the
> moment - raid-5 (xor) and raid-6 (using powers-of-2 weights on the data
> blocks, over GF(2^8)). To provide recovery here, the S parities would
> have to fit within the same scheme. A third parity block is relatively
> easy to calculate using powers-of-4 weights - but that is not scalable
> (a fourth parity using powers of 8 does not work beyond 21 data blocks).
> An alternative multi-parity scheme is possible using significantly more
> complex maths.
>
> However it is done, it would be hard. I am also not convinced that it
> would work for extra errors distributed throughout the block, rather
> than just in one row.
>
> A much simpler system could be done using vertical parities:
>
>   d d d d d P
>   d d d d d P
>   d d d d d P
>   V V V V V P
>
> Here, the V is just a raid-5 parity of the column of blocks. You now
> effectively have a raid-5-5 layered setup, but distributed within the
> one set of disks. Recovery would be straightforward - if a block could
> not be re-created from a horizontal parity, then the vertical parity
> would be used. You would have some write amplification, but it would
> perhaps not be too bad (you could have many rows per vertical parity
> block), and it would be fine for read-mostly applications. It bears a
> certain resemblance to raid-10 layouts. Of course, raid-5-6, raid-6-5
> and raid-6-6 would also be possible.
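To make the P/Q arithmetic above concrete, here is a minimal sketch (mine,
not from the thread) of the raid-6 style parity computation, assuming the
usual GF(2^8) representation with the 0x11d reduction polynomial (the one
Linux md's raid6 code also uses). The function names, block contents and
block sizes are invented purely for illustration; real implementations use
lookup tables and SIMD rather than bit-by-bit multiplication.

# Sketch of RAID-6 style P/Q parity over GF(2^8), polynomial 0x11d.
# Illustrative only.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing with 0x11d."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
    return r

def gf_pow2(i):
    """Return g^i for the generator g = 2."""
    r = 1
    for _ in range(i):
        r = gf_mul(r, 2)
    return r

def pq_parity(data_blocks):
    """P = xor of the data bytes; Q = xor of g^i * data bytes, per byte position."""
    n = len(data_blocks[0])
    P = bytearray(n)
    Q = bytearray(n)
    for i, block in enumerate(data_blocks):
        w = gf_pow2(i)                      # weight 2^i for data disk i
        for j, byte in enumerate(block):
            P[j] ^= byte
            Q[j] ^= gf_mul(w, byte)
    return bytes(P), bytes(Q)

# Hypothetical 5-data-disk stripe with 4-byte blocks, just for illustration.
stripe = [bytes([d] * 4) for d in (0x11, 0x22, 0x33, 0x44, 0x55)]
P, Q = pq_parity(stripe)
print(P.hex(), Q.hex())

# An "S" global parity in the same spirit would add yet another weighted sum
# (e.g. powers-of-4 weights), which is where the maths gets hairy, as noted above.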
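And a similarly rough sketch of the vertical-parity layout from the last
part above: each row carries a normal raid-5 XOR parity, a final V row
carries column-wise XOR, and a block that the row parity cannot rebuild
(two losses in one row) can still be recovered from its column. The grid
dimensions, helper names and the failure pattern are all made up for the
example; they are not taken from the thread or from any real layout.

# Row + vertical (column) parity idea: 3 data rows of 5 blocks,
# a P column (row xor) and a V row (column xor). Pure illustration.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for j, byte in enumerate(b):
            out[j] ^= byte
    return bytes(out)

ROWS, COLS, BLK = 3, 5, 4
data = [[bytes([r * 16 + c] * BLK) for c in range(COLS)] for r in range(ROWS)]

P = [xor_blocks(data[r]) for r in range(ROWS)]                            # horizontal parity
V = [xor_blocks([data[r][c] for r in range(ROWS)]) for c in range(COLS)]  # vertical parity

# Suppose row 1 loses the blocks in columns 0 and 2: the row parity alone
# cannot rebuild both, but each column has only one loss.
lost = {(1, 0), (1, 2)}

# Recover each lost block column-wise: xor the surviving blocks in that
# column with the column's V block.
recovered = {}
for (r, c) in lost:
    survivors = [data[rr][c] for rr in range(ROWS) if rr != r]
    recovered[(r, c)] = xor_blocks(survivors + [V[c]])

assert all(recovered[(r, c)] == data[r][c] for (r, c) in lost)

# Note the write amplification mentioned above: every data write also
# dirties its row's P block and its column's V block.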
>
>>>
>>> Other things to consider on big arrays are redundancy of controllers, or
>>> even servers (for SAN arrays). Consider the pros and cons of spreading
>>> your redundancy across blocks. For example, if your server has two
>>> controllers then you might want your low-level block to be Raid-1 pairs
>>> with one disk on each controller. That could give you a better spread of
>>> bandwidths and give you resistance to a broken controller.
>>>
>>> You could also talk about asymmetric raid setups, such as having a
>>> write-only redundant copy on a second server over a network, or as a
>>> cheap hard disk copy of your fast SSDs.
>>>
>>> And you could also discuss strategies for disk replacement - after
>>> failures, or for growing the array.
>>
>> The disk replacement strategy has a significant effect on both
>> reliability and performance. The occurrence of human errors in disk
>> replacement can result in data unavailability and data loss. In the
>> following paper I've briefly discussed this subject and how a good disk
>> replacement policy can improve reliability by orders of magnitude (a
>> more detailed version of this paper is on the way!):
>> https://dl.acm.org/citation.cfm?id=3130452
>
> In my experience, human error leads to more data loss than mechanical
> errors - and you really need to take it into account.
>
>>
>> You can download it using sci-hub if you don't have ACM access.
>>
>>>
>>> It is also worth emphasising that RAID is /not/ a backup solution - that
>>> cannot be said often enough!
>>>
>>> Discuss failure recovery - how to find and remove bad disks, how to deal
>>> with recovering disks from a different machine after the first one has
>>> died, etc. Emphasise the importance of labelling disks in your machines
>>> and being sure you pull the right disk!
>>
>> I would really appreciate it if you could share your experience about
>> pulling the wrong disk, and any statistics. This is an interesting
>> subject to discuss.
>>
>
> My server systems are too small in size, and too few in number, for
> statistics. I haven't actually pulled the wrong disk, but I did come
> /very/ close before deciding to have one last double-check.
>
> I have also tripped over the USB wire to an external disk and thrown it
> across the room - I am now a lot more careful about draping wires around!
>
>
> mvh.,
>
> David