Hi all,

sorry for the top posting.

In a previous message, you explained that the "Global Parity" would be
the xor of all the data across the stripes, including the stripes'
parities.
Is this still the case? Did I miss something?

Because, by definition, the xor between the data and the parity,
in a stripe, is always 0.
Hence, the xor of all stripes' data and parities is 0 too, always,
and so it is *not* necessary to store it.
It is only necessary to check it, if wanted.

Now, again, maybe I skipped some parts, so I apologize in advance
if this is the case and what is written above is just rubbish;
otherwise something is not really correct in the interpretation
of the cited paper.

bye,

pg

On Wed, Jan 31, 2018 at 07:33:54PM +0330, mostafa kishani wrote:
> Yes that's exactly what the code does. Here the math of
> encoding/decoding is not as important as the IO overhead. Upon a stripe
> update, it needs to update the Global parity as well (that is probably
> in another stripe). This should result in a terrible performance in
> random-write workloads. But in sequential-write workloads this code
> may have a performance near to RAID5 and slightly better than RAID6.
> The 2D codes (as you suggested) also suffer a huge IO penalty and this
> is why they're barely employed even in fast memory structures such as
> SRAM/DRAM.
>
> Bests,
> Mostafa
>
> On Tue, Jan 30, 2018 at 6:44 PM, David Brown <david.brown@xxxxxxxxxxxx> wrote:
> > On 30/01/18 12:30, mostafa kishani wrote:
> >> David what you pointed out about employment of PMDS codes is correct. We
> >> have no access to what happens in the SSD firmware (such as FTL). But
> >> why can this code not be implemented in the software layer (similar to
> >> RAID5/6...) ? I also thank you for pointing out very interesting
> >> subjects.
> >>
> >
> > I must admit that I haven't dug through the mathematical details of the
> > paper. It looks to be at a level that I /could/ understand, but would
> > need to put in quite a bit of time and effort.
> > And the paper does not strike me as being particularly outstanding or
> > special - there are many, many such papers published about new ideas in
> > error detection and correction.
> >
> > While it is not clear to me exactly how these additional "global" parity
> > blocks are intended to help correct errors in the paper, I can see a way
> > to handle it.
> >
> > d d d d d P
> > d d d d d P
> > d d d d d P
> > d d d S S P
> >
> > Where the "d" blocks are normal data blocks, "P" are raid-5 parity
> > blocks (another column for raid-6 Q blocks could be added), and "S" are
> > these "global" parity blocks.
> >
> > If a row has more errors than the normal parity block(s) can correct,
> > then it is possible to use wider parity blocks to help. If you have one
> > S that is defined in the same way as raid-6 Q parity, then it can be
> > used to correct an extra error in a stripe. That relies on all the
> > other stripes having at most P-correctable errors.
> >
> > The maths gets quite hairy. Two parity blocks are well-defined at the
> > moment - raid-5 (xor) and raid-6 (using powers of 2 weights on the data
> > blocks, over GF(2^8)). To provide recovery here, the S parities would
> > have to fit within the same scheme. A third parity block is relatively
> > easy to calculate using powers of 4 weights - but that is not scalable
> > (a fourth parity using powers of 8 does not work beyond 21 data blocks).
> > An alternative multi-parity scheme is possible using significantly more
> > complex maths.
> >
> > However it is done, it would be hard. I am also not convinced that it
> > would work for extra errors distributed throughout the block, rather
> > than just in one row.
> >
> > A much simpler system could be done using vertical parities:
> >
> > d d d d d P
> > d d d d d P
> > d d d d d P
> > V V V V V P
> >
> > Here, the V is just a raid-5 parity of the column of blocks.
> > You now effectively have a raid-5-5 layered setup, but distributed within
> > the one set of disks. Recovery would be straightforward - if a block could
> > not be re-created from a horizontal parity, then the vertical parity
> > would be used. You would have some write amplification, but it would
> > perhaps not be too bad (you could have many rows per vertical parity
> > block), and it would be fine for read-mostly applications. It bears a
> > certain resemblance to raid-10 layouts. Of course, raid-5-6, raid-6-5
> > and raid-6-6 would also be possible.
> >
> >
> >>>
> >>> Other things to consider on big arrays are redundancy of controllers, or
> >>> even servers (for SAN arrays). Consider the pros and cons of spreading your
> >>> redundancy across blocks. For example, if your server has two controllers
> >>> then you might want your low-level block to be Raid-1 pairs with one disk on
> >>> each controller. That could give you a better spread of bandwidths and give
> >>> you resistance to a broken controller.
> >>>
> >>> You could also talk about asymmetric raid setups, such as having a
> >>> write-only redundant copy on a second server over a network, or as a cheap
> >>> hard disk copy of your fast SSDs.
> >>>
> >>> And you could also discuss strategies for disk replacement - after failures,
> >>> or for growing the array.
> >>
> >> The disk replacement strategy has a significant effect on both
> >> reliability and performance. The occurrence of human errors in disk
> >> replacement can result in data unavailability and data loss. In the
> >> following paper I've briefly discussed this subject and how a good
> >> disk replacement policy can improve reliability by orders of magnitude
> >> (a more detailed version of this paper is on the way!):
> >> https://dl.acm.org/citation.cfm?id=3130452
> >
> > In my experience, human error leads to more data loss than mechanical
> > errors - and you really need to take it into account.
> >
> >>
> >> you can download it using sci-hub if you don't have ACM access.
> >>
> >>>
> >>> It is also worth emphasising that RAID is /not/ a backup solution - that
> >>> cannot be said often enough!
> >>>
> >>> Discuss failure recovery - how to find and remove bad disks, how to deal
> >>> with recovering disks from a different machine after the first one has died,
> >>> etc. Emphasise the importance of labelling disks in your machines and being
> >>> sure you pull the right disk!
> >>
> >> I would really appreciate it if you could share your experience of
> >> pulling the wrong disk, and any statistics. This is an interesting
> >> subject to discuss.
> >>
> >
> > My server systems are too small in size, and too few in numbers, for
> > statistics. I haven't actually pulled the wrong disk, but I did come
> > /very/ close before deciding to have one last double-check.
> >
> > I have also tripped over the USB wire to an external disk and thrown it
> > across the room - I am now a lot more careful about draping wires around!
> >
> >
> > mvh.,
> >
> > David
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--

piergiorgio
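
P.S. A small sketch of the argument at the top of this message, in
Python. It assumes the "Global Parity" really is a plain xor over every
block, including the per-stripe parities; that is the reading being
questioned above, not something confirmed from the paper. The helper
names xor_blocks and make_stripe, the 5+1 stripe layout and the 16-byte
block size are made up for illustration only.

```python
import os
from functools import reduce

BLOCK_SIZE = 16  # assumed block size, purely illustrative


def xor_blocks(blocks):
    """Byte-wise xor of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(n_data=5):
    """Return n_data random data blocks followed by their raid-5 (xor) parity."""
    data = [os.urandom(BLOCK_SIZE) for _ in range(n_data)]
    return data + [xor_blocks(data)]


stripes = [make_stripe() for _ in range(4)]
zero = bytes(BLOCK_SIZE)

# Within one stripe, data xor parity cancels to zero by construction.
per_stripe = [xor_blocks(stripe) for stripe in stripes]

# Hence the xor over every block of every stripe is zero as well.
global_xor = xor_blocks([block for stripe in stripes for block in stripe])

print(all(p == zero for p in per_stripe), global_xor == zero)
```

Under that assumption the value carries no information and can only be
checked, not used for recovery. A global block weighted like the raid-6 Q
parity, as suggested further down in the thread, would not cancel this
way and would have to be stored.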