On Mon, 16 Jul 2012 17:53:53 -0500 Brassow Jonathan <jbrassow@xxxxxxxxxx>
wrote:

> On Jul 16, 2012, at 3:28 AM, keld@xxxxxxxxxx wrote:
>
> >>
> >> Maybe you are suggesting that dmraid should not support raid10-far or
> >> raid10-offset until the "new" approach is implemented.
> >
> > I don't know. It may take a while to get it implemented as long as no seasoned
> > kernel hackers are working on it. As it is implemented now by Barrow, why not then go
> > forward as planned.
> >
> > For the offset layout I don't have a good idea on how to improve the redundancy.
> > Maybe you or others have good ideas. Or is the offset layout an implementation
> > of a standard layout? Then there is not much ado. Except if we could find a layout that has
> > the same advantages but with better redundancy.
>
> Excuse me, s/Barrow/Brassow/ - my parents insist.
>
> I've got a "simple" idea for improving the redundancy of the "far"
> algorithms.  Right now, when calculating the device on which the far copy
> will go, we perform:
>	d += geo->near_copies;
>	d %= geo->raid_disks;
> This effectively "shifts" the copy rows over by 'near_copies' (1 in the
> simple case), as follows:
>	disk1	disk2	or	disk1	disk2	disk3
>	=====	=====		=====	=====	=====
>	 A1	 A2		 A1	 A2	 A3
>	 ..	 ..		 ..	 ..	 ..
>	 A2	 A1		 A3	 A1	 A2
> For all odd numbers of 'far' copies, this is what we should do.  However,
> for an even number of far copies, we should shift "near_copies + 1" -
> unless (far_copies == (raid_disks / near_copies)), in which case it should
> be simply "near_copies".  This should provide maximum redundancy for all
> cases, I think.  I will call the number of devices the copy is shifted the
> "device stride", or dev_stride.  Here are a couple examples:
>	2-devices, near=1, far=2, offset=0/1: dev_stride = nc (SAME AS CURRENT ALGORITHM)
>
>	3-devices, near=1, far=2, offset=0/1: dev_stride = nc + 1.  Layout changes as follows:
>	disk1	disk2	disk3
>	=====	=====	=====
>	 A1	 A2	 A3
>	 ..	 ..	 ..
>	 A2	 A3	 A1
>
>	4-devices, near=1, far=2, offset=0/1: dev_stride = nc + 1.  Layout changes as follows:
>	disk1	disk2	disk3	disk4
>	=====	=====	=====	=====
>	 A1	 A2	 A3	 A4
>	 ..	 ..	 ..	 ..
>	 A3	 A4	 A1	 A2

Hi Jon,
 This looks good for 4 devices, but I think it breaks down for e.g. 6 devices.

I think a useful measure is how many different pairs of devices exist such
that when both fail we lose data (thinking of far=2 layouts only).  We want to
keep this number low.  Call it the number of vulnerable pairs.

With the current layout with N devices, there are N pairs that are vulnerable
(x and x+1 for each x).  If N==2, the two pairs are 0,1 and 1,0.  These
pairs are identical so there is only one vulnerable pair.

With your layout there are still N pairs (x and x+2) except when there are 4
devices (N==4): we get 0,2 1,3 2,0 3,1, in which case 2 sets of pairs are
identical (0,2 == 2,0 and 1,3 == 3,1).
With N==6 the 6 pairs are
0,2 1,3 2,4 3,5 4,0 5,1
and no two pairs are identical.  So there is no gain.

The layout where data stored on device 'x' is mirrored on device 'x^1' has
N/2 pairs which are vulnerable.
An alternate way to gain this low level of vulnerability would be to mirror
data on X onto 'X+N/2'.  This is the same as your arrangement for N==4.
For N==6 it would be:

A  B  C  D  E  F
G  H  I  J  K  L
....
D  E  F  A  B  C
J  K  L  G  H  I
...

so the vulnerable pairs are 0,3 1,4 2,5.
This might be slightly easier to implement (as you suggest: have a
dev_stride, only set it to raid_disks/fc*nc).

>
> This should require a new bit in 'layout' (bit 17) to signify a different
> calculation in the way the copy device selection happens.  We then need to
> replace 'd += geo->near_copies' with 'd += geo->dev_stride' and set
> dev_stride in 'setup_geo'.  I'm not certain how much work it is beyond
> that, but I don't *think* it looks that bad and I'd be happy to do it.

I'm tempted to set bit 31 to mean "bits 0xFF are number of copies and bits
0xFF00 define the layout of those copies".. but just adding a bit17 probably
makes more sense.

If you create and test a patch using the calculation I suggested, I'll be
happy to review it.

>
> So, should I allow the current "far" and "offset" in dm-raid, or should I
> simply allow "near" for now?

That's up to you.  However it might be sensible not to rush into supporting
the current far and offset layouts until this conversation has run its course.

Thanks,
NeilBrown
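
The "vulnerable pairs" counts discussed in this message can be checked mechanically. What follows is only a minimal sketch in C, assuming near=1 and far=2, so that the chunk first stored on device d has its far copy on (d + dev_stride) % raid_disks. The names far_copy_dev, half_stride, vulnerable_pairs and MAX_DISKS are hypothetical and do not come from the md raid10 code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DISKS 64	/* arbitrary limit chosen for this sketch */

/* Hypothetical helper: device holding the far copy of a chunk whose
 * first copy lives on device d (near_copies == 1, far_copies == 2). */
static int far_copy_dev(int d, int raid_disks, int dev_stride)
{
	return (d + dev_stride) % raid_disks;
}

/* Hypothetical helper: the stride raid_disks/fc*nc, i.e. mirror X onto
 * X + N/2 when fc == 2 and nc == 1.  Integer division is intended. */
static int half_stride(int raid_disks, int near_copies, int far_copies)
{
	return raid_disks / far_copies * near_copies;
}

/*
 * Count the distinct unordered device pairs {a, b} such that losing
 * both a and b loses data.  Only models near_copies == 1, far_copies == 2.
 */
static int vulnerable_pairs(int raid_disks, int dev_stride)
{
	bool seen[MAX_DISKS][MAX_DISKS] = { { false } };
	int d, count = 0;

	assert(raid_disks > 1 && raid_disks <= MAX_DISKS);

	for (d = 0; d < raid_disks; d++) {
		int m = far_copy_dev(d, raid_disks, dev_stride);
		int lo = d < m ? d : m;
		int hi = d < m ? m : d;

		if (lo == hi)
			continue;	/* degenerate stride: both copies on one device */
		if (!seen[lo][hi]) {
			seen[lo][hi] = true;	/* x,y and y,x count once */
			count++;
		}
	}
	return count;
}
```

Under those assumptions, vulnerable_pairs(6, 1) and vulnerable_pairs(6, 2) both return 6, while vulnerable_pairs(6, half_stride(6, 1, 2)) returns 3, which agrees with the pairs listed by hand for the six-device case.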