On Thu, Sep 22 2016, Guoqing Jiang wrote:

> On 09/21/2016 02:45 AM, Guoqing Jiang wrote:
>>
>>
>> On 09/20/2016 02:31 PM, Anthony DeRobertis wrote:
>>> Sorry for the amount of emails I'm sending, but I noticed something
>>> that's probably important. I'm also appending some gdb log from
>>> tracing through the function (trying to answer why it's doing cluster
>>> mode stuff at all).
>>>
>>> While tracing through, I noticed that *before* the write-bitmap loop,
>>> mdadm -E considers the superblock valid. That agrees with what I saw
>>> from strace, I suppose. To my first glance, it figures out how much
>>> to write by calling this function:
>>>
>>> static unsigned int calc_bitmap_size(bitmap_super_t *bms, unsigned int boundary)
>>> {
>>>     unsigned long long bits, bytes;
>>>
>>>     bits = __le64_to_cpu(bms->sync_size) /
>>>         (__le32_to_cpu(bms->chunksize)>>9);
>>>     bytes = (bits+7) >> 3;
>>>     bytes += sizeof(bitmap_super_t);
>>>     bytes = ROUND_UP(bytes, boundary);
>>>
>>>     return bytes;
>>> }
>>>
>>> That code looked familiar, and I figured out where: it's also in
>>> 95a05b37e8eb2bc0803b1a0298fce6adc60eff16, the commit that I found
>>> originally broke it. But that commit is making a change to it: it
>>> changed the ROUND_UP line from 512 to 4096 (and from the gdb trace,
>>> boundary==4096).
>>>
>>> I tested changing that line to "bytes = ROUND_UP(bytes, 512);", and
>>> it works. Adds the new disk to the array and produces no warnings or
>>> errors.
>>
>> I think it is a coincidence that above change works, 4a3d29e
>> commit made
>> the change but it didn't change the logic at all.
>
> Hmm, seems bitmap is aligned to 512 in previous mdadm, but with commit
> 95a05b3
> we made it aligned to 4k, so it causes the latest mdadm can't work with
> previous
> created array.
>
> Does the below change work? Thanks.
>
> diff --git a/super1.c b/super1.c
> index 9f62d23..6a0b075 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -2433,7 +2433,10 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
>      memset(buf, 0xff, 4096);
>      memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
>
> -    towrite = calc_bitmap_size(bms, 4096);
> +    if (__le32_to_cpu(bms->nodes) == 0)
> +        towrite = calc_bitmap_size(bms, 512);
> +    else
> +        towrite = calc_bitmap_size(bms, 4096);
>      while (towrite > 0) {

(sorry for the late reply ... travel, jetlag, ....)

I think a better, simpler, fix is:

> -    towrite = calc_bitmap_size(bms, 4096);
> +    towrite = calc_bitmap_size(bms, 512);

The only reason that we are rounding up here is that we are using
O_DIRECT writes and they require 512-byte alignment. Any bytes beyond
the end of the actual bitmap will be ignored, so it doesn't matter
whether they are written or not.

Current mdadm always aligns bitmaps on a 4K boundary, but older versions
of mdadm didn't. If the bitmap started less than 4K before the superblock
(quite possible), writing 4K for the bitmap would corrupt the superblock.
This can certainly happen with 1.0 metadata.

However ... the reason that everything is now 4K aligned is that some
drives use a 4K block size. For those, we really should be doing 4K
writes, not 512-byte writes.

So it would make sense to round up to 4K sometimes, and use 512 at
other times. However the correct test isn't whether cluster-raid is in
use.

The metadata has always been aligned on a 4K boundary. If data_offset
and bblog_offset and bitmap_offset all have 4K alignment, then rounding
up to 4K for the bitmap writes would be correct. If any of them has a
smaller alignment, then it isn't necessary and so should be avoided.

So the best fix would be to test those 3 offsets, and round up to a
multiple of 4096 only if all of them are on a 4K boundary.

NeilBrown
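
For illustration only, the offset test described above might look roughly
like the sketch below. This is not the actual mdadm patch. The struct
sb_offsets and both function names are hypothetical, the three offsets are
assumed to be expressed in 512-byte sectors, and header_bytes is a
parameter standing in for sizeof(bitmap_super_t).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the three superblock offsets discussed above.
 * Assumption: all three are counted in 512-byte sectors; the two relative
 * offsets may be negative (e.g. a bitmap placed before the superblock). */
struct sb_offsets {
	uint64_t data_offset;
	int32_t  bblog_offset;
	int32_t  bitmap_offset;
};

#define SECTORS_PER_4K 8	/* 4096 / 512 */

/* Pick the write granularity: 4096 only if every offset is 4K aligned,
 * otherwise fall back to the 512 bytes that O_DIRECT needs. */
static unsigned int bitmap_write_boundary(const struct sb_offsets *o)
{
	if (o->data_offset % SECTORS_PER_4K == 0 &&
	    o->bblog_offset % SECTORS_PER_4K == 0 &&
	    o->bitmap_offset % SECTORS_PER_4K == 0)
		return 4096;
	return 512;
}

/* Same arithmetic as the quoted calc_bitmap_size(), on plain integers:
 * one bit per chunk, rounded up to whole bytes, plus the header, then
 * rounded up to the chosen boundary. */
static uint64_t bitmap_write_size(uint64_t sync_size_sectors,
				  uint32_t chunksize_bytes,
				  unsigned int header_bytes,
				  unsigned int boundary)
{
	uint64_t bits, bytes;

	assert(boundary != 0);
	bits = sync_size_sectors / (chunksize_bytes >> 9);
	bytes = (bits + 7) >> 3;
	bytes += header_bytes;
	return ((bytes + boundary - 1) / boundary) * boundary;
}
```

With a 256-byte header and 16 chunks (2097152 sectors at a 64MiB chunk
size), bitmap_write_size() gives 512 for a 512-byte boundary and 4096 for
a 4K boundary, so a bitmap that starts only two sectors before the
superblock would be safe in the first case and would overrun it in the
second.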