On Sunday April 28, dean@arctic.org wrote:
> > > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 0 --> 512
> > > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 0 --> 4096
> > > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 4096 --> 512
>
> this is an operational concern to me now -- because even the act of
> creating an LVM snapshot results in the log spam... and this makes my
> machine grind to a halt if i attempt to do backups with snapshots.
>
> i could pursue "fixing" LVM, mke2fs/etc. to all use 4096-byte buffer_heads
> if possible.  but i thought i'd also look into fixing md itself... is
> anyone working on this already?
>
> i've thought of two solutions, let me know what you think.
>
> (1) in raid5_make_request, do something like:
>
>         if (bh->b_size <= conf->buffer_size || conf->buffer_size == 0) {
>                 do what's in the code currently;
>         }
>         else {
>                 break up the request into
>                 (bh->b_size / conf->buffer_size) pieces;
>         }
>
>     this would mean i'd eventually end up with conf->buffer_size == 512
>     and my regular 4096 byte fs operations would end up being broken
>     into 8 pieces... as long as LVM is doing the smaller block size
>     requests.
>
>     and if i see things right, eventually a sync would cause the
>     cache to 0, and at that point it would be possible to get back to
>     a 4096-byte cache.
>
>     how hard is it to break up bh things like this?  i'm a newbie in
>     this area of the kernel :)

Look at
  http://www.cse.unsw.edu.au/~neilb/patches/linux-stable/2.4.19-pre3/patch-j-RaidSplit

It allows each raid module to say "split this for me" and md.c will
split the request and re-submit.

It should then be fairly easy to get raid5 to say 'split' if the
request is bigger than the current size.

However doing raid5 with 512-byte blocks seems to be a lot slower than
with 4K blocks.. Try it and see.

> (2) the differing size accesses are typically in disjoint regions of
>     the array.  for example, when creating an LVM snapshot there is LVM
>     metadata, there is a live filesystem, and there is the LVM snapshot
>     volume itself.  all of these are disjoint.
>
>     instead of draining the entire cache in get_active_stripe(), we only
>     need to drain entries which overlap the sector we're interested
>     in.

and you need to make sure that new requests don't then conflict with
other new requests...  I suspect this approach would be quite hard.

I think the "right" way is to fix the cache size at 4K and handle reads
and writes from partial buffers.  It'll probably end up like this in
2.5.. eventually.

NeilBrown
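
For reference, below is a rough, untested sketch of the kind of splitting
that option (1) describes: clone a large buffer_head into conf->buffer_size
pieces, submit each piece through the normal raid5 path, and complete the
original bh once every piece has finished.  Field names assume a 2.4-style
struct buffer_head (b_size, b_rsector, b_data, b_page, b_end_io, b_private);
split_ctx, split_end_io and split_and_resubmit are invented names for
illustration, and error handling is omitted.  The RaidSplit patch does the
equivalent work in md.c rather than in raid5 itself.

    /* Sketch only -- not the RaidSplit patch. */
    #include <linux/fs.h>           /* struct buffer_head */
    #include <linux/slab.h>         /* kmalloc/kfree */
    #include <linux/raid/md.h>      /* mddev_t */

    struct split_ctx {
            struct buffer_head *parent;     /* the original large bh */
            atomic_t pending;               /* children still in flight */
            int uptodate;                   /* sticky error flag */
    };

    static void split_end_io(struct buffer_head *bh, int uptodate)
    {
            struct split_ctx *ctx = bh->b_private;

            if (!uptodate)
                    ctx->uptodate = 0;
            kfree(bh);
            if (atomic_dec_and_test(&ctx->pending)) {
                    /* every piece is done: complete the original request */
                    ctx->parent->b_end_io(ctx->parent, ctx->uptodate);
                    kfree(ctx);
            }
    }

    /*
     * Break one bh into (bh->b_size / chunk) pieces of 'chunk' bytes each
     * and feed every piece back through the normal raid5 request path.
     */
    static int split_and_resubmit(mddev_t *mddev, int rw,
                                  struct buffer_head *bh, int chunk)
    {
            int pieces = bh->b_size / chunk;
            struct split_ctx *ctx;
            int i;

            ctx = kmalloc(sizeof(*ctx), GFP_NOIO);
            ctx->parent = bh;
            ctx->uptodate = 1;
            atomic_set(&ctx->pending, pieces);

            for (i = 0; i < pieces; i++) {
                    struct buffer_head *child;

                    child = kmalloc(sizeof(*child), GFP_NOIO);
                    memset(child, 0, sizeof(*child));
                    child->b_size    = chunk;
                    child->b_data    = bh->b_data + i * chunk;
                    child->b_page    = bh->b_page;
                    child->b_rsector = bh->b_rsector + i * (chunk >> 9);
                    child->b_rdev    = bh->b_rdev;
                    child->b_state   = bh->b_state;
                    child->b_end_io  = split_end_io;
                    child->b_private = ctx;

                    /* each child now has b_size <= conf->buffer_size, so it
                     * takes the "do what's in the code currently" branch */
                    raid5_make_request(mddev, rw, child);
            }
            return 0;
    }

The completion and refcount bookkeeping is the fiddly part, which is
presumably why having md.c split and re-submit on behalf of the raid
modules (as the RaidSplit patch does) is nicer than open-coding it in
raid5.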