On Sat, 30 Mar 2002, Neil Brown wrote:

> On Friday March 29, dean-list-linux-kernel@arctic.org wrote:
> > 2.4.19-pre3-ac1
> >
> > /dev/md1 is a raid5 of 4 disks
> > [creating snapshot volumes] results in lots of log spam of the form:
> >
> > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 0 --> 512
> > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 0 --> 4096
> > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 4096 --> 512
>
> The current raid5 requires all requests that it receives to be the
> same size.  If it gets requests of different sizes, it does cope but
> it needs to flush its stripe cache and start again.  This effectively
> serialises all requests around a request-size change.
> So it is probably fair to say that what you are trying to do is
> currently not a supported option.  It may be in 2.6 if we ever get
> raid5 working again in 2.5 :-)

this is an operational concern to me now -- because even the act of
creating an LVM snapshot results in the log spam... and this makes my
machine grind to a halt if i attempt to do backups with snapshots.

i could pursue "fixing" LVM, mke2fs/etc. to all use 4096-byte
buffer_heads if possible.  but i thought i'd also look into fixing md
itself...  is anyone working on this already?

i've thought of two solutions, let me know what you think.

(1) in raid5_make_request, do something like:

	if (bh->b_size <= conf->buffer_size || conf->buffer_size == 0) {
		do what's in the code currently;
	} else {
		break up the request into
		(bh->b_size / conf->buffer_size) pieces;
	}

    this would mean i'd eventually end up with conf->buffer_size == 512
    and my regular 4096-byte fs operations would end up being broken
    into 8 pieces... for as long as LVM is doing the smaller block size
    requests.  and if i see things right, eventually a sync would cause
    the cache to drop back to 0, and at that point it would be possible
    to get back to a 4096-byte cache.

    how hard is it to break up bh things like this?
    i'm a newbie in this area of the kernel :)

(2) the differing size accesses are typically in disjoint regions of
    the array.  for example, when creating an LVM snapshot there is LVM
    metadata, there is a live filesystem, and there is the LVM snapshot
    volume itself.  all of these are disjoint.

    instead of draining the entire cache in get_active_stripe(), we
    only need to drain entries which overlap the sector we're
    interested in.

    i'm still puzzling out an efficient change to the hash which
    accomplishes this... one idea is to change conf->buffer_size to
    conf->max_buffer_size.  in this case we drain the entire cache any
    time max_buffer_size changes, and allow entries in the cache which
    are any size <= conf->max_buffer_size.

    if max_buffer_size is 4096 (as i expect it to be) then 512-byte
    accesses will potentially require hash chains 8x longer than they
    would be today... but this could be a total non-issue.  i suspect
    it works for my situation, but may not work in general when folks
    want to mix and match filesystem or database block sizes on an
    array.

    the cache could use a tree instead of a hash ... but that's not so
    nice for n-way scaling :)

thanks
-dean
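
p.s. to make the two ideas concrete, here is a rough user-space sketch
of just the arithmetic involved.  nothing in it is md code: struct req,
split_request() and ranges_overlap() are made-up names for
illustration, and a 512-byte sector is assumed.  split_request() is the
"break it into (b_size / buffer_size) pieces" step from (1), and
ranges_overlap() is the test that (2) would use to decide which cache
entries actually need draining.

```c
#include <stddef.h>

#define SECTOR_SIZE 512u   /* assumption: 512-byte sectors */

/* hypothetical stand-in for a request; not the kernel's buffer_head */
struct req {
    unsigned long sector;  /* start of the request, in sectors */
    unsigned int  size;    /* length of the request, in bytes */
};

/*
 * idea (1): split a request that is larger than the current cache
 * buffer size into (size / buffer_size) equal pieces.
 *
 * returns the number of pieces written to out[], or 0 when the request
 * needs no splitting (it already fits, or buffer_size is 0) or cannot
 * be split cleanly (sizes not multiples, or out[] too small).
 */
size_t split_request(const struct req *in, unsigned int buffer_size,
                     struct req *out, size_t out_len)
{
    if (buffer_size == 0 || in->size <= buffer_size)
        return 0;
    if (buffer_size % SECTOR_SIZE != 0 || in->size % buffer_size != 0)
        return 0;

    size_t pieces = in->size / buffer_size;
    if (pieces > out_len)
        return 0;

    unsigned long sectors_per_piece = buffer_size / SECTOR_SIZE;
    for (size_t i = 0; i < pieces; i++) {
        out[i].sector = in->sector + i * sectors_per_piece;
        out[i].size   = buffer_size;
    }
    return pieces;
}

/*
 * idea (2): two requests only conflict if their sector ranges overlap.
 * returns 1 if the half-open ranges [sector, sector + size/512) of a
 * and b share at least one sector, 0 otherwise.
 */
int ranges_overlap(const struct req *a, const struct req *b)
{
    unsigned long a_end = a->sector + a->size / SECTOR_SIZE;
    unsigned long b_end = b->sector + b->size / SECTOR_SIZE;
    return a->sector < b_end && b->sector < a_end;
}
```

with a 4096-byte request at sector 100 and a 512-byte cache this gives
8 pieces covering sectors 100 through 107, which matches the "broken
into 8 pieces" case in (1).  the real work (allocating and completing
the sub-requests, and finding overlapping entries in the hash without
walking all of it) is exactly the part this sketch leaves out.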