On Thursday October 20, sgunderson@xxxxxxxxxxx wrote: > On Mon, Oct 17, 2005 at 08:55:45AM +1000, Neil Brown wrote: > > I'll have a close look at all the code sometime today and get back to > > you with any comments. > > Any progress? > Ok, I've had another fairly detailed look.. I'd like to suggest another simplification (at least, I think it's a simplification, let me know what you think). While working on expanding one section, you currently gave a bunch of stripe_heads (expand_stripes) sitting idle, waiting for all the reads to complete, and you also have a collection of pages (expand_buffer) to copy the data into when it is read. I think you can combine these two, so that the expand_stripes are less idle (i.e. their buffer space gets used) and so that expand_buffer isn't needed. I would first include the "sh->disks" value in the 'key' used to look stripe_heads up in the hash table. That way you could easily have 'old' stripes and 'new' stripes possibly for overlapping regions living in the hash table at the same time. I would change sync_request to work in units of chunk_size rather than STRIPE_SECTORS, as I think that makes it a bit easier too (sync_request can schedule as much sync activity as it like - it just has to return the correct number of sectors). So what sync_request would do would be: 1/ get_active_stripe a collection of stripes with sh->disks being the new size, and flag all of them as being in an expand, and set the sh->count to be the number of blocks that need to be loaded into the stripe. Also set STRIPE_HANDLE so as soon as the count reaches 0, the stripe will be handled. (Actually, you probably wouldn't use get_active_stripe, you would use get_free_stripe, and then init_stripe). 2/ advance 'expand_progress' to the end of this set of stripes 3/ get_active_stripe all of the stripes (with old size in sh->disks) that need to be read to fill in the new set of stripes, and flag them so that handle_stripe will read all the blocks. make_request will now find the new-sized stripes when looking for blocks in this area of the array. It will not find the old stripes, so we know that no new IO requests will be added to old stripes in this region of the array. If make_request finds a stripe that is flagged as being in an expand, then it should block until the expand moves onward (wait_for_expand_progress). In handle_stripe, if we get a stripe that - was flagged for reading prior to expand and - has all the required blocks read in and - has no pending IO requests in the region of the current expand then - transfer (either memcpy or pointer fiddling) the data into the stripes that are waiting for the data, and decrement the ->count for those stripes by the number of blocks that were transferred. When handle stripe gets a stripe that is flagged for expanding, then it knows that all the data has been transferred in, and it updates the parity block and schedules a write. I think that with that change in place, I'll be happy with the structure and will then start a closer review of the fine detail and help you find the corruption bug (if it still exists). Thanks for your continuing efforts. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html