On Mon, 12 Aug 2013 10:24:37 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote:

> get_active_stripe() is the last place we have lock contention. It has two
> paths. One is stripe isn't found and new stripe is allocated, the other is
> stripe is found.

Hi Shaohua Li,
 thanks for the patch.  I think it is a good idea, but it needs more work.
First we will need to fix some bugs ... in md.c and in your patch.

>
> The first path basically calls __find_stripe and init_stripe. It accesses
> conf->generation, conf->previous_raid_disks, conf->raid_disks,
> conf->prev_chunk_sectors, conf->chunk_sectors, conf->max_degraded,
> conf->prev_algo, conf->algorithm, the stripe_hashtbl and inactive_list. Except
> stripe_hashtbl and inactive_list, other fields are changed very rarely.

Yes, those fields don't change very often, but our current locking doesn't
properly protect against them changing.
In particular in make_request(), if raid5_start_reshape() changes these
fields between the point where reshape_progress is seen to be MaxSector and
the point where get_active_stripe() is called, get_active_stripe() will
return the wrong stripe.

I think we should probably introduce a seqlock to protect these fields.
It is very cheap to get a read-lock on a seqlock, so we can do that every
time we enter make_request().  Then get_active_stripe() wouldn't need to
worry about device_lock at all and would only need to take the hash lock
for the particular sector.  That should make it a lot simpler.

Also your new shrink_stripes(), and the similar code in resize_stripes(),
are wrong.  They seem to assume that the stripe_heads will be evenly
distributed over all hash values, which isn't the case.
In particular, shrink_stripes() will stop calling drop_one_stripe() as soon
as any inactive_list is empty, but it must continue until all inactive
lists are empty.

I'll add the seqlock and push that out to my for-next branch, and then you
can rebase this patch on top of that.

Thanks,
NeilBrown
Attachment:
signature.asc
Description: PGP signature