On 2017/10/16 11:34 PM, Nate Dailey wrote:
> Hi Coly, I'm not 100% sure that's going to work.
>
> What I'm imagining is that close_sync gets the lock, so now
> raid1d/handle_read_error/freeze_array is blocked waiting for it.
>

Hi Nate,

Let me think about this code path. If freeze_array() is blocked by the
mutex in handle_read_error(), it means conf->array_frozen is still 0.
_wait_barrier() only waits if 1) barrier[idx] is > 0, or 2) array_frozen
is 1. So this situation won't block _wait_barrier() or
wait_read_bucket() in all buckets. Then the loop that waits on the
barrier of every bucket won't be blocked if freeze_array() is waiting on
the regular I/O path.

But if freeze_array() happens on the resync I/O code path, a deadlock is
possible. Fortunately it seems freeze_array() is not called on the
resync I/O code path, so the mutex seems to be safe so far. But adding
one more lock always makes people nervous, and makes the code more
complicated... I don't like this :(

> Is it possible that raid1d might need to complete some sync IO from the
> retry_list, that close_sync/wait_all_barriers is waiting for? In that
> case, there would still be a deadlock.
>
> Actually, what if close_sync just did this:
>
> static void close_sync(struct r1conf *conf)
> {
>         int idx;
>
>         for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
>                 _wait_barrier(conf, idx);
>                 _allow_barrier(conf, idx);

And you may remove wait_all_barriers() and allow_all_barriers(); we have
less code, yeah!

>         }
>         ...
>
> Is there any reason all the _wait_barriers have to complete before
> starting on _allow_barrier? If not, then I think this change would work
> (but will think about it more).
>

No reason that _allow_barrier() should wait for all _wait_barrier()
calls to complete here. wait_all_barriers() in close_sync() is there to
make sure all in-flight resync I/O has finished. When it is called, no
new resync I/O will be generated; it is only used to wait for in-flight
resync I/O to complete.

So your modification works; I feel it is safe. And the above code is
much simpler, I like it :-)

Thanks.

Coly Li

> On 10/16/2017 10:43 AM, Coly Li wrote:
>> On 2017/10/16 8:58 PM, Nate Dailey wrote:
>>> Hi Coly, I'm not sure I understand the change you're proposing. Would
>>> it be something like the following?
>>>
>>> spin_lock_irq(&conf->resync_lock);
>>> conf->array_frozen = 1;
>>> raid1_log(conf->mddev, "wait freeze");
>>> while (get_unqueued_pending(conf) != extra) {
>>>         wait_event_lock_irq_cmd_timeout(
>>>                 conf->wait_barrier,
>>>                 get_unqueued_pending(conf) == extra,
>>>                 conf->resync_lock,
>>>                 flush_pending_writes(conf),
>>>                 timeout);
>>> }
>>> spin_unlock_irq(&conf->resync_lock);
>>>
>>> On its own, I don't see how this would make any difference. Until
>>> array_frozen == 0, wait_all_barriers will continue to be blocked, which
>>> in turn will prevent the condition freeze_array is waiting on from ever
>>> becoming true.
>>
>> Hi Nate,
>>
>> You are right, this idea does not help too much; we need to find
>> another way.
>>
>>> Or should something else be done inside the new freeze_array loop that
>>> would allow wait_all_barriers to make progress?
>>
>> It seems wait_all_barriers() is only used in close_sync(), which is to
>> make sure all sync requests hit the platters before raid1_sync_request()
>> returns.
>>
>> How about setting a critical section in close_sync(), protected by
>> another lock.
>> It might be something like this,
>>
>> static void close_sync(struct r1conf *conf)
>> {
>> +       mutex_lock(&close_sync_lock);
>>         wait_all_barriers(conf);
>> +       mutex_unlock(&close_sync_lock);
>>         allow_all_barriers(conf);
>>         [snip]
>> }
>>
>>
>> static void freeze_array(struct r1conf *conf, int extra)
>> {
>> +       mutex_lock(&close_sync_lock);
>>         spin_lock_irq(&conf->resync_lock);
>>         [snip]
>>         spin_unlock_irq(&conf->resync_lock);
>> +       mutex_unlock(&close_sync_lock);
>> }
>>
>> Then conf->array_frozen won't be set while wait_all_barriers() has only
>> partially iterated over the barrier buckets, so the deadlock can be
>> avoided.
>>
>> What do you think of this one?
>>
>> Thanks.
>>
>> Coly Li
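
For reference, a consolidated sketch of the close_sync() simplification
discussed above, using the barrier-bucket helpers named in the thread
(_wait_barrier(), _allow_barrier(), BARRIER_BUCKETS_NR); this is only an
illustration of the idea, not a tested patch:

/*
 * Sketch of the proposed close_sync(): wait for in-flight resync I/O
 * one barrier bucket at a time, releasing each bucket before moving on
 * to the next. Because close_sync() never sits blocked while still
 * holding other buckets, a concurrent freeze_array() (e.g. from
 * handle_read_error() in raid1d) can still make progress, avoiding the
 * deadlock discussed in this thread.
 */
static void close_sync(struct r1conf *conf)
{
        int idx;

        for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
                _wait_barrier(conf, idx);       /* wait for this bucket only */
                _allow_barrier(conf, idx);      /* and release it right away */
        }

        /* wait_all_barriers()/allow_all_barriers() become unused and can be removed */
        /* rest of close_sync() unchanged */
}

Waiting and releasing per bucket, rather than waiting on all buckets
before allowing any, is also what removes the need for the extra
close_sync_lock mutex considered earlier in the thread.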