Re: raid1: freeze_array/wait_all_barriers deadlock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Coly, I'm not 100% sure that's going to work.

What I'm imagining is that close_sync gets the lock, so now raid1d/handle_read_error/freeze_array is blocked waiting for it.

Is it possible that raid1d might need to complete some sync IO from the retry_list, that close_sync/wait_all_barriers is waiting for? In that case, there would still be a deadlock.

Actually, what if close_sync just did this:

static void close_sync(struct r1conf *conf)
{
        int idx;

        for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
                _wait_barrier(conf, idx);
                _allow_barrier(conf, idx);
        }
...

Is there any reason all the _wait_barriers have to complete before starting on _allow_barrier? If not, then I think this change would work (but will think about it more).

Nate



On 10/16/2017 10:43 AM, Coly Li wrote:
On 2017/10/16 下午8:58, Nate Dailey wrote:
Hi Coly, I'm not sure I understand the change you're proposing. Would it
be something like the following?

         spin_lock_irq(&conf->resync_lock);
         conf->array_frozen = 1;
         raid1_log(conf->mddev, "wait freeze");
         while (get_unqueued_pending(conf) != extra) {
             wait_event_lock_irq_cmd_timeout(
                 conf->wait_barrier,
                 get_unqueued_pending(conf) == extra,
                 conf->resync_lock,
                 flush_pending_writes(conf),
                 timeout);
         }
         spin_unlock_irq(&conf->resync_lock);

On its own, I don't see how this would make any difference. Until
array_frozen == 0, wait_all_barriers will continue to be blocked, which
in turn will prevent the condition freeze_array is waiting on from ever
becoming true.
Hi Nate,

You are right, this idea does not help too much, we need to find another
way.

Or should something else be done inside the new freeze_array loop that
would allow wait_all_barriers to make progress?
It seems wait_all_barriers() is only used in close_sync(), which is to
make sure all sync requests hit platters before raid1_sync_request()
returns.

How about setting a critical section in close_sync() and protected by
another lock. It might be something like this,

static void close_sync(struct r1conf *conf)
{
+	mutex_lock close_sync_lock;
         wait_all_barriers(conf);
+	mutex_unlock close_sync_lock;
         allow_all_barriers(conf);
	[snip]
}


static void freeze_array(struct r1conf *conf, int extra)
{
+	mutex_lock close_sync_lock
         spin_lock_irq(&conf->resync_lock);
	[snip]
         spin_unlock_irq(&conf->resync_lock);
+	mutex_unlock close_sync_lock
}

Then conf->array_frozen won't be set when wait_all_barriers() partially
iterated on barrier buckets. Then a deadlock can be avoided.

How do you think of this one ?

Thanks.

Coly Li


[snip]


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux