Re: RAID1: deadlock between freeze_array and blk plug?

Hello Joe,

I think the commit you mention is related to handling read errors, in
which case freeze_array is called, and it may hang due to incorrect
accounting of IO requests. Also, this commit is only relevant since
kernel 4.3. For example, in kernel 3.18 there is no "bio_end_io_list"
at all.

Looking more at this issue, I don't think it is related to the new
freeze_array code that uses array_frozen, introduced in
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/md/raid1.c?id=b364e3d048e49b1d177eb7ee7853e77aa0560464

The same plugging infrastructure already existed, for example, in
kernel 3.8, but we did not observe similar deadlocks there. I will have
to dig more to understand how this deadlock is avoided.

I am more worried now about the freeze_array deadlock I reported in
http://www.spinics.net/lists/raid/msg52678.html

This is a real deadlock that we see now.

Thanks,
Alex.



On Thu, Jun 16, 2016 at 6:38 AM, Lawrence, Joe <Joe.Lawrence@xxxxxxxxxxx> wrote:
> Hi Alexander,
>
> Any chance this was handled by commit "raid1: include bio_end_io_list in
> nr_queued to prevent freeze_array hang" [1]
>
> [1]
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/md/raid1.c?id=ccfc7bf1f09d6190ef86693ddc761d5fe3fa47cb
> ________________________________
> From: linux-raid-owner@xxxxxxxxxxxxxxx <linux-raid-owner@xxxxxxxxxxxxxxx> on
> behalf of Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
> Sent: Monday, June 13, 2016 7:02:38 AM
> To: Neil Brown; Jes Sorensen; linux-raid
> Subject: RAID1: deadlock between freeze_array and blk plug?
>
> Hello Neil, Jes,
>
> I wonder if the following deadlock is possible:
>
> - Caller calls blk_start_plug and wants to submit two WRITE bios
>
> - First bio successfully calls wait_barrier() and is appended to
> plug->pending list
>
> - Now somebody does freeze_array()
>
> - freeze_array() unconditionally sets:
>    conf->array_frozen = 1;
>    and starts waiting for conf->nr_pending to go down
>
> - Second WRITE bio calls wait_barrier, but it will wait for
> "!conf->array_frozen" until it can proceed
>
> - Now we have a deadlock: first bio will not be submitted because it
> sits on the plug list of the caller, and caller is stuck in
> wait_barrier, so it cannot do blk_finish_plug.
>
> I am about to try to reproduce it on kernel 3.18, but looking at the
> latest Linus tree, I don't see something preventing this from
> happening either. Am I missing something?
>
> Thanks,
> Alex.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
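
The sequence described in the quoted message can be traced with a small
toy model. This is only a sketch under stated assumptions, not the raid1
code: the class Raid1Model, the function simulate() and the
flush_on_block switch are hypothetical names introduced for illustration,
and the freeze condition is simplified to "nr_pending has drained to
zero". The switch only shows that the cycle depends on the first bio
staying on the plug list while the caller is blocked; whether the real
kernel does anything equivalent is not established in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Raid1Model:
    """Toy stand-in for the raid1 conf fields named in the thread."""
    array_frozen: bool = False
    nr_pending: int = 0
    plugged: list = field(default_factory=list)  # bios held on the caller's plug list

    def wait_barrier(self) -> bool:
        """Return True if a bio may proceed, False if the caller would block."""
        if self.array_frozen:
            return False
        self.nr_pending += 1
        return True

    def flush_plug(self) -> None:
        """Model of blk_finish_plug: submit and complete every plugged bio."""
        self.nr_pending -= len(self.plugged)
        self.plugged.clear()

    def freeze_can_finish(self) -> bool:
        """Simplification: freeze_array is done once nr_pending reaches zero."""
        return self.array_frozen and self.nr_pending == 0


def simulate(flush_on_block: bool = False) -> str:
    """Replay the steps from the report and say how they end."""
    conf = Raid1Model()

    # The caller starts a plug; the first WRITE bio passes wait_barrier
    # and is parked on the plug list.
    assert conf.wait_barrier()
    conf.plugged.append("bio1")

    # Somebody calls freeze_array, which sets the flag unconditionally.
    conf.array_frozen = True

    # The second WRITE bio cannot pass wait_barrier: the caller would sleep.
    assert not conf.wait_barrier()

    # Hypothetical knob: flush the plug list when the caller blocks.
    if flush_on_block:
        conf.flush_plug()

    # bio1 is still counted in nr_pending, so the freezer never finishes,
    # and the caller never reaches blk_finish_plug.
    if not conf.freeze_can_finish():
        return "deadlock"

    # The freezer finishes and unfreezes; the second bio proceeds.
    conf.array_frozen = False
    assert conf.wait_barrier()
    conf.plugged.append("bio2")
    conf.flush_plug()
    return "completed"
```

Calling simulate() follows the reported steps and ends in the "deadlock"
branch, while simulate(flush_on_block=True) lets the freeze finish in
this model.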