Re: Deadlock in md barrier code? / RAID1 / LVM CoW snapshot + ext3 / Debian 5.0 - lenny 2.6.26 kernel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 18 Nov 2010 18:04:25 +0000
Tim Small <tim@xxxxxxxxxxx> wrote:

> On 10/21/10 00:04, Neil Brown wrote:
> >
> > Maybe you could add a could of global atomic variables, one for reads and one
> > for writes.
> > Then on each call to generic_make_request in:
> >   flush_pending_writes, make_request, raid1d
> > increment one or the other depending on whether it is a read or a write.
> > Then in raid1_end_read_request and raid1_end_write_request decrement them
> > appropriately.
> >   
> 
> 
> Ended up with runs like this:
> 
> [  464.244109] 0 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.244109] conf nr_pending is 5
> [  464.244109] 0 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.244109] 2216: Post wait until no IO waiting - barrier: 0, pend:
> 5, wait: 0, queued: 0
> [  464.244113] 2216: Pre wait pending to complete - barrier: 1, pend: 5,
> wait: 0, queued: 0
> [  464.244116] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  464.244469] 5113 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.244469] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  464.244469] 5127 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.244469] conf nr_pending is 4
> [  464.244469] 5127 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.244469] 5127 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.246176] In raid1_unplug debug read count: 4 write count: 0
> conf->nr_queued: 0
> [  464.244469] conf nr_pending is 3
> [  464.244469] 5127 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.244469] 5127 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.244469] conf nr_pending is 2
> [  464.244469] 5127 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.246176] In raid1_unplug debug read count: 2 write count: 0
> conf->nr_queued: 0
> [  464.244469] 5127 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.244469] conf nr_pending is 1
> [  464.244469] 5127 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.246176] In raid1_unplug debug read count: 1 write count: 0
> conf->nr_queued: 0
> [  464.244469] 5127 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.244469] conf nr_pending is 0
> [  464.244469] 5127 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.244469] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 0
> [  464.246176] 2216: Post wait pending to complete - barrier: 1, pend:
> 0, wait: 1, queued: 0
> [  464.246176] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 0
> [  464.246176] 2216: Pre wait until no IO waiting - barrier: 1, pend: 0,
> wait: 1, queued: 0
> [  464.246176] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 0
> [  464.246176] 5118 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246176] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 0
> [  464.246633] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 1
> [  464.244469] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 0
> [  464.246176] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 0
> [  464.246990] 5118 Released resync_lock and enabling interrupts for
> wait_barrier
> [  464.244469] 5113 Released resync_lock and enabling interrupts for
> wait_barrier
> [  464.246990] 5118 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246990] 5118 Released resync_lock and enabling interrupts for
> wait_barrier
> [  464.246990] 5118 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246990] 5118 Released resync_lock and enabling interrupts for
> wait_barrier
> [  464.244469] In raid1_unplug debug read count: 3 write count: 0
> conf->nr_queued: 0
> [  464.246990] 5118 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246990] 5118 Released resync_lock and enabling interrupts for
> wait_barrier
> [  464.246990] 5118 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246990] 5118 Released resync_lock and enabling interrupts for
> wait_barrier
> [  464.246176] 2216: Post wait until no IO waiting - barrier: 0, pend:
> 6, wait: 0, queued: 0
> [  464.246176] 2216: Pre wait pending to complete - barrier: 1, pend: 6,
> wait: 0, queued: 0
> [  464.246176] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  464.246990] 5118 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246990] In raid1_unplug debug read count: 6 write count: 0
> conf->nr_queued: 0
> [  464.247091] 0 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.247091] conf nr_pending is 5
> [  464.247091] 0 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.247091] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  464.244469] 5113 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246176] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  464.244469] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  464.328828] 3639 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.328834] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  466.065137] 2479 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  466.065137] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  467.051970] 2478 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  467.051975] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  471.293667] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  471.525081] 220 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  471.525081] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  505.308962] 2216 WARNING, schedule_timeout timed out for raise_barrier
> [  505.308076] 5118 WARNING, schedule_timeout timed out for wait_barrier
> 
> 
> ... so does that mean that there are are read requests going awol in the
> block layer?
> 
> If so, then the circumstantial evidence from when the lockups occur make
> it look to me like this is probably an OpenVZ bug, but I'll try and do
> some more digging tomorrow...
> 
> Does that make sense?

Yes.  Superficially, it appears that there are still 5 outstanding read
requests that are not being completed.   I cannot guess how OpenVZ would
cause that, but I don't really know much about OpenVZ.

NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux