On Thu, 18 Nov 2010 18:04:25 +0000 Tim Small <tim@xxxxxxxxxxx> wrote: > On 10/21/10 00:04, Neil Brown wrote: > > > > Maybe you could add a could of global atomic variables, one for reads and one > > for writes. > > Then on each call to generic_make_request in: > > flush_pending_writes, make_request, raid1d > > increment one or the other depending on whether it is a read or a write. > > Then in raid1_end_read_request and raid1_end_write_request decrement them > > appropriately. > > > > > Ended up with runs like this: > > [ 464.244109] 0 Obtaining resync_lock and disabling interrupts for > allow_barrier > [ 464.244109] conf nr_pending is 5 > [ 464.244109] 0 Released resync_lock and enabling interrupts for > allow_barrier > [ 464.244109] 2216: Post wait until no IO waiting - barrier: 0, pend: > 5, wait: 0, queued: 0 > [ 464.244113] 2216: Pre wait pending to complete - barrier: 1, pend: 5, > wait: 0, queued: 0 > [ 464.244116] In raid1_unplug debug read count: 5 write count: 0 > conf->nr_queued: 0 > [ 464.244469] 5113 Obtaining resync_lock and disabling interrupts for > wait_barrier > [ 464.244469] In raid1_unplug debug read count: 5 write count: 0 > conf->nr_queued: 0 > [ 464.244469] 5127 Obtaining resync_lock and disabling interrupts for > allow_barrier > [ 464.244469] conf nr_pending is 4 > [ 464.244469] 5127 Released resync_lock and enabling interrupts for > allow_barrier > [ 464.244469] 5127 Obtaining resync_lock and disabling interrupts for > allow_barrier > [ 464.246176] In raid1_unplug debug read count: 4 write count: 0 > conf->nr_queued: 0 > [ 464.244469] conf nr_pending is 3 > [ 464.244469] 5127 Released resync_lock and enabling interrupts for > allow_barrier > [ 464.244469] 5127 Obtaining resync_lock and disabling interrupts for > allow_barrier > [ 464.244469] conf nr_pending is 2 > [ 464.244469] 5127 Released resync_lock and enabling interrupts for > allow_barrier > [ 464.246176] In raid1_unplug debug read count: 2 write count: 0 > conf->nr_queued: 0 > [ 464.244469] 5127 Obtaining resync_lock and disabling interrupts for > allow_barrier > [ 464.244469] conf nr_pending is 1 > [ 464.244469] 5127 Released resync_lock and enabling interrupts for > allow_barrier > [ 464.246176] In raid1_unplug debug read count: 1 write count: 0 > conf->nr_queued: 0 > [ 464.244469] 5127 Obtaining resync_lock and disabling interrupts for > allow_barrier > [ 464.244469] conf nr_pending is 0 > [ 464.244469] 5127 Released resync_lock and enabling interrupts for > allow_barrier > [ 464.244469] In raid1_unplug debug read count: 0 write count: 0 > conf->nr_queued: 0 > [ 464.246176] 2216: Post wait pending to complete - barrier: 1, pend: > 0, wait: 1, queued: 0 > [ 464.246176] In raid1_unplug debug read count: 0 write count: 0 > conf->nr_queued: 0 > [ 464.246176] 2216: Pre wait until no IO waiting - barrier: 1, pend: 0, > wait: 1, queued: 0 > [ 464.246176] In raid1_unplug debug read count: 0 write count: 0 > conf->nr_queued: 0 > [ 464.246176] 5118 Obtaining resync_lock and disabling interrupts for > wait_barrier > [ 464.246176] In raid1_unplug debug read count: 0 write count: 0 > conf->nr_queued: 0 > [ 464.246633] In raid1_unplug debug read count: 0 write count: 0 > conf->nr_queued: 1 > [ 464.244469] In raid1_unplug debug read count: 0 write count: 0 > conf->nr_queued: 0 > [ 464.246176] In raid1_unplug debug read count: 0 write count: 0 > conf->nr_queued: 0 > [ 464.246990] 5118 Released resync_lock and enabling interrupts for > wait_barrier > [ 464.244469] 5113 Released resync_lock and enabling interrupts for > wait_barrier > [ 464.246990] 5118 Obtaining resync_lock and disabling interrupts for > wait_barrier > [ 464.246990] 5118 Released resync_lock and enabling interrupts for > wait_barrier > [ 464.246990] 5118 Obtaining resync_lock and disabling interrupts for > wait_barrier > [ 464.246990] 5118 Released resync_lock and enabling interrupts for > wait_barrier > [ 464.244469] In raid1_unplug debug read count: 3 write count: 0 > conf->nr_queued: 0 > [ 464.246990] 5118 Obtaining resync_lock and disabling interrupts for > wait_barrier > [ 464.246990] 5118 Released resync_lock and enabling interrupts for > wait_barrier > [ 464.246990] 5118 Obtaining resync_lock and disabling interrupts for > wait_barrier > [ 464.246990] 5118 Released resync_lock and enabling interrupts for > wait_barrier > [ 464.246176] 2216: Post wait until no IO waiting - barrier: 0, pend: > 6, wait: 0, queued: 0 > [ 464.246176] 2216: Pre wait pending to complete - barrier: 1, pend: 6, > wait: 0, queued: 0 > [ 464.246176] In raid1_unplug debug read count: 5 write count: 0 > conf->nr_queued: 0 > [ 464.246990] 5118 Obtaining resync_lock and disabling interrupts for > wait_barrier > [ 464.246990] In raid1_unplug debug read count: 6 write count: 0 > conf->nr_queued: 0 > [ 464.247091] 0 Obtaining resync_lock and disabling interrupts for > allow_barrier > [ 464.247091] conf nr_pending is 5 > [ 464.247091] 0 Released resync_lock and enabling interrupts for > allow_barrier > [ 464.247091] In raid1_unplug debug read count: 5 write count: 0 > conf->nr_queued: 0 > [ 464.244469] 5113 Obtaining resync_lock and disabling interrupts for > wait_barrier > [ 464.246176] In raid1_unplug debug read count: 5 write count: 0 > conf->nr_queued: 0 > [ 464.244469] In raid1_unplug debug read count: 5 write count: 0 > conf->nr_queued: 0 > [ 464.328828] 3639 Obtaining resync_lock and disabling interrupts for > wait_barrier > [ 464.328834] In raid1_unplug debug read count: 5 write count: 0 > conf->nr_queued: 0 > [ 466.065137] 2479 Obtaining resync_lock and disabling interrupts for > wait_barrier > [ 466.065137] In raid1_unplug debug read count: 5 write count: 0 > conf->nr_queued: 0 > [ 467.051970] 2478 Obtaining resync_lock and disabling interrupts for > wait_barrier > [ 467.051975] In raid1_unplug debug read count: 5 write count: 0 > conf->nr_queued: 0 > [ 471.293667] In raid1_unplug debug read count: 5 write count: 0 > conf->nr_queued: 0 > [ 471.525081] 220 Obtaining resync_lock and disabling interrupts for > wait_barrier > [ 471.525081] In raid1_unplug debug read count: 5 write count: 0 > conf->nr_queued: 0 > [ 505.308962] 2216 WARNING, schedule_timeout timed out for raise_barrier > [ 505.308076] 5118 WARNING, schedule_timeout timed out for wait_barrier > > > ... so does that mean that there are are read requests going awol in the > block layer? > > If so, then the circumstantial evidence from when the lockups occur make > it look to me like this is probably an OpenVZ bug, but I'll try and do > some more digging tomorrow... > > Does that make sense? Yes. Superficially, it appears that there are still 5 outstanding read requests that are not being completed. I cannot guess how OpenVZ would cause that, but I don't really know much about OpenVZ. NeilBrown -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html