On Friday, August 30, 2019 7:43 AM Dave Chinner wrote:
> On Fri, Aug 30, 2019 at 10:34:41AM +1000, Dave Chinner wrote:
> > On Fri, Aug 30, 2019 at 09:08:17AM +1000, Dave Chinner wrote:
> > > On Thu, Aug 29, 2019 at 10:51:59AM +0530, Chandan Rajendra wrote:
> > > > 786576: kworker/4:1H-kb 1825 [004] 217.041079: xfs:xfs_log_assign_tail_lsn: dev 7:1 new tail lsn 2/19333, old lsn 2/19330, last sync 3/18501
> > >
> > > 200ms later the tail has moved, and last_sync_lsn is now 3/18501.
> > > i.e. the iclog writes have made it to disk, and the items have been
> > > moved into the AIL. I don't know where that came from, but I'm
> > > assuming it's an IO completion based on it being run from a
> > > kworker context that doesn't have an "xfs-" name prefix(*).
> > >
> > > As the tail has moved, this should have woken anything sleeping
> > > on the log tail in xlog_grant_head_wait() via a call to
> > > xfs_log_space_wake(). The first waiter should wake, see that there
> > > still isn't room in the log (only 3 sectors were freed in the log,
> > > we need at least 60). That woken process should then run
> > > xlog_grant_push_ail() again and go back to sleep.
> >
> > Actually, it doesn't get woken because xlog_grant_head_wake() checks
> > how much space is available before waking waiters, and there clearly
> > isn't enough here. So that's one likely vector. Can you try this
> > patch?
>
> And this one on top to address the situation the previous patch
> doesn't....
>

Dave, with the 3 patches added (i.e. synchronous transactions during log
recovery and the two patches posted now), the deadlock is not recreated.

Tested-by: Chandan Rajendra <chandanrlinux@xxxxxxxxx>

--
chandan
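
For anyone following the thread, here is a rough userspace model of the wake-up check discussed in the quoted text. It is only a sketch and not the kernel code: `struct waiter`, `tail_advance()` and `wake_waiters()` are hypothetical names, space is counted in sectors for simplicity, and the only behaviour modelled is the one Dave describes, namely that free space is checked before any waiter is woken. The numbers come from the trace above (tail moved from 2/19330 to 2/19333, at least 60 sectors needed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task sleeping on the log grant head. */
struct waiter {
	int  need_sectors;	/* space the waiter needs before it can proceed */
	bool woken;		/* set once the model "wakes" the waiter */
};

/*
 * Distance the tail moved, assuming both LSNs are in the same cycle
 * (as in the trace: 2/19330 -> 2/19333).
 */
static int tail_advance(int old_block, int new_block)
{
	return new_block - old_block;
}

/*
 * Simplified model of the check before waking: walk the queue in FIFO
 * order and stop at the first waiter whose need exceeds the remaining
 * free space. Returns the number of waiters woken.
 */
static size_t wake_waiters(struct waiter *w, size_t n, int free_sectors)
{
	size_t woken = 0;

	assert(w != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (free_sectors < w[i].need_sectors)
			break;		/* not enough room: nobody else is woken */
		free_sectors -= w[i].need_sectors;
		w[i].woken = true;
		woken++;
	}
	return woken;
}
```

Feeding it `tail_advance(19330, 19333)` (3 sectors) with a first waiter needing 60 wakes nobody, so in this model no waiter gets the chance to push the AIL again, which matches the stall described above.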