On Tue, Apr 10, 2018 at 08:18:34AM -0700, Darrick J. Wong wrote: > On Tue, Apr 10, 2018 at 05:08:14PM +0200, Carlos Maiolino wrote: > > Forcing the log to disk after reading the agf is wrong, we might be > > calling xfs_log_force with XFS_LOG_SYNC with a metadata lock held. > > > > This can cause a deadlock when racing a fstrim with a filesystem > > shutdown. > > > > The deadlock has been identified due a miscalculation bug in device-mapper > > dm-thin, which returns lack of space to its users earlier than the device itself > > really runs out of space, changing the device-mapper volume into an error state. > > > > The problem happened while filling the filesystem with a single file, > > triggering the bug in device-mapper, consequently causing an IO error > > and shutting down the filesystem. > > > > If such file is removed, and fstrim executed before the XFS finishes the > > shut down process, the fstrim process will end up holding the buffer > > lock, and going to sleep on the cil wait queue. > > > > At this point, the shut down process will try to wake up all the threads > > waiting on the cil wait queue, but for this, it will try to hold the > > same buffer log already held my the fstrim, locking up the filesystem. > > > > Signed-off-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx> > > --- > > Kudos to Dave who spotted it just after me telling him the process was > > deadlocked on a buffer lock > > Uh... got an xfstest to reproduce this? :) > Not yet, I'm actually relying on the DM bug to hit this problem :P I think it can be fabricated without relying using xfstests though. One thing I forgot to mention, is I could only reproduce it using a very low-latency device (in my test cases, using a ramdisk), I couldn't trigger it using my regular spindles or SSD, I believe the higher latency 'helps' the buffer lock to be released before the fstrim actually tries to lock it. I don't remember though of any xfstests using ramdisks, so, is it ok if I try to write a xfstest using a ramdisk device? Cheers > Code looks reasonable, > Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > --D > > > > fs/xfs/xfs_discard.c | 14 +++++++------- > > 1 file changed, 7 insertions(+), 7 deletions(-) > > > > diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c > > index b2cde5426182..7b68e6c9a474 100644 > > --- a/fs/xfs/xfs_discard.c > > +++ b/fs/xfs/xfs_discard.c > > @@ -50,19 +50,19 @@ xfs_trim_extents( > > > > pag = xfs_perag_get(mp, agno); > > > > - error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp); > > - if (error || !agbp) > > - goto out_put_perag; > > - > > - cur = xfs_allocbt_init_cursor(mp, NULL, agbp, agno, XFS_BTNUM_CNT); > > - > > /* > > * Force out the log. This means any transactions that might have freed > > - * space before we took the AGF buffer lock are now on disk, and the > > + * space before we take the AGF buffer lock are now on disk, and the > > * volatile disk cache is flushed. > > */ > > xfs_log_force(mp, XFS_LOG_SYNC); > > > > + error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp); > > + if (error || !agbp) > > + goto out_put_perag; > > + > > + cur = xfs_allocbt_init_cursor(mp, NULL, agbp, agno, XFS_BTNUM_CNT); > > + > > /* > > * Look up the longest btree in the AGF and start with it. > > */ > > -- > > 2.14.3 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Carlos -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html