Re: [PATCH] Force log to disk before reading the AGF during a fstrim

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 10, 2018 at 08:18:34AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 10, 2018 at 05:08:14PM +0200, Carlos Maiolino wrote:
> > Forcing the log to disk after reading the agf is wrong, we might be
> > calling xfs_log_force with XFS_LOG_SYNC with a metadata lock held.
> > 
> > This can cause a deadlock when racing a fstrim with a filesystem
> > shutdown.
> > 
> > The deadlock has been identified due a miscalculation bug in device-mapper
> > dm-thin, which returns lack of space to its users earlier than the device itself
> > really runs out of space, changing the device-mapper volume into an error state.
> > 
> > The problem happened while filling the filesystem with a single file,
> > triggering the bug in device-mapper, consequently causing an IO error
> > and shutting down the filesystem.
> > 
> > If such file is removed, and fstrim executed before the XFS finishes the
> > shut down process, the fstrim process will end up holding the buffer
> > lock, and going to sleep on the cil wait queue.
> > 
> > At this point, the shut down process will try to wake up all the threads
> > waiting on the cil wait queue, but for this, it will try to hold the
> > same buffer log already held my the fstrim, locking up the filesystem.
> > 
> > Signed-off-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
> > ---
> > Kudos to Dave who spotted it just after me telling him the process was
> > deadlocked on a buffer lock
> 
> Uh... got an xfstest to reproduce this? :)
> 

Not yet, I'm actually relying on the DM bug to hit this problem :P

I think it can be fabricated without relying using xfstests though.

One thing I forgot to mention, is I could only reproduce it using a very
low-latency device (in my test cases, using a ramdisk), I couldn't trigger it
using my regular spindles or SSD, I believe the higher latency 'helps' the
buffer lock to be released before the fstrim actually tries to lock it.

I don't remember though of any xfstests using ramdisks, so, is it ok if I try to
write a xfstest using a ramdisk device?

Cheers

> Code looks reasonable,
> Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> 
> --D
> 
> 
> >  fs/xfs/xfs_discard.c | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> > index b2cde5426182..7b68e6c9a474 100644
> > --- a/fs/xfs/xfs_discard.c
> > +++ b/fs/xfs/xfs_discard.c
> > @@ -50,19 +50,19 @@ xfs_trim_extents(
> >  
> >  	pag = xfs_perag_get(mp, agno);
> >  
> > -	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
> > -	if (error || !agbp)
> > -		goto out_put_perag;
> > -
> > -	cur = xfs_allocbt_init_cursor(mp, NULL, agbp, agno, XFS_BTNUM_CNT);
> > -
> >  	/*
> >  	 * Force out the log.  This means any transactions that might have freed
> > -	 * space before we took the AGF buffer lock are now on disk, and the
> > +	 * space before we take the AGF buffer lock are now on disk, and the
> >  	 * volatile disk cache is flushed.
> >  	 */
> >  	xfs_log_force(mp, XFS_LOG_SYNC);
> >  
> > +	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
> > +	if (error || !agbp)
> > +		goto out_put_perag;
> > +
> > +	cur = xfs_allocbt_init_cursor(mp, NULL, agbp, agno, XFS_BTNUM_CNT);
> > +
> >  	/*
> >  	 * Look up the longest btree in the AGF and start with it.
> >  	 */
> > -- 
> > 2.14.3
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux