Re: [PATCH] xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of an attribute

On Wed, Mar 27, 2019 at 06:03:38PM +0200, Alex Lyakas wrote:
> Hi Darrick,
> 
> I started this long email thread originally, and posted a patch with
> the proposed fix to the "Metadata corruption at
> xfs_attr3_leaf_write_verify" problem. We reported this problem
> originally. Eventually we found a stable reproducer for the issue,
> added different prints in the code, and posted our analysis to
> community in https://www.spinics.net/lists/linux-xfs/msg08752.html.
> The community (Dave) confirmed that we found a "zero day" bug, and
> gave us some hints on how to fix it. Hence this thread.
> 
> After reviewing my patch, Dave expressed the following concern:
> 
> "The problem is that the locked buffer is not joined and logged in
> the rolling transactions run in xfs_defer_ops. Hence it can pin the
> tail of the AIL, and this can prevent the transaction roll from
> regranting the log space necessary to continue rolling the
> transaction for the required number of transactions to complete the
> deferred ops. If this happens, we end up with a log space deadlock."
> 
> However, after more discussions, there was more or less a consensus
> that for kernel 3.18 this fix should be safe. We went ahead, applied
> and qualified the fix. With this fix we did not see the issue in any
> of the production systems, which were hitting the issue frequently.
> 
> We are now in the process of moving to long-term kernel 4.14.x. We
> see, however, that this problem was fixed by the community only for
> kernels 4.15 and later. Since we had several production systems
> hitting this issue frequently, we need a fix for it in kernel 4.14.
> 
> Hence our question: whether our original patch should be safe to apply
> to kernel 4.14?
> 
> Brian, Dave, can you perhaps also comment?

The right thing to do is to backport the upstream fix and all its
dependencies to the LTS kernel. If it's 4.15 to 4.14, everything
should pretty much just drop in without too much hassle. Then test
that the backport fixes the problem it was intended to fix, post the
patch series to the XFS list as [STABLE PATCH X/Y] with a cc to
stable@xxxxxxxxxx, and if it passes review (shouldn't be an issue if
it's a straight backport) it will get merged into the 4.14-LTS kernel
tree and go through the stable kernel QA process.
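
As a rough illustration of the mechanics only, the steps above can be
sketched in a throwaway repository. Everything here is an assumption
made for the example: the branch names (stable-lts, backport), the
two placeholder commits, and the outgoing/ directory are hypothetical
stand-ins, not the real upstream commits or the real 4.14 stable
branch. It shows cherry-picking a fix plus its dependency with the
upstream commit id recorded, then generating a numbered series with
the [STABLE PATCH X/Y] prefix.

```shell
#!/bin/sh
# Sketch in a scratch repo; all names and commits below are placeholders.
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q .
echo base > file.c
git add file.c
git commit -q -m "base: common ancestor"
git branch stable-lts                      # stand-in for the LTS branch

# Two stand-in "upstream" commits: a dependency, then the fix itself.
echo dep >> file.c
git commit -q -am "xfs: hypothetical dependency"
echo fix >> file.c
git commit -q -am "xfs: hypothetical fix"
upstream=$(git rev-parse --abbrev-ref HEAD)

# Backport onto the LTS stand-in; -x records each upstream commit id
# in the message, -s adds a Signed-off-by line.
git checkout -q -b backport stable-lts
git cherry-pick -x -s "stable-lts..$upstream" >/dev/null

# Emit the series with the subject prefix used for stable submissions.
git format-patch -q --subject-prefix="STABLE PATCH" \
    -o "$work/outgoing" stable-lts..backport
grep -h '^Subject:' "$work"/outgoing/*.patch
```

In a real backport the cherry-pick range would be replaced by the
specific upstream commit ids, and the resulting series would still
need to be tested against the reproducer before posting.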

This gets the problem fixed for all users of the LTS kernel, and you
do not have to maintain the backport yourself as you update to new
LTS kernels over the life of your product....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


