On Wed, Mar 27, 2019 at 06:03:38PM +0200, Alex Lyakas wrote:
> Hi Darrick,
>
> I started this long email thread originally, and posted a patch with
> the proposed fix to the "Metadata corruption at
> xfs_attr3_leaf_write_verify" problem. We reported this problem
> originally. Eventually we found a stable reproducer for the issue,
> added different prints in the code, and posted our analysis to
> community in https://www.spinics.net/lists/linux-xfs/msg08752.html.
> The community (Dave) confirmed that we found a "zero day" bug, and
> gave us some hints on how to fix it. Hence this thread.
>
> After reviewing my patch, Dave expressed the following concern:
>
> "The problem is that the locked buffer is not joined and logged in
> the rolling transactions run in xfs_defer_ops. Hence it can pin the
> tail of the AIL, and this can prevent the transaction roll from
> regranting the log space necessary to continue rolling the
> transaction for the required number of transactions to complete the
> deferred ops. If this happens, we end up with a log space deadlock."
>
> However, after more discussions, there was more or less a consensus
> that for kernel 3.18 this fix should be safe. We went ahead, applied
> and qualified the fix. With this fix we did not see the issue in any
> of the production systems, which were hitting the issue frequently.
>
> We are now in the process of moving to long-term kernel 4.14.x. We
> see, however, that this problem was fixed by the community only for
> kernels 4.15 and later. Since we had several production systems
> hitting this issue frequently, we need a fix for it in kernel 4.14.
>
> Hence our question: whether our original patch should be safe to apply
> to kernel 4.14?
>
> Brian, Dave, can you perhaps also comment?

The right thing to do is to backport the upstream fix and all its
dependencies to the LTS kernel. If it's 4.15 to 4.14, everything
should pretty much just drop in without too much hassle.
Then test that the backport fixes the problem it was intended to fix,
post the patch series to the XFS list as [STABLE PATCH X/Y] with a
cc to stable@xxxxxxxxxx, and if it passes review (shouldn't be an
issue if it's a straight backport) it will get merged into the
4.14-LTS kernel tree and go through the stable kernel QA process.

This gets the problem fixed for all users of the LTS kernel, and you
do not have to maintain the backport yourself as you update to new
LTS kernels over the life of your product....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx