[Bug 216343] XFS: no space left in xlog cause system hang

bugzilla-daemon@xxxxxxxxxx · Wed, 17 Aug 2022 13:15:28 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=216343

--- Comment #6 from Amir Goldstein (amir73il@xxxxxxxxx) ---
On Wed, Aug 17, 2022 at 1:19 PM <bugzilla-daemon@xxxxxxxxxx> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=216343
>
> --- Comment #5 from zhoukete@xxxxxxx ---
> (In reply to Amir Goldstein from comment #4)
>
> >
> > According to xfs_trans_dirty_buf() I think it could mean uptodate and
> > dirty buffer.
> >
>
> When I reviewed the xfs_trans_dirty_buf() code, I found that for the xfs inode
> log item, b_log_item is NULL:
>
> crash> xfs_log_item.li_buf,li_ops 0xffff0340999a0a80 -x
>   li_buf = 0xffff0200125b7180
>   li_ops = 0xffff800008faec60 <xfs_inode_item_ops>
> crash> xfs_buf.b_log_item 0xffff0200125b7180
>   b_log_item = 0x0
>
> and only the xfs buf log item has b_log_item set:
>
> crash> xfs_log_item.li_buf,li_ops ffff033f8d7c9de8 -x
>   li_buf = 0x0
>   li_ops = 0xffff800008fae8d8 <xfs_buf_item_ops>
> crash> xfs_buf_log_item.bli_buf  ffff033f8d7c9de8
>   bli_buf = 0xffff0200125b4a80
> crash> xfs_buf.b_log_item 0xffff0200125b4a80
>   b_log_item = 0xffff033f8d7c9de8
> crash> xfs_buf_log_item.bli_flags 0xffff033f8d7c9de8
>   bli_flags = 2     (XFS_BLI_DIRTY)
> crash> xfs_buf_log_item.bli_item.li_flags  ffff033f8d7c9de8
>   bli_item.li_flags = 1,  (XFS_LI_IN_AIL)
>
> So for the xfs buf log item, XFS_DONE is set because of xfs_trans_dirty_buf(),
> but the xfs inode log item never calls xfs_trans_dirty_buf() because
> b_log_item == 0x0.
>
> Do you know the reason why XFS_DONE is set for the xfs inode log item?
>

#define XBF_DONE        (1u << 5) /* all pages in the buffer uptodate */

Buffer uptodate does not mean that it is not dirty.
I am not sure about the rest of your analysis.
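
To illustrate the point about the flags, here is a minimal user-space sketch
(not kernel code). The flag values are the ones that appear in this thread;
struct demo_buf and the two helpers are hypothetical names made up for the
illustration. It only shows that "uptodate" and "dirty" are tracked by
independent bits, so one being set says nothing about the other.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in this thread. */
#define XBF_DONE       (1u << 5) /* all pages in the buffer uptodate */
#define XFS_BLI_DIRTY  (1u << 1) /* crash output above: bli_flags = 2 */

/*
 * Hypothetical stand-in for a buffer and its buf log item flags.
 * This is NOT the kernel's struct xfs_buf or struct xfs_buf_log_item.
 */
struct demo_buf {
	uint32_t b_flags;   /* buffer state bits, e.g. XBF_DONE */
	uint32_t bli_flags; /* buf log item bits, e.g. XFS_BLI_DIRTY */
};

/* True if the buffer contents are uptodate. */
static bool demo_buf_uptodate(const struct demo_buf *bp)
{
	return (bp->b_flags & XBF_DONE) != 0;
}

/* True if the buf log item is marked dirty. */
static bool demo_buf_dirty(const struct demo_buf *bp)
{
	return (bp->bli_flags & XFS_BLI_DIRTY) != 0;
}
```

A buffer with b_flags = XBF_DONE and bli_flags = XFS_BLI_DIRTY satisfies both
helpers at once, which is the "uptodate and dirty" case mentioned in
comment #4.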

> >
> > Maybe the hardware never returned with a response?
> > Hard to say. Maybe someone else has ideas.
> >
>
> If we can prove that XFS_DONE doesn't stand for iodone, I think this issue may
> be caused by a hardware error.
>
> I found this error message in dmesg:
> [ 9824.111366] mpt3sas_cm0: issue target reset: handle = (0x0034)
>
> Maybe it tells us that mpt3sas lost the I/O requests earlier.
>

Yes, maybe it does.

Anyway, if your hardware had errors, could it be that your
filesystem is shutting down?

If it is, you may have hit the bug fixed by commit
84d8949e7707 ("xfs: hold buffer across unpin and potential shutdown
processing"),
but I am not sure whether all the conditions for that bug match your case.

If you did hit this bug, consider upgrading to v5.10.135,
which has the fix.

Thanks,
Amir.
