[Bug 216343] XFS: no space left in xlog cause system hang

https://bugzilla.kernel.org/show_bug.cgi?id=216343

--- Comment #1 from Dave Chinner (david@xxxxxxxxxxxxx) ---
[cc Amir, the 5.10 stable XFS maintainer]

On Tue, Aug 09, 2022 at 11:46:23AM +0000, bugzilla-daemon@xxxxxxxxxx wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216343
> 
>             Bug ID: 216343
>            Summary: XFS: no space left in xlog cause system hang
>            Product: File System
>            Version: 2.5
>     Kernel Version: 5.10.38
>           Hardware: ARM
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: XFS
>           Assignee: filesystem_xfs@xxxxxxxxxxxxxxxxxxxxxx
>           Reporter: zhoukete@xxxxxxx
>         Regression: No
> 
> Created attachment 301539
>   --> https://bugzilla.kernel.org/attachment.cgi?id=301539&action=edit
> stack
> 
> 1. Cannot log in over ssh; the system hung and nothing could be done.
> 2. dmesg reports 'audit: audit_backlog=41349 > audit_backlog_limit=8192'
> 3. I sent sysrq-crash and got a vmcore file. I don't know how to reproduce it.
> 
> Following is my analysis from the vmcore:
> 
> The tty cannot log in because pid 2021571 holds the acct_process mutex, and
> 2021571 cannot release the mutex because it is waiting for xlog to release
> space. See the stack info in the attachment, stack.txt.
> 
> So I tried to figure out what happened to xlog.
> 
> crash> struct xfs_ail.ail_target_prev,ail_target,ail_head 0xffff00ff884f1000
>   ail_target_prev = 0xe9200058600
>   ail_target = 0xe9200058600
>   ail_head = {
>     next = 0xffff0340999a0a80, 
>     prev = 0xffff020013c66b40
>   }
> 
> There are 112 log items in the AIL list:
> crash> list 0xffff0340999a0a80 | wc -l
> 112
> 
> 79 of them are xlog_inode_item
> 30 of them are xlog_buf_item
> 
> crash> xfs_log_item.li_flags,li_lsn 0xffff0340999a0a80 -x 
>   li_flags = 0x1
>   li_lsn = 0xe910005cc00 ===> first item lsn
> 
> crash> xfs_log_item.li_flags,li_lsn ffff020013c66b40 -x
>   li_flags = 0x1
>   li_lsn = 0xe9200058600 ===> last item lsn
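
The two li_lsn values above are easier to compare once split into their cycle and block parts. The following is a minimal sketch, assuming the usual XFS LSN layout (cycle number in the upper 32 bits, block number in the lower 32 bits, as in the kernel's CYCLE_LSN/BLOCK_LSN macros). The helper name split_lsn is hypothetical and exists only for illustration.

```python
def split_lsn(lsn: int) -> tuple[int, int]:
    """Split a 64-bit LSN into (cycle, block).

    Assumption: cycle is the upper 32 bits, block is the lower 32 bits.
    """
    cycle = (lsn >> 32) & 0xFFFFFFFF
    block = lsn & 0xFFFFFFFF
    return cycle, block


# Values copied from the crash output quoted above.
first_item_lsn = 0xE910005CC00
last_item_lsn = 0xE9200058600

for name, lsn in (("first item", first_item_lsn), ("last item", last_item_lsn)):
    cycle, block = split_lsn(lsn)
    print(f"{name}: cycle={cycle:#x} block={block:#x}")
```

Under that assumption the first item sits at cycle 0xe91, block 0x5cc00, and the last item (which matches ail_target) at cycle 0xe92, block 0x58600: one cycle apart, with the newer block number below the older one.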
> 
> crash> xfs_log_item.li_buf 0xffff0340999a0a80
>   li_buf = 0xffff0200125b7180
> 
> crash> xfs_buf.b_flags 0xffff0200125b7180 -x
>  b_flags = 0x110032  (XBF_WRITE|XBF_ASYNC|XBF_DONE|_XBF_INODES|_XBF_PAGES) 
> 
> crash> xfs_buf.b_state 0xffff0200125b7180 -x
>   b_state = 0x2 (XFS_BSTATE_IN_FLIGHT)
> 
> crash> xfs_buf.b_last_error,b_retries,b_first_retry_time 0xffff0200125b7180 -x
>   b_last_error = 0x0
>   b_retries = 0x0
>   b_first_retry_time = 0x0 
> 
> The buf flags show the I/O has completed (XBF_DONE is set).
> From my reading of xfs_buf_ioend, if XBF_DONE is set, xfs_buf_inode_iodone
> will be called; it removes the log item from the AIL list and then releases
> xlog space by moving the tail_lsn.
> 
> But this item is still in the AIL list, even though b_last_error = 0 and
> XBF_WRITE is set.
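
For anyone re-checking the flag decoding in this report, here is a small sketch that turns a raw b_flags value into names. The bit positions are assumptions: they are the ones implied by the two decoded values quoted in this report (0x110032 and 0x100032), not a complete copy of the kernel's flag table, and decode_b_flags is a hypothetical helper name.

```python
# Assumed bit positions, limited to the flags that appear in this report.
XBF_FLAGS = {
    1 << 1: "XBF_WRITE",
    1 << 4: "XBF_ASYNC",
    1 << 5: "XBF_DONE",
    1 << 16: "_XBF_INODES",
    1 << 20: "_XBF_PAGES",
}


def decode_b_flags(value: int) -> str:
    """Return a '|'-joined list of flag names set in value.

    Bits not present in XBF_FLAGS are reported as 'unknown:<hex>'
    rather than silently dropped.
    """
    names = [name for bit, name in XBF_FLAGS.items() if value & bit]
    known = sum(bit for bit in XBF_FLAGS if value & bit)
    unknown = value & ~known
    if unknown:
        names.append(f"unknown:{unknown:#x}")
    return "|".join(names)


# Value copied from the inode buffer's b_flags in the crash output.
print(decode_b_flags(0x110032))
```

This only confirms the arithmetic of the decoding shown in the report; it says nothing about what those flags imply for the state of the I/O.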
> 
> The xfs buf log items are in the same state as the inode log item.
> 
> crash> list -s xfs_log_item.li_buf 0xffff0340999a0a80
> ffff033f8d7c9de8
>   li_buf = 0x0
> crash> xfs_buf_log_item.bli_buf  ffff033f8d7c9de8
>   bli_buf = 0xffff0200125b4a80
> crash> xfs_buf.b_flags 0xffff0200125b4a80 -x
>   b_flags = 0x100032 (XBF_WRITE|XBF_ASYNC|XBF_DONE|_XBF_PAGES) 
> 
> I think it should be impossible for XBF_DONE to be set with b_last_error = 0
> while the item is still in the AIL.
> 
> Is my analysis correct?
> Why can't the xlog release space?
> 
> -- 
> You may reply to this email to add a comment.
> 
> You are receiving this mail because:
> You are watching the assignee of the bug.



