https://bugzilla.kernel.org/show_bug.cgi?id=216343

            Bug ID: 216343
           Summary: XFS: no space left in xlog cause system hang
           Product: File System
           Version: 2.5
    Kernel Version: 5.10.38
          Hardware: ARM
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: XFS
          Assignee: filesystem_xfs@xxxxxxxxxxxxxxxxxxxxxx
          Reporter: zhoukete@xxxxxxx
        Regression: No

Created attachment 301539
  --> https://bugzilla.kernel.org/attachment.cgi?id=301539&action=edit
stack

1. I cannot log in with ssh; the system hangs and I cannot do anything.
2. dmesg reports 'audit: audit_backlog=41349 > audit_backlog_limit=8192'.
3. I triggered a crash with sysrq and got a vmcore file. I don't know how to reproduce the problem.

Following is my analysis from the vmcore:

Logins hang because pid 2021571 holds the acct_process mutex, and 2021571 cannot release the mutex because it is waiting for the xlog to release space. See the stack info in the attached stack.txt.

So I tried to figure out what happened to the xlog:

    crash> struct xfs_ail.ail_target_prev,ail_targe,ail_head 0xffff00ff884f1000
      ail_target_prev = 0xe9200058600
      ail_target = 0xe9200058600
      ail_head = {
        next = 0xffff0340999a0a80,
        prev = 0xffff020013c66b40
      }

There are 112 log items in the AIL list:

    crash> list 0xffff0340999a0a80 | wc -l
    112

79 of them are xfs_inode_log_item and 30 of them are xfs_buf_log_item.

    crash> xfs_log_item.li_flags,li_lsn 0xffff0340999a0a80 -x
      li_flags = 0x1
      li_lsn = 0xe910005cc00      ===> first item lsn
    crash> xfs_log_item.li_flags,li_lsn ffff020013c66b40 -x
      li_flags = 0x1
      li_lsn = 0xe9200058600      ===> last item lsn
    crash> xfs_log_item.li_buf 0xffff0340999a0a80
      li_buf = 0xffff0200125b7180
    crash> xfs_buf.b_flags 0xffff0200125b7180 -x
      b_flags = 0x110032 (XBF_WRITE|XBF_ASYNC|XBF_DONE|_XBF_INODES|_XBF_PAGES)
    crash> xfs_buf.b_state 0xffff0200125b7180 -x
      b_state = 0x2 (XFS_BSTATE_IN_FLIGHT)
    crash> xfs_buf.b_last_error,b_retries,b_first_retry_time 0xffff0200125b7180 -x
      b_last_error = 0x0
      b_retries = 0x0
      b_first_retry_time = 0x0

The buf flags show that the I/O has been done (XBF_DONE is set). Reviewing xfs_buf_ioend(): if XBF_DONE is set, xfs_buf_inode_iodone() will be called; it removes the log item from the AIL list and then releases xlog space by moving tail_lsn. But this item is still in the AIL list, with b_last_error = 0 and XBF_WRITE set.

The xfs_buf_log_item case is the same as the inode log item:

    crash> list -s xfs_log_item.li_buf 0xffff0340999a0a80
    ffff033f8d7c9de8
      li_buf = 0x0
    crash> xfs_buf_log_item.bli_buf ffff033f8d7c9de8
      bli_buf = 0xffff0200125b4a80
    crash> xfs_buf.b_flags 0xffff0200125b4a80 -x
      b_flags = 0x100032 (XBF_WRITE|XBF_ASYNC|XBF_DONE|_XBF_PAGES)

I think it should be impossible for XBF_DONE to be set with b_last_error = 0 while the item is still in the AIL. Is my analysis correct? Why can't the xlog space be released?