Hi all, I saw a crash in xfs/141 last night while testing xfs for-next on a V4 filesystem. I think the problem here is that: 1. xfs_attr_set calls xfs_attr_set_args. 2. xfs_attr_set_args calls xfs_attr_shortform_to_leaf and sets @leaf_bp. 3. _set_args calls xfs_defer_finish. 4. xfs_defer_finish calls xfs_defer_trans_roll calls xfs_trans_roll. 5. xfs_trans_roll sets @args.trans to the new transaction created by xfs_trans_dup. 6. xfs_trans_roll calls __xfs_trans_commit on the old transaction and fails because we injected a log error. 7. _trans_roll passes the error code up to _defer_finish. 8. _defer_finish passes the error code up to _attr_set without joining anything (in particular leaf_bp which we bhold'd so that we would retain the buffer lock) to the new transaction. Note that @args.trans is still set to the new transaction; nobody's cancelled it. 8. xfs_attr_set tries to bail out through out_release_leaf. The leaf_bp is non-null and args.trans is non-null, so we call xfs_trans_brelse to release leaf_bp from args.trans 9. leaf_bp isn't joined to any transaction at all, but we passed in a non-null args.trans so _trans_brelse hits the "bp->b_transp == tp" assert and then crashes trying to assert on the buffer log item which is null because leaf_bp isn't joined to any transaction. I think the underlying problem here is that we haven't precisely defined what's supposed to happen when the _trans_roll in _defer_trans_roll errors out. Looking at the three callers of trans_bhold (attr setting, allocating disk space for a new dquot, and online repair) the handling of this situation is not consistent. (Embarassingly, the first of those three was RVB'd by me and the latter two were written by me :P) Should we attach the bholded buffers to the new transaction even if the old transaction failed to commit so that callers can xfs_trans_brelse the buffer and it all works out (like what the attr code does)? Or should we cancel the new transaction and set *tpp to NULL, which will also solve this problem for the attr case since args.trans will be null after the old commit fails? But then we run into the problem that now every caller has to be prepared to deal with the transaction being null if something went wrong. All of those callers ought to be calling xfs_trans_cancel if that's the case, so we might be able to get away with adding a if (!tp) return; at the top of that function. Or are the callers of _defer_finish/_trans_roll simply supposed to know that the buffer might not be joined if the roll fails and deal with it appropriately? Extra complication: if the rolling succeeds but one of the defer_ops->finish_item() calls fails then the buffer /will/ be joined to the transaction being passed out. I /think/ the easiest answer here is that we ought to join the buffers to the new transaction even if committing the old one fails. I also suspect the out_release code in xrep_roll_ag_trans is wrong, but that's my problem to fix. :) --D run fstests xfs/141 at 2019-04-23 00:01:13 XFS (sdc): Mounting V4 Filesystem XFS (sdc): Ending clean mount XFS (sdb): Mounting V4 Filesystem XFS (sdb): Ending clean mount XFS (sdb): Quotacheck needed: Please wait. XFS (sdb): Quotacheck: Done. XFS (sdb): Injecting error (false) at file fs/xfs/xfs_log.c, line 1924, on filesystem "sdb" XFS (sdb): Intentionally corrupted log record at LSN 0x100000e0e. Shutdown imminent. XFS (sdb): metadata I/O error in "xlog_iodone" at daddr 0x23762e len 64 error 0 XFS (sdb): xfs_do_force_shutdown(0x2) called from line 1272 of file fs/xfs/xfs_log.c. Return address = ffffffffa064e48b XFS: Assertion failed: bp->b_transp == tp, file: fs/xfs/xfs_trans_buf.c, line: 385 potentially unexpected fatal signal 6. CPU: 1 PID: 1154887 Comm: fsstress Tainted: G W 5.1.0-rc5-djw #rc5 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014 RIP: 0033:0x7fdb91cd9e97 Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f RSP: 002b:00007ffdb0d0f080 EFLAGS: 00000246 ORIG_RAX: 000000000000000e RAX: 0000000000000000 RBX: 0000000003aed64a RCX: 00007fdb91cd9e97 RDX: 0000000000000000 RSI: 00007ffdb0d0f080 RDI: 0000000000000002 RBP: 0000000000458923 R08: 0000000000000000 R09: 00007ffdb0d0f080 R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000005 R13: 000000000000031e R14: 00007ffdb0d0f3d0 R15: 000055d212696560 FS: 00007fdb928cdb80 GS: 0000000000000000 potentially unexpected fatal signal 6. CPU: 1 PID: 1154888 Comm: fsstress Tainted: G W 5.1.0-rc5-djw #rc5 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014 RIP: 0033:0x7fdb91cd9e97 Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f RSP: 002b:00007ffdb0d0eaf0 EFLAGS: 00000246 ORIG_RAX: 000000000000000e RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00007fdb91cd9e97 RDX: 0000000000000000 RSI: 00007ffdb0d0eaf0 RDI: 0000000000000002 RBP: 000055d2128a41c0 R08: 0000000000000000 R09: 00007ffdb0d0eaf0 R10: 0000000000000008 R11: 0000000000000246 R12: 000000000007a120 R13: 0000000051eb851f R14: 00007ffdb0d0f3d0 R15: 000055d21269ec60 FS: 00007fdb928cdb80 GS: 0000000000000000 XFS (sdb): Log I/O Error Detected. Shutting down filesystem XFS (sdb): Please unmount the filesystem and rectify the problem(s) WARNING: CPU: 0 PID: 1154885 at fs/xfs/xfs_message.c:104 assfail+0x27/0x2a [xfs] Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug] CPU: 0 PID: 1154885 Comm: fsstress Tainted: G W 5.1.0-rc5-djw #rc5 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014 RIP: 0010:assfail+0x27/0x2a [xfs] Code: 0f 0b c3 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 50 7c 6a a0 48 89 fa 31 ff e8 a6 f8 ff ff 80 3d 45 94 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8b b3 a8 02 00 00 48 c7 c7 e8 80 6a a0 c6 05 66 8f 0a RSP: 0018:ffffc9000194bbb0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa0699d13 RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80 R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000 FS: 00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdb9195c000 CR3: 00000000770bd000 CR4: 00000000000006b0 Call Trace: xfs_trans_brelse+0x185/0x210 [xfs] xfs_attr_set+0x2ce/0x300 [xfs] xfs_xattr_set+0x54/0xa0 [xfs] __vfs_setxattr+0x66/0x80 __vfs_setxattr_noperm+0x54/0xf0 vfs_setxattr+0x81/0xa0 setxattr+0x146/0x1d0 path_setxattr+0xd0/0xf0 __x64_sys_setxattr+0x27/0x30 do_syscall_64+0x50/0x160 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fdb91db6839 Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839 RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0 RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0 R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001 R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] (null) hardirqs last disabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220 softirqs last enabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220 softirqs last disabled at (0): [<0000000000000000>] (null) ---[ end trace f2eeafdb05fd11a1 ]--- BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 0 PID: 1154885 Comm: fsstress Tainted: G W 5.1.0-rc5-djw #rc5 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014 RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs] Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13 RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80 R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000 FS: 00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0 Call Trace: xfs_attr_set+0x2ce/0x300 [xfs] xfs_xattr_set+0x54/0xa0 [xfs] __vfs_setxattr+0x66/0x80 __vfs_setxattr_noperm+0x54/0xf0 vfs_setxattr+0x81/0xa0 setxattr+0x146/0x1d0 path_setxattr+0xd0/0xf0 __x64_sys_setxattr+0x27/0x30 do_syscall_64+0x50/0x160 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fdb91db6839 Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839 RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0 RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0 R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001 R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60 Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug] Dumping ftrace buffer: (ftrace buffer empty) CR2: 0000000000000038 ---[ end trace f2eeafdb05fd11a2 ]--- RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs] Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13 RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80 R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000 FS: 00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0