Re: [BUG] generic/232 hangs on XFS DAX mount

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 18, 2017 at 03:46:28PM +0800, Eryu Guan wrote:
> Hi all,
> 
> I hit a generic/232 hang on XFS DAX mount, kernel is 4.10-rc4, 4.9
> kernel also could reproduce, so at least it's not a 4.10 regression. I
> haven't tested other kernel versions yet.
> 
> I reproduced it by running generic/232 in a tight loop, it usually hangs
> within 50 iterations. test dev is 15G brd ramdisk.
> 
> I append detailed host info and test config at the end. If you need more
> info please let me know.
> 

FYI, this doesn't appear to be related to DAX. I could reproduce on a
current kernel using defaults on a ramdisk. I also saw the following
assert before the hang:

[  935.549559] XFS: Assertion failed: !(bp->b_flags & _XBF_DELWRI_Q), file: fs/xfs/xfs_buf.c, line: 1010
[  935.558779] ------------[ cut here ]------------
[  935.563516] WARNING: CPU: 7 PID: 29192 at fs/xfs/xfs_message.c:105 asswarn+0x31/0x40 [xfs]
[  935.571778] Modules linked in: xfs brd ip6t_REJECT nf_reject_ipv6 xt_conntrack ip6t_rpfilter ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_e
[  935.642928] CPU: 7 PID: 29192 Comm: 232 Not tainted 4.10.0-rc4 #3
[  935.649005] Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 2.3.0 10/18/2011
[  935.656468] Call Trace:
[  935.658912]  dump_stack+0x63/0x86
[  935.662217]  __warn+0xcb/0xf0
[  935.665210]  ? xfs_buftarg_shrink_scan+0x8d/0xc0 [xfs]
[  935.670335]  warn_slowpath_null+0x1d/0x20
[  935.674365]  asswarn+0x31/0x40 [xfs]
[  935.677962]  xfs_buf_rele+0x399/0x3e0 [xfs]
[  935.682163]  xfs_buftarg_shrink_scan+0x8d/0xc0 [xfs]
[  935.687117]  shrink_slab.part.40+0x1f5/0x420
[  935.691376]  shrink_slab+0x29/0x30
[  935.694768]  drop_slab_node+0x31/0x60
[  935.698419]  drop_slab+0x3f/0x70
[  935.701638]  drop_caches_sysctl_handler+0x71/0xc0
[  935.706329]  proc_sys_call_handler+0xea/0x110
[  935.710675]  proc_sys_write+0x14/0x20
[  935.714327]  __vfs_write+0x37/0x160
[  935.717806]  ? selinux_file_permission+0xd7/0x110
[  935.722499]  ? security_file_permission+0x3b/0xc0
[  935.727192]  vfs_write+0xb5/0x1a0
[  935.730496]  SyS_write+0x55/0xc0
[  935.733717]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[  935.738322] RIP: 0033:0x7f5351f92c20
[  935.741888] RSP: 002b:00007fff469bc9a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  935.749440] RAX: ffffffffffffffda RBX: 00007f535225c5e0 RCX: 00007f5351f92c20
[  935.756559] RDX: 0000000000000002 RSI: 000055e935a2ba90 RDI: 0000000000000001
[  935.763677] RBP: 0000000000000001 R08: 00007f535225d740 R09: 00007f53528aa700
[  935.770794] R10: 0000000000000073 R11: 0000000000000246 R12: 000055e935c14780
[  935.777912] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
[  935.785072] ---[ end trace 43d9a50105018845 ]---

... which suggests we've freed a buffer while on a delwri submit queue
(presumably via an AIL push). Indeed, after the fact tracing shows the
AIL spinning on some flush locked dquots.

The problem didn't reproduce with the drop_caches removed from
_check_quota_usage() in xfstests/common/quota. It's possible for that to
be a false negative, but otherwise I'd suspect we have some kind of a
buffer tracking problem between LRU reclaim (i.e., the above shrinker)
and the AIL which results in the flush lock being acquired and never
released (I/O asociated w/ flush never submitted perhaps?). The
quotactl() call below is probably stuck waiting on the flush lock as a
result.

Brian

> Thanks,
> Eryu
> 
> TEST_DEV=/dev/ram0
> SCRATCH_DEV=/dev/ram1
> TEST_DIR=/mnt/test
> SCRATCH_MNT=/mnt/scratch
> MKFS_OPTIONS="-f -m crc=1 -b size=4k"
> MOUNT_OPTIONS="-o dax"
> 
> [ 4489.611370] sysrq: SysRq : Show Blocked State
> [ 4489.632865]   task                        PC stack   pid father
> [ 4489.661722] xfs_quota       D    0 30771  30468 0x00000080
> [ 4489.689072] Call Trace:
> [ 4489.701239]  __schedule+0x21c/0x6b0
> [ 4489.717525]  ? __free_pages+0x19/0x30
> [ 4489.734712]  schedule+0x36/0x80
> [ 4489.749458]  schedule_timeout+0x21c/0x3a0
> [ 4489.768249]  ? __slab_free+0x9b/0x2c0
> [ 4489.785477]  wait_for_completion+0x11c/0x180
> [ 4489.805688]  ? wake_up_q+0x80/0x80
> [ 4489.821670]  xfs_qm_dqpurge+0x6b/0x1b0 [xfs]
> [ 4489.841764]  ? xfs_qm_need_dqattach+0x80/0x80 [xfs]
> [ 4489.864823]  xfs_qm_dquot_walk.isra.10+0xfb/0x190 [xfs]
> [ 4489.889610]  xfs_qm_dqpurge_all+0x5c/0x80 [xfs]
> [ 4489.910959]  xfs_qm_scall_quotaoff+0x127/0x380 [xfs]
> [ 4489.934407]  xfs_quota_disable+0x3d/0x50 [xfs]
> [ 4489.955286]  SyS_quotactl+0x391/0x850
> [ 4489.972522]  ? __audit_syscall_entry+0xaf/0x100
> [ 4489.993523]  ? syscall_trace_enter+0x1d0/0x2b0
> [ 4490.014486]  ? __audit_syscall_exit+0x209/0x290
> [ 4490.035763]  do_syscall_64+0x67/0x180
> [ 4490.053266]  entry_SYSCALL64_slow_path+0x25/0x25
> [ 4490.074955] RIP: 0033:0x7eff755e2fea
> [ 4490.091854] RSP: 002b:00007ffe012179d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3
> [ 4490.127517] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007eff755e2fea
> [ 4490.162364] RDX: 0000000000000000 RSI: 00000000012ee2f0 RDI: 0000000000580200
> [ 4490.196833] RBP: 00000000012ee310 R08: 00007ffe012179ec R09: 0000000000000000
> [ 4490.231139] R10: 00007ffe012179ec R11: 0000000000000206 R12: 00000000012ee2d0
> [ 4490.264475] R13: 00000000012ed0a0 R14: 0000000000000000 R15: 00000000012ed080
> 
> [root@hp-dl360g9-15 xfstests]# lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                16
> On-line CPU(s) list:   0-15
> Thread(s) per core:    1
> Core(s) per socket:    8
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 63
> Model name:            Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
> Stepping:              2
> CPU MHz:               2400.000
> BogoMIPS:              4801.94
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              20480K
> NUMA node0 CPU(s):     0-3,8-11
> NUMA node1 CPU(s):     4-7,12-15
> [root@hp-dl360g9-15 xfstests]# free -m
>               total        used        free      shared  buff/cache   available
> Mem:          64305         761       63382           9         161       63071
> Swap:         16379           0       16379
> [root@hp-dl360g9-15 xfstests]# uname -a
> Linux hp-dl360g9-15 4.10.0-rc4 #18 SMP Wed Jan 18 11:39:05 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux