pNFS double-free DS commit info filelayout_free_lseg() [v5.4]

hi Trond,

We recently hit a BUG on one of our kernels based on v5.4.249. The full stack trace is appended below.

The crash dump shows no NFS/RPC activity other than the task that hits the BUG assert, which is running rpc_async_release on the nfsiod workqueue.

My notes say: we're in pgio release, on the async write completion path. The last thing that path does is release the pgio header, which here is pnfs_writehdr_free, and that wants to free the layout segment. Since we're using the files layout, we end up in filelayout_free_lseg.

That calls kfree() on the pnfs_commit_bucket in the file layout's DS commit info. Then kfree() eventually falls foul of the double-free detection BUG assert (object==fp) in set_freepointer(), which is inlined in __slab_free().
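
For illustration only, here is a userspace sketch of how that naive check behaves. The names toy_object, toy_slab and toy_free are hypothetical and are not kernel types; the sketch returns false at the point where the kernel would hit the BUG assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a slab freelist; not the kernel's types. */
struct toy_object {
	struct toy_object *next_free;	/* free pointer stored inside a free object */
};

struct toy_slab {
	struct toy_object *freelist;	/* most recently freed object, or NULL */
};

/*
 * Model of the naive check: freeing the object that is already at the
 * head of the freelist would make its free pointer point at itself.
 * The kernel BUGs in that case; this sketch returns false instead so
 * that the caller can observe the detection.
 */
static bool toy_free(struct toy_slab *slab, struct toy_object *object)
{
	assert(slab != NULL && object != NULL);

	if (object == slab->freelist)
		return false;		/* models the (object == fp) assert */

	object->next_free = slab->freelist;
	slab->freelist = object;
	return true;
}
```

In this model only a second free of the object currently at the freelist head is caught; if another object is freed in between, the repeat free goes unnoticed.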


The zero-byte writes that the server reported in the run-up to the BUG (see the log below) are interesting too, perhaps?


I noticed your patch series for v5.7:

	[PATCH v2 00/22] Fix NFS commit to DS

	https://lore.kernel.org/all/20200328153220.1352010-1-trondmy@xxxxxxxxxx/

and in particular:

	12/22 pNFS: Add infrastructure for cleaning up per-layout commit structures

which adjusts filelayout_free_lseg(), and in particular takes the inode lock around the kfree(), which I thought might help avoid double-frees, if it were a concurrent task issue…
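
As a userspace illustration of why serialising the free could matter, here is a minimal sketch. It is not the code from that series or from the kernel: toy_commit_info and toy_release_buckets are hypothetical names, and a pthread mutex stands in for the inode lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-layout commit info; not a kernel type. */
struct toy_commit_info {
	pthread_mutex_t lock;	/* stands in for the inode lock */
	void *buckets;		/* stands in for the commit bucket array */
	int nbuckets;
};

/*
 * Free the bucket array at most once. The pointer is read, freed and
 * cleared while the lock is held, so a second caller sees NULL rather
 * than a stale pointer. Returns 1 if this call freed the array, 0 if
 * it had already been released.
 */
static int toy_release_buckets(struct toy_commit_info *ci)
{
	int freed;

	assert(ci != NULL);

	pthread_mutex_lock(&ci->lock);
	freed = (ci->buckets != NULL);
	free(ci->buckets);	/* free(NULL) is a no-op */
	ci->buckets = NULL;
	ci->nbuckets = 0;
	pthread_mutex_unlock(&ci->lock);

	return freed;
}
```

Without the lock, two tasks could both read the same non-NULL pointer before either clears it, and both would then free it.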

[although a few others in that series look like they might be relevant too?]

I wondered, if indeed it's relevant here, would you consider that series, or part thereof, to be suitable for a backport to upstream longterm v5.4.y?

Or are things not that simple, perhaps?


Unfortunately (or not) the issue has been seen only once, some months ago, so enabling SLUB debugging may not help unless it recurs.


thanks very much,

cheers,
calum.


[ 1093895.472897] NFS: Server wrote zero bytes, expected 102.
[ 1094250.849719] NFS: Server wrote zero bytes, expected 9264.
[ 1094598.709711] NFS: Server wrote zero bytes, expected 121.
[ 1094994.466054] NFS: Server wrote zero bytes, expected 121.
[ 1095450.147843] NFS: Server wrote zero bytes, expected 9089.
[ 1095862.677474] NFS: Server wrote zero bytes, expected 122.
[ 1096307.016082] NFS: Server wrote zero bytes, expected 86.
[ 1096650.380616] NFS: Server wrote zero bytes, expected 9349.
[ 1096956.556415] NFS: Server wrote zero bytes, expected 122.
[ 1097341.926574] NFS: Server wrote zero bytes, expected 75.
[ 1097748.691058] NFS: Server wrote zero bytes, expected 70.
[ 1098049.367847] NFS: Server wrote zero bytes, expected 122.
[ 1098349.546874] NFS: Server wrote zero bytes, expected 65536.
[ 1098652.138746] NFS: Server wrote zero bytes, expected 65536.
[ 1098652.138748] NFS: Server wrote zero bytes, expected 65536.
[ 1098652.138749] NFS: Server wrote zero bytes, expected 65536.
[ 1098954.730458] NFS: Server wrote zero bytes, expected 65536.


[ 1098964.976068] kernel BUG at mm/slub.c:299!


[1098965.011057] Workqueue: nfsiod rpc_async_release [sunrpc]
[1098965.017333] RIP: 0010:__slab_free+0x19d/0x376
[1098965.022644] Code: fa 66 0f 1f 44 00 00 f0 49 0f ba 2c 24 00 0f 82 a4 00 00 00 4d 3b 6c 24 20 74 11 49 0f ba 34 24 00 57 9d 0f 1f 44 00 00 eb 9b <0f> 0b 49 3b 5c 24 28 75 e8 48 8b 44 24 28 49 89 4c 24 28 49 89 44
[1098965.044341] RSP: 0018:ffffb798068ffc50 EFLAGS: 00010246
[1098965.050543] RAX: ffff8cc01596e700 RBX: 0000000080400030 RCX: ffff8cc01596e700
[1098965.058886] RDX: ffff8cc01596e700 RSI: ffffeccb35565b80 RDI: ffff8c53c7c07600
[1098965.067610] RBP: ffffb798068ffd00 R08: 0000000000000001 R09: ffffffffc0a7343a
[1098965.076504] R10: ffff8cc01596e700 R11: 0000000000000001 R12: ffffeccb35565b80
[1098965.085132] R13: ffff8cc01596e700 R14: ffff8c53c7c07600 R15: ffff8c53c7c07600
[1098965.093546] FS: 0000000000000000(0000) GS:ffff8dcfff700000(0000) knlGS:0000000000000000
[1098965.103105] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1098965.109870] CR2: 000055aaf31dfd5a CR3: 000000e431a0a004 CR4: 00000000007606e0
[1098965.118272] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1098965.126644] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1098965.135054] PKRU: 55555554
[1098965.138388] Call Trace:
[1098965.141380]  ? show_regs.cold.12+0x1a/0x1c
[1098965.146265]  ? __die+0x86/0xd2
[1098965.149946]  ? die+0x2f/0x4f
[1098965.153422]  ? do_trap+0xd5/0xeb
[1098965.157298]  ? do_error_trap+0x7c/0xb7
[1098965.161804]  ? __slab_free+0x19d/0x376
[1098965.166324]  ? do_invalid_op+0x3b/0x49
[1098965.170808]  ? __slab_free+0x19d/0x376
[1098965.175281]  ? invalid_op+0x127/0x12c
[1098965.179662]  ? filelayout_free_lseg+0x5a/0x77 [nfs_layout_nfsv41_files]
[1098965.187474]  ? __slab_free+0x19d/0x376
[1098965.191988]  kfree+0x3d4/0x3ed
[1098965.195654]  ? kmem_cache_free+0x3f9/0x412
[1098965.200540]  ? filelayout_free_lseg+0x5a/0x77 [nfs_layout_nfsv41_files]
[1098965.208331]  filelayout_free_lseg+0x5a/0x77 [nfs_layout_nfsv41_files]
[1098965.215903]  pnfs_put_lseg+0xd7/0x192 [nfsv4]
[1098965.221121]  pnfs_writehdr_free+0x16/0x30 [nfsv4]
[1098965.226743]  nfs_write_completion+0x188/0x210 [nfs]
[1098965.232545]  ? __rpc_sleep_on_priority_timeout+0xf0/0xf0 [sunrpc]
[1098965.239768]  ? refcount_dec_and_lock+0x16/0x72
[1098965.245040]  nfs_pgio_release+0x16/0x20 [nfs]
[1098965.250240]  pnfs_generic_rw_release+0x29/0x30 [nfsv4]
[1098965.256317]  rpc_free_task+0x3f/0x69 [sunrpc]
[1098965.267574]  rpc_async_release+0x30/0x50 [sunrpc]
[1098965.278845]  process_one_work+0x1bb/0x3a9
[1098965.290448]  worker_thread+0x37/0x3b2
[1098965.300185]  kthread+0x120/0x136
[1098965.308976]  ? create_worker+0x1b0/0x1ab
[1098965.318468]  ? __kthread_cancel_work+0x50/0x46
[1098965.328425]  ret_from_fork+0x24/0x36



