On Mon, Mar 23, 2020 at 10:22:02AM +0100, Pavel Reichl wrote:
> Oh, thanks for the heads up...I'll try to investigate.

Ahah, I figured it out.  It took me a while to pin down a solid
reproducer, but here's a stack trace that I see most often:

[ 812.102819] XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_trans_inode.c, line: 91
[ 812.104017] ------------[ cut here ]------------
[ 812.104598] WARNING: CPU: 2 PID: 26250 at fs/xfs/xfs_message.c:112 assfail+0x30/0x50 [xfs]
[ 812.105505] Modules linked in: xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip_set_hash_ip ip_set_hash_net xt_tcpudp xt_set ip_set_hash_mac ip_set nfnetlink ip6table_filter ip6_tables iptable_filter bfq sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: xfs]
[ 812.108176] CPU: 2 PID: 26250 Comm: kworker/2:1 Tainted: G W 5.6.0-rc4-djw #rc4
[ 812.110742] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 812.111724] Workqueue: xfsalloc xfs_btree_split_worker [xfs]
[ 812.112404] RIP: 0010:assfail+0x30/0x50 [xfs]
[ 812.112905] Code: 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 10 5d 37 a0 e8 c5 f8 ff ff 0f b6 1d fa 5b 11 00 80 fb 01 0f 87 aa 1b 06 00 83 e3 01 75 04 <0f> 0b 5b c3 0f 0b 48 c7 c7 80 a6 40 a0 e8 4d b8 0d e1 0f 1f 40 00
[ 812.114912] RSP: 0018:ffffc90000c7fc48 EFLAGS: 00010246
[ 812.115508] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 812.116286] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa0364b4d
[ 812.117087] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[ 812.117875] R10: 000000000000000a R11: f000000000000000 R12: ffff888078695700
[ 812.118661] R13: 0000000000000000 R14: ffff8880787c4460 R15: 0000000000000000
[ 812.119445] FS: 0000000000000000(0000) GS:ffff88807e000000(0000) knlGS:0000000000000000
[ 812.120311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 812.120969] CR2: 00007fe67192deb8 CR3: 00000000713a1002 CR4: 00000000001606a0
[ 812.121755] Call Trace:
[ 812.122097]  xfs_trans_log_inode+0x128/0x140 [xfs]
[ 812.122685]  xfs_bmbt_alloc_block+0x133/0x230 [xfs]
[ 812.123286]  __xfs_btree_split+0x16d/0x980 [xfs]
[ 812.123876]  xfs_btree_split_worker+0x61/0x90 [xfs]
[ 812.124446]  process_one_work+0x250/0x5c0
[ 812.124910]  ? worker_thread+0xcf/0x3a0
[ 812.125380]  worker_thread+0x3d/0x3a0
[ 812.125807]  ? process_one_work+0x5c0/0x5c0
[ 812.126297]  kthread+0x121/0x140
[ 812.126684]  ? kthread_park+0x80/0x80
[ 812.127111]  ret_from_fork+0x3a/0x50
[ 812.127545] irq event stamp: 75298
[ 812.127947] hardirqs last enabled at (75297): [<ffffffff810d7f98>] console_unlock+0x428/0x580
[ 812.128895] hardirqs last disabled at (75298): [<ffffffff81001db0>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 812.129942] softirqs last enabled at (74892): [<ffffffffa02c9f2b>] xfs_buf_find+0xa6b/0x1130 [xfs]
[ 812.130984] softirqs last disabled at (74890): [<ffffffffa02c98c0>] xfs_buf_find+0x400/0x1130 [xfs]
[ 812.131993] ---[ end trace 82fd2c9f1faba927 ]---

This one I could reproduce on my laptop by running generic/324 with a
1k blocksize, but on my test vm fleet it was xfs/057 with 4k blocks.
YMMV.

Anyway -- I augmented isilocked with some extra trace_printk to see
what was going on, and noticed this:

161011: <...>-28177 [000] 803.593121: xfs_ilock: dev 8:80 ino 0x46 flags ILOCK_EXCL caller xfs_alloc_file_space
161022: <...>-28177 [000] 803.593126: bprint: xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
161038: <...>-28177 [000] 803.593132: bprint: xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
161048: <...>-28177 [000] 803.593136: bprint: xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
161064: <...>-28177 [000] 803.593172: bprint: xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
161083: <...>-27081 [000] 803.593559: bprint: xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 0
<assertion blows up>
162017: <...>-28177 [001] 803.641200: bprint: xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162036: <...>-28177 [001] 803.641215: bprint: xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162057: <...>-28177 [001] 803.641224: bprint: xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162073: <...>-28177 [001] 803.641243: bprint: xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162091: <...>-28177 [001] 803.641260: bprint: xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162106: <...>-28177 [001] 803.641288: bprint: xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162127: <...>-28177 [001] 803.641318: bprint: xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162140: <...>-28177 [001] 803.641397: bprint: xfs_isilocked: ino 0x46 lockf 0x4:0x1 debug_locks 1 arg 0 locked? 1
162158: <...>-28177 [001] 803.641416: xfs_iunlock: dev 8:80 ino 0x46 flags ILOCK_EXCL caller xfs_alloc_file_space
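(The 0x4 in the lockf field lines up with the ILOCK_EXCL that the
caller took in the xfs_ilock line above.  The instrumentation itself is
nothing fancy -- conceptually it's just a trace_printk in the ILOCK
branch of xfs_isilocked.  What follows is a rough sketch, not the
actual hack; the helper name is made up, and it assumes i_lock has
already been turned into a plain rw_semaphore by your series:)

/* Sketch only; xfs_isilocked_trace() is a made-up name. */
static bool
xfs_isilocked_trace(
        struct xfs_inode        *ip,
        uint                    lock_flags)
{
        bool                    locked;

        /*
         * With lockdep enabled, "held" means held by the *current*
         * task.  debug_locks drops to zero if lockdep has shut itself
         * off, in which case the answer can't be trusted and we just
         * say yes.
         */
        locked = !debug_locks ||
                 lockdep_is_held_type(&ip->i_lock, 0 /* exclusive */);

        trace_printk("xfs_isilocked: ino 0x%llx lockf 0x%x debug_locks %d locked? %d\n",
                        (unsigned long long)ip->i_ino, lock_flags,
                        debug_locks, locked);
        return locked;
}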
Process 28177 takes the ilock, and tries to do a bmapi_write which
involves a bmap btree split.  The kernel libxfs delegates the split to
a workqueue (apparently to reduce stack usage?), which performs the
bmbt split and logs the inode core.  The kworker is process 27081.

lockdep tracks the rwsem's lock state /and/ which process actually
holds the rwsem.  This ownership doesn't transfer from 28177 to 27081,
so when the kworker asks lockdep if it holds ILOCK_EXCL, lockdep says
no, because 27081 doesn't own the lock, 28177 does.  Kaboom.

The old mrlock_t had that 'int mr_writer' field which didn't care about
lock ownership, and so isilocked would return true and so the assert
was happy.
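(To make the mr_writer vs. lockdep difference concrete, here's a quick
sketch.  The helper names are made up and it assumes i_lock is already
a bare rw_semaphore as in your series; the first check is morally what
the old code gave us, the second is morally what the refactored
xfs_isilocked() gives us when lockdep is active:)

/*
 * Old-style check (sketch): passes as long as *some* task holds the
 * ILOCK.  Ownership is never consulted, so the bmbt split kworker
 * sails through even though it never took the lock itself.
 */
static inline bool
ilock_held_by_anyone(
        struct xfs_inode        *ip)
{
        return rwsem_is_locked(&ip->i_lock);
}

/*
 * Lockdep-style check (sketch): passes only if lockdep has recorded
 * the *current* task as the exclusive holder.  28177 passes this;
 * 27081 doesn't, because down_write() ownership never transfers to
 * the workqueue.
 */
static inline bool
ilock_held_by_me(
        struct xfs_inode        *ip)
{
        if (!debug_locks)
                return true;    /* lockdep turned itself off */
        return lockdep_is_held_type(&ip->i_lock, 0 /* exclusive */);
}

rwsem_is_locked() is even looser than the old mr_writer flag since it
also counts readers, but the point stands: neither the flag nor the
bare rwsem state ever asks /who/ took the lock.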
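(And for anyone who hasn't stared at the bmbt split path before, the
hand-off is the usual punt-to-a-workqueue-and-wait pattern.  Here's a
sketch with made-up names -- the real thing lives in the btree split
code and carries a lot more state -- just to show why the task running
the split isn't the task that took the ILOCK:)

struct split_args {
        struct work_struct      work;
        struct completion       *done;
        /* ...btree cursor, etc... */
};

static void
split_worker(
        struct work_struct      *work)
{
        struct split_args       *args =
                container_of(work, struct split_args, work);

        /* Runs in a kworker: lockdep says this task does NOT hold the ILOCK. */
        /* ...do the split, log the inode... */
        complete(args->done);
}

static void
punt_split_to_workqueue(
        struct workqueue_struct *wq,
        struct split_args       *args)
{
        DECLARE_COMPLETION_ONSTACK(done);

        /* The caller still owns the ILOCK the whole time it waits here... */
        args->done = &done;
        INIT_WORK_ONSTACK(&args->work, split_worker);
        queue_work(wq, &args->work);
        /* ...but the split itself runs in some kworker's context. */
        wait_for_completion(&done);
        destroy_work_on_stack(&args->work);
}

Nothing about the rwsem changes across that hand-off; only the task
context does, and that's exactly the thing lockdep keys off of.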
So now comes the fun part -- the old isilocked code had a glaring hole
in which it would return true if *anyone* held the lock, even if the
owner is some other unrelated thread.  That's probably good enough for
most of the fstests because we generally only run one thread at a time,
and developers will probably notice. :)

However, with your series applied, the isilocked function becomes much
more powerful when lockdep is active because now we can test that the
lock is held *by the caller*, which closes that hole.  Unfortunately,
it also trips over this bmap split case, so if there's a way to solve
that problem we'd better find it quickly.  Unfortunately, I don't know
of a way to gift a lock to another thread temporarily...

Thoughts?

--D

> On Mon, Mar 23, 2020 at 4:28 AM Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> >
> > On Fri, Mar 20, 2020 at 10:03:13PM +0100, Pavel Reichl wrote:
> > > Remove some wrappers that we have in XFS around the read-write semaphore
> > > locks.
> > >
> > > The goal of this cleanup is to remove mrlock_t structure and its mr*()
> > > wrapper functions and replace it with native rw_semaphore type and its
> > > native calls.
> >
> > Hmmm, there's something funny about this patchset that causes my fstests
> > vm to explode with isilocked assertions everywhere...  I'll look more
> > tomorrow (it's still the weekend here) but figured I should tell you
> > sooner than later.
> >
> > --D
> >
> > > Pavel Reichl (4):
> > >   xfs: Refactor xfs_isilocked()
> > >   xfs: clean up whitespace in xfs_isilocked() calls
> > >   xfs: xfs_isilocked() can only check a single lock type
> > >   xfs: replace mrlock_t with rw_semaphores
> > >
> > >  fs/xfs/libxfs/xfs_bmap.c |   8 +--
> > >  fs/xfs/mrlock.h          |  78 -----------------------------
> > >  fs/xfs/xfs_file.c        |   3 +-
> > >  fs/xfs/xfs_inode.c       | 104 ++++++++++++++++++++++++++-------------
> > >  fs/xfs/xfs_inode.h       |  25 ++++++----
> > >  fs/xfs/xfs_iops.c        |   4 +-
> > >  fs/xfs/xfs_linux.h       |   2 +-
> > >  fs/xfs/xfs_qm.c          |   2 +-
> > >  fs/xfs/xfs_super.c       |   6 +--
> > >  9 files changed, 98 insertions(+), 134 deletions(-)
> > >  delete mode 100644 fs/xfs/mrlock.h
> > >
> > > --
> > > 2.25.1
> > >
> > >