Re: BUG_ON() hit in sunrpc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Sep 7, 2023, at 5:05 PM, NeilBrown <neilb@xxxxxxx> wrote:
> 
> On Fri, 08 Sep 2023, Chuck Lever III wrote:
>> 
>>> On Sep 5, 2023, at 11:01 AM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>> 
>>>> On Sep 5, 2023, at 10:54 AM, Christian Brauner <brauner@xxxxxxxxxx> wrote:
>>>> 
>>>> Hey,
>>>> 
>>>> I just tried to test some changes which had commit
>>>> 99d99825fc07 ("Merge tag 'nfs-for-6.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs")
>>>> as base and when I booted with the appended config I saw a splat right at boot:
>>>> 
>>>> [   92.804377][ T5306] kernel BUG at net/sunrpc/svc.c:581!
> 
> The most likely cause for this BUG_ON that I can see is if the
> svc_set_num_threads() call in nfsd_svc() fails.
> 
> Either some listening sockets had previously been created via
> write_ports(), or they have just been created in nfsd_startup_net().
> ->sv_nrthreads is 0 and this is an attempt to increase that.
> However the thread creation fails - presumably ENOMEM.
> So we goto out_shutdown, stepping right over the nfsd_shutdown_net()
> ca..
> Then the svc_put() calls (probably two, as keep_active was probably
> set).  result in the kref reaching zero and svc_destroy() being called
> even though there are still sockets present (because nfsd_shutdown_net()
> was skipped).
> 
> I tried
> - error = svc_set_num_threads(serv, NULL, nrservs);
> + error = -ENOMEM; //svc_set_num_threads(serv, NULL, nrservs);
> if (error)
> goto out_shutdown;
> 
> and I get exactly the same BUG_ON() as you got.

Christian, can you provide the arguments your system uses for
rpc.nfsd? I'm not sure which distribution you're using, so I
can't provide the exact steps. /etc/nfs.conf is one place to
look.

Neil, do we really need a BUG_ON for this assertion? I'm
considering making it a simple pr_warn(). Interested in
opinions.

Past that, obviously input checking is missing here, so the
error flows in nfsd_svc() need improvement.


> NeilBrown
> 
> 
>>>> [   92.811194][ T5306] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
>>>> [   92.821472][ T5306] CPU: 6 PID: 5306 Comm: rpc.nfsd Tainted: G
>>>> [   92.828578][ T5306] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS unknown 2/2/2022
>>>> [   92.836319][ T5306] RIP: 0010:svc_destroy+0x206/0x270
>>>> [   92.852006][ T5306] Code: 72 49 8b bc 24 a0 00 00 00 e8 a6 a3 5e f8 48 8b 3c 24 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f e9 8f a3 5e f8 e8 aa df 1c f8 <0f> 0b e8 a3 df 1c f8 0f 0b 4c 89 ff e8 39 03 79 f8 e9 ae fe ff ff
>>>> [   92.867075][ T5306] RSP: 0018:ffffc9000a347b60 EFLAGS: 00010293
>>>> [   92.872714][ T5306] RAX: 0000000000000000 RBX: ffff88813abf5c68 RCX: 0000000000000000
>>>> [   92.884809][ T5306] RDX: ffff888126c38000 RSI: ffffffff896bcf46 RDI: 0000000000000005
>>>> [   92.894190][ T5306] RBP: 00000000fffffff4 R08: 0000000000000005 R09: 0000000000000000
>>>> [   92.900512][ T5306] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88813abf5c50
>>>> [   92.907935][ T5306] R13: ffff88813abf5c50 R14: ffff88813abf5c00 R15: ffff8881183c8000
>>>> [   92.917264][ T5306] FS:  00007fabf0bba740(0000) GS:ffff8883a9100000(0000) knlGS:0000000000000000
>>>> [   92.924880][ T5306] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [   92.930358][ T5306] CR2: 00005568a27d60e8 CR3: 00000001737c3000 CR4: 0000000000750ee0
>>>> [   92.937465][ T5306] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [   92.943057][ T5306] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [   92.948673][ T5306] PKRU: 55555554
>>>> [   92.953452][ T5306] Call Trace:
>>>> [   92.958082][ T5306]  <TASK>
>>>> [   92.962546][ T5306]  ? show_regs+0x94/0xa0
>>>> [   92.967221][ T5306]  ? die+0x3b/0xb0
>>>> [   92.971702][ T5306]  ? do_trap+0x231/0x410
>>>> [   92.976275][ T5306]  ? svc_destroy+0x206/0x270
>>>> [   92.980717][ T5306]  ? do_error_trap+0xf9/0x230
>>>> [   92.985287][ T5306]  ? svc_destroy+0x206/0x270
>>>> [   92.989693][ T5306]  ? handle_invalid_op+0x34/0x40
>>>> [   92.994044][ T5306]  ? svc_destroy+0x206/0x270
>>>> [   92.998317][ T5306]  ? exc_invalid_op+0x2d/0x40
>>>> [   93.002503][ T5306]  ? asm_exc_invalid_op+0x1a/0x20
>>>> [   93.006701][ T5306]  ? svc_destroy+0x206/0x270
>>>> [   93.010766][ T5306]  ? svc_destroy+0x206/0x270
>>>> [   93.014727][ T5306]  nfsd_svc+0x6d4/0xac0
>>>> [   93.018510][ T5306]  write_threads+0x296/0x4e0
>>>> [   93.022298][ T5306]  ? write_filehandle+0x760/0x760
>>>> [   93.026072][ T5306]  ? simple_transaction_get+0xf8/0x140
>>>> [   93.029819][ T5306]  ? preempt_count_sub+0x150/0x150
>>>> [   93.033456][ T5306]  ? do_raw_spin_lock+0x133/0x2c0
>>>> [   93.037013][ T5306]  ? _copy_from_user+0x5d/0xf0
>>>> [   93.040385][ T5306]  ? write_filehandle+0x760/0x760
>>>> [   93.043610][ T5306]  nfsctl_transaction_write+0x100/0x180
>>>> [   93.046900][ T5306]  vfs_write+0x2a9/0xe40
>>>> [   93.049930][ T5306]  ? export_features_open+0x60/0x60
>>>> [   93.053124][ T5306]  ? kernel_write+0x6c0/0x6c0
>>>> [   93.056116][ T5306]  ? do_sys_openat2+0xb6/0x1e0
>>>> [   93.059167][ T5306]  ? build_open_flags+0x690/0x690
>>>> [   93.062197][ T5306]  ? __fget_light+0x201/0x270
>>>> [   93.065020][ T5306]  ksys_write+0x134/0x260
>>>> [   93.067775][ T5306]  ? __ia32_sys_read+0xb0/0xb0
>>>> [   93.070501][ T5306]  ? rcu_is_watching+0x12/0xb0
>>>> [   93.073073][ T5306]  ? trace_irq_enable.constprop.0+0xd0/0x100
>>>> [   93.075937][ T5306]  do_syscall_64+0x38/0xb0
>>>> [   93.078394][ T5306]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>> 
>>>> I haven't spent time debugging this further. Maybe you see the issue right
>>>> away.
>>> 
>>> I don't, unfortunately. A bisect would be appropriate.
>>> 
>>> I will pull today's master branch and see if I can reproduce.
>> 
>> I wasn't able to reproduce this with yesterday's master. I don't
>> recall anything in Anna's NFS client PR that might account for
>> this crash.
>> 
>> Neil, I think you were the last person to touch the code in and
>> around svc_destroy(). Can you have a look at this?
>> 
>> 
>>>> This problem is only happening after the nfs merges afaict. I'm
>>>> currently using commit 3ef96fcfd50b ("Merge tag 'ext4_for_linus-6.6-rc1'
>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4") as base
>>>> and that splat doesn't appear.
>>>> 
>>>> Hopefully this is not a red herring.
>>>> Christian
>>>> <.config.txt>
>>> 
>>> --
>>> Chuck Lever
>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
> 

--
Chuck Lever






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux