On Tue, May 14, 2024 at 6:37 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> On Tue, 14 May 2024 10:05:21 +0800 Sam Sun <samsun1006219@xxxxxxxxx>
> > On Tue, May 14, 2024 at 6:54 AM Hillf Danton <hdanton@xxxxxxxx> wrote:
> > > On Mon, 13 May 2024 20:57:44 +0800 Sam Sun <samsun1006219@xxxxxxxxx>
> > > >
> > > > I applied this patch and tried using the C repro, but it still crashed
> > > > with the same task hang kernel dump log.
> > >
> > > Oh low-hanging pear is sour, and try again seeing if there is missing
> > > wakeup due to wake batch.
> > >
> > > --- x/lib/sbitmap.c
> > > +++ y/lib/sbitmap.c
> > > @@ -579,6 +579,8 @@ void sbitmap_queue_wake_up(struct sbitma
> > >       unsigned int wake_batch = READ_ONCE(sbq->wake_batch);
> > >       unsigned int wakeups;
> > >
> > > +     __sbitmap_queue_wake_up(sbq, nr);
> > > +
> > >       if (!atomic_read(&sbq->ws_active))
> > >               return;
> > >
> > > --
> >
> > I applied this patch together with the last patch. Unfortunately it
> > still crashed.
>
> After two rounds of test, what is clear now so far is -- it is IOs
> in flight that caused the task hung reported, though without spotting
> why they failed to complete within 120 seconds.
> >
> > Pointed out by Tetsuo, this kernel panic might be caused by sending
> > NMI between cpus. As dump log shows:
> > ```
> > [ 429.046960][ T32] NMI backtrace for cpu 0
> > [ 429.047499][ T32] CPU: 0 PID: 32 Comm: khungtaskd Not tainted 6.9.0-dirty #6
> > [ 429.048417][ T32] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
> > [ 429.049873][ T32] Call Trace:
> > [ 429.050299][ T32] <TASK>
> > [ 429.050672][ T32] dump_stack_lvl+0x201/0x300
> > ...
> > [ 429.063133][ T32] ret_from_fork_asm+0x11/0x20
> > [ 429.063735][ T32] </TASK>
> > [ 429.064168][ T32] Sending NMI from CPU 0 to CPUs 1:
> > [ 429.064833][ T32] BUG: unable to handle page fault for address:
> > ffffffff813d4cf1
>
> Given many syzbot reports without gpf like this one, I have difficulty
> understanding it. If it is printed after task hung detected, it should
> be a separate issue.
>

I tried running
# echo 0 > /proc/sys/kernel/hung_task_all_cpu_backtrace
before starting the reproducer, and the kernel no longer panicked.
However, even after I terminated the reproducer, the kernel kept
dumping task hung logs. After setting hung_task_all_cpu_backtrace back
to 1, it panicked immediately on the next dump. So I guess the
underlying problem is still a task hang rather than a general
protection fault.

> > [ 429.065765][ T32] #PF: supervisor write access in kernel mode
> > [ 429.066502][ T32] #PF: error_code(0x0003) - permissions violation
> > [ 429.067274][ T32] PGD db38067 P4D db38067 PUD db39063 PMD 12001a1
> > [ 429.068068][ T32] Oops: 0003 [#1] PREEMPT SMP KASAN NOPTI
> > [ 429.068767][ T32] CPU: 0 PID: 32 Comm: khungtaskd Not tainted
> > 6.9.0-dirty #6
> > [ 429.069666][ T32] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
> > [ 429.071142][ T32] RIP: 0010:__send_ipi_mask+0x541/0x690