(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface). Thanks.

On Sat, 21 Dec 2019 03:08:17 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=205937
>
> Bug ID: 205937
> Summary: BUG: unable to handle page fault for address: f3170000
> Product: Memory Management
> Version: 2.5
> Kernel Version: 5.5-rc2
> Hardware: i386
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Page Allocator
> Assignee: akpm@xxxxxxxxxxxxxxxxxxxx
> Reporter: dclarke@xxxxxxxxxxxxx
> Regression: No

"yes"

Looks like the asynchronous sysfs file removal code is failing.
sysfs_slab_remove_workfn().

Guys, did we make recent changes in this area?

> Created attachment 286393
> --> https://bugzilla.kernel.org/attachment.cgi?id=286393&action=edit
> kernel config for 5.5.0-rc2
>
> Testing a system under excessive memory pressure with some trivial
> code wherein a set of 16 pthreads are dispatched and each merely fills
> an array :
>
>
> void *big_array_fill(void *recv_parm)
> {
>     thread_parm_t *p = (thread_parm_t *)recv_parm;
>
>     printf("TRD : %d filling the big_array.\n", p->tnum);
>     for ( p->loop0 = 0; p->loop0 < BIG_ARRAY_DIM0; p->loop0++ ) {
>         for ( p->loop1 = 0; p->loop1 < BIG_ARRAY_DIM1; p->loop1++ ) {
>             p->big_array[p->loop0][p->loop1] = (uint64_t)(p->loop0 * p->loop1);
>         }
>     }
>     printf("TRD : %d big_array full.\n", p->tnum);
>
>     /* return some random data */
>     p->ret_val = drand48();
>
>     return (NULL);
> }
>
> The received parameters for each thread were in a struct thus :
>
> titan$ cat p0.h
>
> #define NUM_THREADS 16
> #define BIG_ARRAY_DIM0 384
> #define BIG_ARRAY_DIM1 65536
>
> /*
>  * struct to pass parameters to a dispatched thread
>  */
> typedef struct {
>     uint32_t tnum; /* thread number */
>     int sleep_time, loop0, loop1;
>     double ret_val; /* some sort of a return data value */
>     uint64_t big_array[BIG_ARRAY_DIM0][BIG_ARRAY_DIM1]; /* memory abuse */
> } thread_parm_t;
>
> These threads were fired off as a test while doing a teaching demo :
>
>     printf("\n-------------- begin dispatch -----------------------\n");
>     for ( i = 0; i < NUM_THREADS; i++) {
>         parm[i] = calloc( (size_t) 1 , (size_t) sizeof(thread_parm_t) );
>
>         if ( parm[i] == NULL ) {
>             if ( errno == ENOMEM ) {
>                 fprintf(stderr,"FAIL : calloc returns ENOMEM at %s:%d\n",
>                         __FILE__, __LINE__ );
>             } else {
>                 fprintf(stderr,"FAIL : calloc fails at %s:%d\n",
>                         __FILE__, __LINE__ );
>             }
>             perror("FAIL ");
>             /* gee .. before we bail out did we allocate any of the
>              * previous thread parameter memory regions? If so then
>              * clean up before bailing out. In fact we may have
>              * already dispatched out threads. */
>
>             if (i == 0 ) return ( EXIT_FAILURE );
>
>             for ( j = 0; j < i; j++ ) {
>                 /* lets ask those threads to just be nice and
>                  * we call them in with a join */
>                 pthread_join(tid[j], NULL);
>                 fprintf(stderr,"BAIL : pthread_join(%i) done.\n", j);
>                 free(parm[j]);
>                 parm[j] = NULL;
>             }
>             fprintf(stderr,"BAIL : cleanup done.\n");
>             ru();
>
>             return ( EXIT_FAILURE );
>
>         }
>
>         parm[i]->tnum = i;
>         parm[i]->sleep_time = 1 + (int)( drand48() * 10.0 );
>
>         pthread_create( &tid[i], NULL, big_array_fill, (void *)parm[i] );
>
>         printf("INFO : pthread_create %2i called for %2i secs.\n",
>                i, parm[i]->sleep_time );
>     }
>     printf("\n-------------- end dispatch -------------------------\n");
>
>
> All very nice and does what it does on most systems and even with a very
> old and slow Pentium II with very little memory we see everything just
> works fine so long as there is some swap.
>
> However on linux 5.5-rc2 I see a warning that the CPU is busy, and that
> is fine; however, the process seems to merely get "stuck" for lack
> of a better word. A kill -HUP on the pid has no effect. A kill -9 also
> seems to have no effect. A kill -9 of the PPID merely shifts the new
> parent to be number 1 and I see a zombie that won't go away.
>
> esther#
> esther# ps -efl | grep -E "UID|dclarke|init"
> F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
> 4 S root 1 0 0 80 0 - 9079 do_epo Dec20 ? 00:01:02
> /sbin/init verbose
> 4 S dclarke 382 1 0 80 0 - 4320 do_epo Dec20 ? 00:00:03
> /lib/systemd/systemd --user
> 5 S dclarke 384 382 0 80 0 - 9424 do_sig Dec20 ? 00:00:00
> (sd-pam)
> 0 Z dclarke 914 1 3 95 15 - 0 - 01:13 ? 00:03:13 [p0]
> <defunct>
> 4 S root 959 338 0 80 0 - 3256 poll_s 01:55 ? 00:00:01
> sshd: dclarke [priv]
> 5 S dclarke 965 959 0 80 0 - 3256 poll_s 01:55 ? 00:00:03
> sshd: dclarke@pts/2
> 0 S dclarke 966 965 0 80 0 - 2458 do_wai 01:55 pts/2 00:00:02
> -bash
> 0 S root 1188 1107 6 80 0 - 1958 pipe_r 02:57 pts/2 00:00:00 grep
> -E UID|dclarke|init
> esther#
>
> Looking in /proc I see :
>
> esther#
> esther# cat /proc/914/status
> Name: p0
> State: Z (zombie)
> Tgid: 914
> Ngid: 0
> Pid: 914
> PPid: 1
> TracerPid: 0
> Uid: 16411 16411 16411 16411
> Gid: 20002 20002 20002 20002
> FDSize: 0
> Groups: 20002
> NStgid: 914
> NSpid: 914
> NSpgid: 913
> NSsid: 398
> Threads: 2
> SigQ: 2/7323
> SigPnd: 0000000000000000
> ShdPnd: 0000000000000103
> SigBlk: 0000000000000000
> SigIgn: 0000000000000000
> SigCgt: 0000000180000000
> CapInh: 0000000000000000
> CapPrm: 0000000000000000
> CapEff: 0000000000000000
> CapBnd: 0000003fffffffff
> CapAmb: 0000000000000000
> NoNewPrivs: 0
> Seccomp: 0
> Speculation_Store_Bypass: vulnerable
> Cpus_allowed: 1
> Cpus_allowed_list: 0
> voluntary_ctxt_switches: 13
> nonvoluntary_ctxt_switches: 74
> esther#
>
> However dmesg reveals far more information :
>
> .
> .
> .
> [44540.046308] kobject: '(null)' (5fcda702): kobject_cleanup, parent 2a0c29d5
> [44540.060815] kobject: '(null)' (5fcda702): calling ktype release
> [44540.230679] kobject: '(null)' (0cf40105): kobject_cleanup, parent 2a0c29d5
> [44540.244669] kobject: '(null)' (0cf40105): calling ktype release
> [44540.430165] kobject: '(null)' (1eed3f2a): kobject_cleanup, parent 2a0c29d5
> [44540.444359] kobject: '(null)' (1eed3f2a): calling ktype release
> [44540.612080] kobject: '(null)' (b9893805): kobject_cleanup, parent 2a0c29d5
> [44540.625521] kobject: '(null)' (b9893805): calling ktype release
> [44540.777358] kobject: '(null)' (6e8d4424): kobject_cleanup, parent 2a0c29d5
> [44540.792340] kobject: '(null)' (6e8d4424): calling ktype release
> [44540.902623] kobject: '(null)' (07ba38b5): kobject_cleanup, parent 2a0c29d5
> [44540.916637] kobject: '(null)' (07ba38b5): calling ktype release
> [44545.033382] kobject: '(null)' (dbf42766): kobject_cleanup, parent 2a0c29d5
> [44545.048144] kobject: '(null)' (dbf42766): calling ktype release
> [44545.242257] kobject: '(null)' (e64a3d73): kobject_cleanup, parent 2a0c29d5
> [44545.255661] kobject: '(null)' (e64a3d73): calling ktype release
> [44545.402036] kobject: '(null)' (e43ef4d7): kobject_cleanup, parent 2a0c29d5
> [44545.415573] kobject: '(null)' (e43ef4d7): calling ktype release
> [44545.566126] kobject: '(null)' (2c27ba6b): kobject_cleanup, parent 2a0c29d5
> [44545.579740] kobject: '(null)' (2c27ba6b): calling ktype release
> [44546.186101] kobject: '(null)' (da4ac031): kobject_cleanup, parent 2a0c29d5
> [44546.188957] BUG: unable to handle page fault for address: f3170000
> [44546.188965] #PF: supervisor read access in kernel mode
> [44546.188973] #PF: error_code(0x0000) - not-present page
> [44546.188979] *pde = 36f4a067 *pte = 33170060
> [44546.188995] Oops: 0000 [#1] DEBUG_PAGEALLOC
> [44546.189004] CPU: 0 PID: 680 Comm: kworker/0:1 Not tainted 5.5.0-rc2-genunix
> #1
> [44546.189072] Hardware name: /CN700-8237, BIOS 6.00 PG 11/13/2006
> [44546.189079] Workqueue: events sysfs_slab_remove_workfn
> [44546.189090] EIP: hw_bitblt_1+0x240/0x310 [viafb]
> [44546.189108] Code: 08 80 fa 02 0f 84 d8 00 00 00 0f b6 55 ec c0 ea 03 0f b6
> d2 0f af ca 83 c1 03 c1 e9 02 74 17 81 c3 00 00 20 00 8d 74 26 00
> 90 <8b> 14 87 89 13 83 c0 01 39 c8 72 f4 8d 65 f4 31 c0 5b 5e 5f 5d c3
> [44546.189116] EAX: 00000994 EBX: f8600000 ECX: 000009a0 EDX: 00000000
> [44546.189124] ESI: 00000002 EDI: f316d9b0 EBP: eb98bc70 ESP: eb98bc50
> [44546.189193] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010083
> [44546.189201] CR0: 80050033 CR2: f3170000 CR3: 2b25b000 CR4: 00000690
> [44546.189206] Call Trace:
> [44546.189213] ? hw_bitblt_2+0x2b0/0x2b0 [viafb]
> [44546.189219] viafb_imageblit+0x90/0xf0 [viafb]
> [44546.189225] bit_putcs+0x215/0x430
> [44546.189231] ? bit_clear+0x120/0x120
> [44546.189236] fbcon_putcs+0xcb/0xe0
> [44546.189242] ? bit_clear+0x120/0x120
> [44546.189248] ? fb_flashcursor+0x100/0x100
> [44546.189315] vt_console_print+0x353/0x400
> [44546.189321] ? insert_char+0xd0/0xd0
> [44546.189327] console_unlock+0x35e/0x4e0
> [44546.189333] vprintk_emit+0x23a/0x2f0
> [44546.189339] vprintk_default+0x17/0x20
> [44546.189345] vprintk_func+0x36/0xb7
> [44546.189350] printk+0x13/0x15
> [44546.189356] __dynamic_pr_debug+0x46/0x70
> [44546.189363] ? __lock_acquire.isra.0+0xfe/0x4e0
> [44546.189369] kobject_put+0x7b/0x190
> [44546.189376] sysfs_slab_remove_workfn+0x30/0x40
> [44546.189382] process_one_work+0x1e4/0x3c0
> [44546.189388] worker_thread+0x14e/0x3b0
> [44546.189395] ? process_one_work+0x3c0/0x3c0
> [44546.189401] kthread+0xdb/0x110
> [44546.189407] ? process_one_work+0x3c0/0x3c0
> [44546.189414] ? kthread_create_on_node+0x20/0x20
> [44546.189419] ret_from_fork+0x2e/0x38
> [44546.189424] Modules linked in: via_camera videobuf2_dma_sg videobuf2_memops
> videobuf2_v4l2 videobuf2_common videodev mc evdev padlock_sha
> padlock_aes snd_pcm uhci_hcd via_cputemp ehci_pci hwmon_vid ehci_hcd
> snd_timer via_rng viafb snd rng_core usbcore soundcore serio_raw pcspkr
> i2c_viapro sg i2c_algo_bit acpi_cpufreq button ip_tables x_tables
> autofs4 sd_mod ata_generic fan
> [44546.189481] CR2: 00000000f3170000
> [44546.189481] ---[ end trace 5d021d89c9f5c08d ]---
> [44546.189481] EIP: hw_bitblt_1+0x240/0x310 [viafb]
> [44546.189481] Code: 08 80 fa 02 0f 84 d8 00 00 00 0f b6 55 ec c0 ea 03 0f b6
> d2 0f af ca 83 c1 03 c1 e9 02 74 17 81 c3 00 00 20 00 8d 74 26 00
> 90 <8b> 14 87 89 13 83 c0 01 39 c8 72 f4 8d 65 f4 31 c0 5b 5e 5f 5d c3
> [44546.189481] EAX: 00000994 EBX: f8600000 ECX: 000009a0 EDX: 00000000
> [44546.189481] ESI: 00000002 EDI: f316d9b0 EBP: eb98bc70 ESP: eb98bc50
> [44546.189481] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010083
> [44546.189481] CR0: 80050033 CR2: f3170000 CR3: 2b25b000 CR4: 00000690
> [44571.433760] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [p0:918]
> [44571.433768] Modules linked in: via_camera videobuf2_dma_sg videobuf2_memops
> videobuf2_v4l2 videobuf2_common videodev mc evdev padlock_sha
> padlock_aes snd_pcm uhci_hcd via_cputemp ehci_pci hwmon_vid ehci_hcd
> snd_timer via_rng viafb snd rng_core usbcore soundcore serio_raw pcspkr
> i2c_viapro sg i2c_algo_bit acpi_cpufreq button ip_tables x_tables
> autofs4 sd_mod ata_generic fan
> [44571.434034] CPU: 0 PID: 918 Comm: p0 Tainted: G D
> 5.5.0-rc2-genunix #1
> [44571.434042] Hardware name: /CN700-8237, BIOS 6.00 PG 11/13/2006
> [44571.434047] EIP: 0x437636
> [44571.434066] Code: 83 c4 10 8b 45 e4 c7 40 08 00 00 00 00 eb 6a 8b 45 e4 c7
> 40 0c 00 00 00 00 eb 42 8b 45 e4 8b 50 08 8b 45 e4 8b 40 0c 0f af
> c2 <8b> 55 e4 8b 7a 08 8b 55 e4 8b 72 0c 89 c2 c1 fa 1f 8b 4d e4 c1 e7
> [44571.434074] EAX: 003f2551 EBX: 0043a000 ECX: 9652a010 EDX: 00000079
> [44571.434082] ESI: 0079859a EDI: 00790000 EBP: 96529358 ESP: 96529330
> [44571.434151] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
> [44599.433696] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [htop:893]
> [44599.433765] Modules linked in: via_camera videobuf2_dma_sg videobuf2_memops
> videobuf2_v4l2 videobuf2_common videodev mc evdev padlock_sha
> padlock_aes snd_pcm uhci_hcd via_cputemp ehci_pci hwmon_vid ehci_hcd
> snd_timer via_rng viafb snd rng_core usbcore soundcore serio_raw pcspkr
> i2c_viapro sg i2c_algo_bit acpi_cpufreq button ip_tables x_tables
> autofs4 sd_mod ata_generic fan
> [44599.434032] CPU: 0 PID: 893 Comm: htop Tainted: G D L
> 5.5.0-rc2-genunix #1
> [44599.434040] Hardware name: /CN700-8237, BIOS 6.00 PG 11/13/2006
> [44599.434045] EIP: 0xb7f564a7
> [44599.434063] Code: 24 04 89 1a 89 6a 08 89 42 04 8b 44 24 0c 89 4a 18 89 42
> 0c 8b 44 24 10 89 42 10 8b 44 24 08 89 42 14 83 c4 18 89 d0 5b 5e
> 5f <5d> c2 04 00 8d 74 26 00 90 f7 c7 00 ff 00 00 74 10 c6 44 24 17 00
> [44599.434132] EAX: bfa3e8a0 EBX: b7f8dafc ECX: 00000000 EDX: bfa3e8a0
> [44599.434140] ESI: bfa4146c EDI: 000004b4 EBP: 00000000 ESP: bfa3e828
> [44599.434148] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000282
> esther#
>
> Not sure what other information to include however :
>
> esther#
> esther# cat /proc/version
> Linux version 5.5.0-rc2-genunix (root@esther) (gcc version 9.2.1 20191130
> (Debian 9.2.1-21)) #1 Tue Dec 17 01:57:17 UTC 2019
> esther#
> esther# cat /proc/cpuinfo
> processor : 0
> vendor_id : CentaurHauls
> cpu family : 6
> model : 10
> model name : VIA Esther processor 1200MHz
> stepping : 9
> cpu MHz : 400.000
> cache size : 128 KB
> fdiv_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge cmov pat
> clflush acpi mmx fxsr sse sse2 tm nx cpuid pni est tm2 rng rng_en ace ace_en
> ace2 ace2_en phe phe_en pmm pmm_en
> bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds
> swapgs itlb_multihit
> bogomips : 800.02
> clflush size : 64
> cache_alignment : 64
> address sizes : 36 bits physical, 32 bits virtual
> power management:
>
> esther#
> esther# cat /proc/meminfo
> MemTotal: 937412 kB
> MemFree: 70200 kB
> MemAvailable: 31728 kB
> Buffers: 11400 kB
> Cached: 43532 kB
> SwapCached: 55872 kB
> Active: 385352 kB
> Inactive: 400988 kB
> Active(anon): 352888 kB
> Inactive(anon): 379860 kB
> Active(file): 32464 kB
> Inactive(file): 21128 kB
> Unevictable: 0 kB
> Mlocked: 0 kB
> HighTotal: 76680 kB
> HighFree: 1552 kB
> LowTotal: 860732 kB
> LowFree: 68648 kB
> SwapTotal: 31250428 kB
> SwapFree: 29862396 kB
> Dirty: 16 kB
> Writeback: 0 kB
> AnonPages: 676316 kB
> Mapped: 16560 kB
> Shmem: 1340 kB
> KReclaimable: 13748 kB
> Slab: 54152 kB
> SReclaimable: 13748 kB
> SUnreclaim: 40404 kB
> KernelStack: 632 kB
> PageTables: 2932 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 31719132 kB
> Committed_AS: 2333036 kB
> VmallocTotal: 122880 kB
> VmallocUsed: 11532 kB
> VmallocChunk: 0 kB
> Percpu: 192 kB
> HardwareCorrupted: 0 kB
> AnonHugePages: 0 kB
> ShmemHugePages: 0 kB
> ShmemPmdMapped: 0 kB
> FileHugePages: 0 kB
> FilePmdMapped: 0 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 4096 kB
> Hugetlb: 0 kB
> DirectMap4k: 905208 kB
> DirectMap4M: 0 kB
> esther#
> esther# swapon
> NAME TYPE SIZE USED PRIO
> /dev/sda2 partition 29.8G 1.3G -2
> esther#
>
> Also I will attach the kernel config from /boot for 5.5.0-rc2-genunix.
>
>
> --
> Dennis Clarke
> RISC-V/SPARC/PPC/ARM/CISC
> UNIX and Linux spoken
> GreyBeard and suspenders optional
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.