On Mon, 8 Mar 2021 20:42:42 +0500 Mikhail Gavrilov wrote: > On Fri, 5 Mar 2021 at 19:22, Hillf Danton <hdanton@xxxxxxxx> wrote: > > > > Yes, it is the same race as we saw before. But after cutting the race > > between poo->stale_lock and pool->lock with the patch above, the race > > between the free path and isolate/putback path came up. > > > > Try the diff below in combination with the patch above > > > > --- x/mm/z3fold.c > > +++ y/mm/z3fold.c > > @@ -1676,8 +1676,10 @@ static void z3fold_page_putback(struct p > > pool = zhdr_to_pool(zhdr); > > > > z3fold_page_lock(zhdr); > > + spin_lock(&pool->lock); > > if (!list_empty(&zhdr->buddy)) > > list_del_init(&zhdr->buddy); > > + spin_unlock(&pool->lock); > > INIT_LIST_HEAD(&page->lru); > > if (kref_put(&zhdr->refcount, release_z3fold_page_locked)) { > > atomic64_dec(&pool->pages_nr); > > Unfortunately even with combination of two latest patches computer > hanged again after two days uptime. Thanks again for your report. > > [185000.747401] list_add corruption. next->prev should be prev > (ffffe0c1bea61f40), but was 0000000000000000. (next=ffff9bb90b444000). At the first glance, the zero pointer goes out of the box of race because 1/ the Call Trace shows it is the free path (of the supposed race victim), 2/ on the race winner side however either list_del or list_del_init would not leave a null pointer behind - the list_add captured in this report is under pool->lock. > [185000.747438] ------------[ cut here ]------------ > [185000.747441] kernel BUG at lib/list_debug.c:23! > [185000.747449] invalid opcode: 0000 [#1] SMP NOPTI > [185000.747454] CPU: 22 PID: 1588003 Comm: Web Content Tainted: G > W --------- --- > 5.12.0-0.rc1.20210305git280d542f6ffa.164.fc35.x86_64 #1 > [185000.747458] Hardware name: System manufacturer System Product > Name/ROG STRIX X570-I GAMING, BIOS 3402 01/13/2021 > [185000.747462] RIP: 0010:__list_add_valid.cold+0xf/0x3f > [185000.747469] Code: 48 c7 c6 7c a9 64 84 48 89 ef 49 c7 c7 ea ff ff > ff e8 9d 81 01 00 e9 5f ee 9c ff 4c 89 c1 48 c7 c7 48 aa 64 84 e8 74 > fd fd ff <0f> 0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f8 aa 64 84 e8 5d > fd fd > [185000.747472] RSP: 0000:ffffc0c1c61cfc10 EFLAGS: 00010286 > [185000.747476] RAX: 0000000000000075 RBX: fffffa0d596b0a40 RCX: > 0000000000000000 > [185000.747479] RDX: ffff9bbc097e97a0 RSI: ffff9bbc097daae0 RDI: > ffff9bbc097daae0 > [185000.747482] RBP: ffffe0c1bea61f40 R08: 0000000000000000 R09: > ffffc0c1c61cfa58 > [185000.747485] R10: ffffc0c1c61cfa50 R11: 0000000000000000 R12: > ffff9bb537b4f008 > [185000.747488] R13: ffff9bba5ac29010 R14: ffff9bb90b444000 R15: > ffff9bba5ac29000 > [185000.747491] FS: 00007f198ea257c0(0000) GS:ffff9bbc09600000(0000) > knlGS:0000000000000000 > [185000.747495] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [185000.747498] CR2: 00000fe3aa523500 CR3: 000000012f870000 CR4: > 0000000000350ee0 > [185000.747501] Call Trace: > [185000.747504] do_compact_page+0x28d/0xb60 > [185000.747509] ? _raw_spin_unlock+0x1f/0x30 > [185000.747514] ? z3fold_zpool_free+0x3a8/0x590 > [185000.747518] zswap_free_entry+0x43/0x70 > [185000.747523] zswap_frontswap_invalidate_page+0x8c/0x90 > [185000.747527] __frontswap_invalidate_page+0x5d/0x90 > [185000.747531] swap_range_free+0xcd/0xf0 > [185000.747535] swapcache_free_entries+0x128/0x1a0 > [185000.747539] free_swap_slot+0xbb/0xd0 > [185000.747543] __swap_entry_free+0x7a/0xa0 > [185000.747547] do_swap_page+0x393/0x900 > [185000.747551] __handle_mm_fault+0xbd6/0x1610 > [185000.747557] handle_mm_fault+0xa2/0x270 > [185000.747561] do_user_addr_fault+0x1ea/0x6b0 > [185000.747566] exc_page_fault+0x67/0x2a0 > [185000.747570] ? asm_exc_page_fault+0x8/0x30 > [185000.747574] asm_exc_page_fault+0x1e/0x30 > [185000.747578] RIP: 0033:0x7f198eb8be30 > [185000.747582] Code: 9d 48 81 fa 80 00 00 00 77 19 c5 fe 7f 07 c5 fe > 7f 47 20 c5 fe 7f 44 17 e0 c5 fe 7f 44 17 c0 c5 f8 77 c3 48 8d 8f 80 > 00 00 00 <c5> fe 7f 07 48 83 e1 80 c5 fe 7f 44 17 e0 c5 fe 7f 47 20 c5 > fe 7f > [185000.747585] RSP: 002b:00007ffea7e406e8 EFLAGS: 00010202 > [185000.747589] RAX: 00000fe3aa523500 RBX: 00007ffea7e40738 RCX: > 00000fe3aa523580 > [185000.747592] RDX: 0000000000004000 RSI: 00000000000000fa RDI: > 00000fe3aa523500 > [185000.747594] RBP: 0000600000000000 R08: 00007f195295a000 R09: > ffffffffffffffff > [185000.747597] R10: 0000556a267402c8 R11: 0000000000000206 R12: > 00007f195295a800 > [185000.747600] R13: fffffc0000000000 R14: 00007ffea7e40738 R15: > 00007f195295a7f0 > [185000.747605] Modules linked in: crypto_user tun snd_seq_dummy > snd_hrtimer uinput nls_utf8 isofs rfcomm netconsole nft_objref > nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack > nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink cmac bnep > zstd sunrpc vfat fat hid_logitech_hidpp usblp hid_logitech_dj > snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio > snd_hda_codec_hdmi intel_rapl_msr mt76x2u joydev intel_rapl_common > snd_hda_intel mt76x2_common iwlmvm snd_intel_dspcfg mt76x02_usb > uvcvideo snd_intel_sdw_acpi snd_usb_audio mt76_usb snd_hda_codec > edac_mce_amd videobuf2_vmalloc mt76x02_lib videobuf2_memops > videobuf2_v4l2 snd_hda_core mt76 videobuf2_common snd_usbmidi_lib > kvm_amd btusb snd_hwdep snd_rawmidi videodev btrtl mac80211 kvm > snd_seq btbcm btintel snd_seq_device irqbypass libarc4 eeepc_wmi xpad > mc bluetooth rapl ff_memless snd_pcm > [185000.747647] iwlwifi asus_wmi sparse_keymap video snd_timer > ecdh_generic ecc wmi_bmof pcspkr snd cfg80211 soundcore sp5100_tco > k10temp i2c_piix4 rfkill acpi_cpufreq binfmt_misc ip_tables uas > usb_storage amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched > crct10dif_pclmul drm_kms_helper crc32_pclmul crc32c_intel cec drm > ghash_clmulni_intel ccp igb nvme dca nvme_core i2c_algo_bit wmi > pinctrl_amd fuse > [185000.747878] ---[ end trace df51d3d2498d767d ]--- > [185000.747882] RIP: 0010:__list_add_valid.cold+0xf/0x3f > [185000.747886] Code: 48 c7 c6 7c a9 64 84 48 89 ef 49 c7 c7 ea ff ff > ff e8 9d 81 01 00 e9 5f ee 9c ff 4c 89 c1 48 c7 c7 48 aa 64 84 e8 74 > fd fd ff <0f> 0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f8 aa 64 84 e8 5d > fd fd > [185000.747889] RSP: 0000:ffffc0c1c61cfc10 EFLAGS: 00010286 > [185000.747893] RAX: 0000000000000075 RBX: fffffa0d596b0a40 RCX: > 0000000000000000 > [185000.747895] RDX: ffff9bbc097e97a0 RSI: ffff9bbc097daae0 RDI: > ffff9bbc097daae0 > [185000.747898] RBP: ffffe0c1bea61f40 R08: 0000000000000000 R09: > ffffc0c1c61cfa58 > [185000.747901] R10: ffffc0c1c61cfa50 R11: 0000000000000000 R12: > ffff9bb537b4f008 > [185000.747904] R13: ffff9bba5ac29010 R14: ffff9bb90b444000 R15: > ffff9bba5ac29000 > [185000.747907] FS: 00007f198ea257c0(0000) GS:ffff9bbc09600000(0000) > knlGS:0000000000000000 > [185000.747910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [185000.747913] CR2: 00000fe3aa523500 CR3: 000000012f870000 CR4: > 0000000000350ee0 > [185000.747916] note: Web Content[1588003] exited with preempt_count 6 > [185026.580248] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! > [Chrome_ChildIOT:1951362] > [185026.580262] Modules linked in: crypto_user tun snd_seq_dummy > snd_hrtimer uinput nls_utf8 isofs rfcomm netconsole nft_objref > nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack > nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink cmac bnep > zstd sunrpc vfat fat hid_logitech_hidpp usblp hid_logitech_dj > snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio > snd_hda_codec_hdmi intel_rapl_msr mt76x2u joydev intel_rapl_common > snd_hda_intel mt76x2_common iwlmvm snd_intel_dspcfg mt76x02_usb > uvcvideo snd_intel_sdw_acpi snd_usb_audio mt76_usb snd_hda_codec > edac_mce_amd videobuf2_vmalloc mt76x02_lib videobuf2_memops > videobuf2_v4l2 snd_hda_core mt76 videobuf2_common snd_usbmidi_lib > kvm_amd btusb snd_hwdep snd_rawmidi videodev btrtl mac80211 kvm > snd_seq btbcm btintel snd_seq_device irqbypass libarc4 eeepc_wmi xpad > mc bluetooth rapl ff_memless snd_pcm > [185026.580306] iwlwifi asus_wmi sparse_keymap video snd_timer > ecdh_generic ecc wmi_bmof pcspkr snd cfg80211 soundcore sp5100_tco > k10temp i2c_piix4 rfkill acpi_cpufreq binfmt_misc ip_tables uas > usb_storage amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched > crct10dif_pclmul drm_kms_helper crc32_pclmul crc32c_intel cec drm > ghash_clmulni_intel ccp igb nvme dca nvme_core i2c_algo_bit wmi > pinctrl_amd fuse > [185026.580334] irq event stamp: 0 > [185026.580337] hardirqs last enabled at (0): [<0000000000000000>] 0x0 > [185026.580342] hardirqs last disabled at (0): [<ffffffff830dd962>] > copy_process+0x902/0x1df0 > [185026.580349] softirqs last enabled at (0): [<ffffffff830dd962>] > copy_process+0x902/0x1df0 > [185026.580353] softirqs last disabled at (0): [<0000000000000000>] 0x0 > [185026.580357] CPU: 0 PID: 1951362 Comm: Chrome_ChildIOT Tainted: G > D W --------- --- > 5.12.0-0.rc1.20210305git280d542f6ffa.164.fc35.x86_64 #1 > [185026.580362] Hardware name: System manufacturer System Product > Name/ROG STRIX X570-I GAMING, BIOS 3402 01/13/2021 > [185026.580365] RIP: 0010:native_queued_spin_lock_slowpath+0x1ce/0x200 > [185026.580370] Code: c1 ef 12 83 e0 03 83 ef 01 48 c1 e0 05 48 63 ff > 48 05 00 f3 1e 00 48 03 04 fd 20 d9 6e 84 48 89 08 8b 41 08 85 c0 75 > 09 f3 90 <8b> 41 08 85 c0 74 f7 48 8b 39 48 85 ff 0f 84 56 ff ff ff 0f > 0d 0f > [185026.580374] RSP: 0000:ffffc0c1d596fbf0 EFLAGS: 00000246 > [185026.580379] RAX: 0000000000000000 RBX: 0000000000f10bea RCX: > ffff9bbc06bef300 > [185026.580382] RDX: ffff9bb500b9fb48 RSI: 0000000000040000 RDI: > 0000000000000013 > [185026.580385] RBP: ffff9bb500b9fb48 R08: 0000000000040000 R09: > 0000000000000000 > [185026.580388] R10: 0000000000000000 R11: 0000000000000000 R12: > ffff9bb500b9fb60 > [185026.580391] R13: ffff9bb500b9fb48 R14: ffffffff84e87020 R15: > ffff9bb500b9fb40 > [185026.580394] FS: 00007f2e2ffe6640(0000) GS:ffff9bbc06a00000(0000) > knlGS:0000000000000000 > [185026.580398] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [185026.580401] CR2: 00002a8e6ef59188 CR3: 0000000306082000 CR4: > 0000000000350ef0 > [185026.580405] Call Trace: > [185026.580408] do_raw_spin_lock+0x94/0xa0 > [185026.580665] _raw_spin_lock+0x63/0x80 > [185026.580670] zswap_frontswap_load+0x30/0x2f0 > [185026.580676] ? trace_hardirqs_on+0x1b/0xe0 > [185026.580681] __frontswap_load+0xc3/0x160 > [185026.580685] swap_readpage+0x257/0x430 > [185026.580689] swapin_readahead+0x450/0x4e0 > [185026.580693] ? lock_release+0x1ef/0x410 > [185026.580698] do_swap_page+0x4a4/0x900 > [185026.580703] __handle_mm_fault+0xbd6/0x1610 > [185026.580795] handle_mm_fault+0xa2/0x270 > [185026.580799] do_user_addr_fault+0x1ea/0x6b0 > [185026.580804] exc_page_fault+0x67/0x2a0 > [185026.580808] ? asm_exc_page_fault+0x8/0x30 > [185026.580889] asm_exc_page_fault+0x1e/0x30 > [185026.580893] RIP: 0033:0x55d9d6466038 > [185026.580897] Code: cc cc 55 48 89 e5 48 89 7f 10 48 89 7f 18 5d c3 > cc cc 55 48 89 e5 48 8b 47 10 48 8b 4f 18 48 89 41 10 48 8b 47 10 48 > 8b 4f 18 <48> 89 48 18 0f 57 c0 0f 11 47 10 5d c3 cc cc cc cc cc cc cc > cc cc > [185026.580900] RSP: 002b:00007f2e2ffe46b0 EFLAGS: 00010246 > [185026.580968] RAX: 00002a8e6ef59170 RBX: 00002a8e6ef40420 RCX: > 00002a8e6ef4e370 > [185026.580972] RDX: 00002a8e6d6715e0 RSI: 00002a8e6dcb02e0 RDI: > 00002a8e6ef40420 > [185026.580974] RBP: 00007f2e2ffe46b0 R08: 0000000000000000 R09: > 00007fff260af5d0 > [185026.581039] R10: 0000000000000000 R11: 0000000000000246 R12: > 0000000000000020 > [185026.581042] R13: 00002a8e6dcb02e0 R14: 000055d9deb9b1e0 R15: > 00002a8e6dcb02e0 > > Full kernel log is here: https://pastebin.com/WmBLJ3MR > > -- > Best Regards, > Mike Gavrilov. >