Hi, I referred to the patch set to address this issue here: https://lkml.org/lkml/2018/2/22/82 If I take the original suggestion and place the fscache_enqueue_retrieval() within the spin lock, it seems to keep the kernel running. I have no idea if that is the right thing to do but it works for me. Daire On Wed, Aug 21, 2019 at 4:09 PM Daire Byrne <daire.byrne@xxxxxxxxx> wrote: > I'm still seeing this in kernel 5.2.9. I previously reported in 5.0.5 a > few months ago. I just have to populate the cache for around 30 minutes to > trigger it. > > I'm happy to test patches or provide more debug info to assist. > > Daire > > [ 8592.370028] FS-Cache: > [ 8592.372621] FS-Cache: Assertion failed > [ 8592.376527] FS-Cache: 4 == 5 is false > [ 8592.380352] ------------[ cut here ]------------ > [ 8592.385108] kernel BUG at fs/fscache/operation.c:70! > [ 8592.390237] invalid opcode: 0000 [#1] SMP PTI > [ 8592.394741] CPU: 14 PID: 81 Comm: ksoftirqd/14 Not tainted > 5.2.9-1.el7.elrepo.x86_64 #1 > [ 8592.403166] Hardware name: Google Google Compute Engine/Google Compute > Engine, BIOS Google 01/01/2011 > [ 8592.412569] RIP: 0010:fscache_enqueue_operation+0x1c3/0x240 [fscache] > [ 8592.419154] Code: c0 e8 df 95 d4 e0 48 c7 c7 30 c5 3b a0 31 c0 e8 d1 95 > d4 e0 8b 73 40 ba 05 00 00 00 48 c7 c7 20 b5 3b a0 31 c0 e8 bb 95 d4 e0 > <0f> 0b 48 c7 c7 22 c5 3b a0 31 c0 e8 ab 95 d4 e0 48 c7 c7 30 c5 3b > [ 8592.438543] RSP: 0018:ffffc90006577b40 EFLAGS: 00010046 > [ 8592.443917] RAX: 0000000000000019 RBX: ffff888fa4174c00 RCX: > 0000000000000006 > [ 8592.451392] RDX: 0000000000000000 RSI: 0000000000000082 RDI: > ffff88907db97940 > [ 8592.458676] RBP: ffffc90006577b58 R08: 00000000000001dc R09: > 00000000000001dc > [ 8592.466119] R10: 0000000000000005 R11: ffffffff826f8cc4 R12: > ffff888fa4174c00 > [ 8592.473397] R13: ffff888f8800d480 R14: ffff888fa4174c88 R15: > ffff888f9c8ed460 > [ 8592.480672] FS: 0000000000000000(0000) GS:ffff88907db80000(0000) > knlGS:0000000000000000 > [ 8592.488926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 8592.494820] CR2: 00007f60e658f000 CR3: 0000001074b70001 CR4: > 00000000001606e0 > [ 8592.502349] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 8592.509640] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > 0000000000000400 > [ 8592.517027] Call Trace: > [ 8592.519662] cachefiles_read_waiter+0x10b/0x170 [cachefiles] > [ 8592.525497] __wake_up_common+0x8f/0x160 > [ 8592.529600] __wake_up_locked_key_bookmark+0x1b/0x20 > [ 8592.534725] wake_up_page_bit+0xab/0x100 > [ 8592.538813] unlock_page+0x24/0x30 > [ 8592.542399] mpage_end_io+0x66/0xd0 [ext4] > [ 8592.546662] bio_endio+0xe6/0x180 > [ 8592.550151] blk_update_request+0x9e/0x330 > [ 8592.554504] scsi_end_request+0x2c/0x150 > [ 8592.558671] scsi_io_completion+0x88/0x4c0 > [ 8592.562932] ? __switch_to_asm+0x40/0x70 > [ 8592.567101] ? __switch_to_asm+0x34/0x70 > [ 8592.571292] scsi_finish_command+0xdc/0x120 > [ 8592.575629] scsi_softirq_done+0x145/0x170 > [ 8592.579888] blk_done_softirq+0xa5/0xd0 > [ 8592.583888] __do_softirq+0xd5/0x2a2 > [ 8592.587719] run_ksoftirqd+0x2b/0x40 > [ 8592.591479] smpboot_thread_fn+0x11f/0x180 > [ 8592.595731] kthread+0x105/0x140 > [ 8592.599115] ? sort_range+0x30/0x30 > [ 8592.602759] ? kthread_bind+0x20/0x20 > [ 8592.606596] ret_from_fork+0x35/0x40 > [ 8592.610363] Modules linked in: nfsv3 nfs cachefiles fscache ext4 > mbcache jbd2 sb_edac i2c_piix4 input_leds sg intel_rapl_perf nfsd > auth_rpcgss nfs_acl lockd grace binfmt_misc ip_tables xfs libcrc32c sd_mod > crct10dif_pclmul crc32_pclmul 8021q crc32c_intel garp mrp > ghash_clmulni_intel virtio_net net_failover failover virtio_scsi > aesni_intel scsi_transport_iscsi crypto_simd cryptd glue_helper virtio_pci > virtio_ring virtio serio_raw sunrpc dm_mirror dm_region_hash dm_log dm_mod > [ 8592.652533] ---[ end trace 80c6b6d683b29602 ]--- > [ 8592.657332] RIP: 0010:fscache_enqueue_operation+0x1c3/0x240 [fscache] > [ 8592.663920] Code: c0 e8 df 95 d4 e0 48 c7 c7 30 c5 3b a0 31 c0 e8 d1 95 > d4 e0 8b 73 40 ba 05 00 00 00 48 c7 c7 20 b5 3b a0 31 c0 e8 bb 95 d4 e0 > <0f> 0b 48 c7 c7 22 c5 3b a0 31 c0 e8 ab 95 d4 e0 48 c7 c7 30 c5 3b > [ 8592.682850] RSP: 0018:ffffc90006577b40 EFLAGS: 00010046 > [ 8592.688440] RAX: 0000000000000019 RBX: ffff888fa4174c00 RCX: > 0000000000000006 > [ 8592.695740] RDX: 0000000000000000 RSI: 0000000000000082 RDI: > ffff88907db97940 > [ 8592.703206] RBP: ffffc90006577b58 R08: 00000000000001dc R09: > 00000000000001dc > [ 8592.710510] R10: 0000000000000005 R11: ffffffff826f8cc4 R12: > ffff888fa4174c00 > [ 8592.718026] R13: ffff888f8800d480 R14: ffff888fa4174c88 R15: > ffff888f9c8ed460 > [ 8592.725488] FS: 0000000000000000(0000) GS:ffff88907db80000(0000) > knlGS:0000000000000000 > [ 8592.733739] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 8592.739862] CR2: 00007f60e658f000 CR3: 0000001074b70001 CR4: > 00000000001606e0 > [ 8592.747150] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 8592.754436] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > 0000000000000400 > [ 8592.761740] Kernel panic - not syncing: Fatal exception in interrupt > [ 8593.867017] Shutting down cpus with NMI > [ 8593.871953] Kernel Offset: disabled > [ 8593.875615] ---[ end Kernel panic - not syncing: Fatal exception in > interrupt ]--- > > On Sat, Apr 6, 2019 at 9:55 AM Daire Byrne <daire.byrne@xxxxxxxxx> wrote: > >> Hi, >> >> I can reproduce this bug under heavy cache read loads within a few >> minutes. It seem like it happens pretty frequently when the entire data set >> has been cached and so we are reading entirely from disk. I found these >> previous patches that looked to address similar issues here: >> https://lkml.org/lkml/2018/7/4/546 >> >> I'm happy to test any patches to verify fixes. >> >> Daire >> >> >> [ 6598.153866] FS-Cache: Assertion failed >> [ 6598.157778] FS-Cache: 4 == 5 is false >> [ 6598.161638] kernel BUG at fs/fscache/operation.c:74! >> [ 6598.166771] invalid opcode: 0000 [#1] SMP PTI >> [ 6598.171397] CPU: 14 PID: 1430 Comm: nfsd Not tainted >> 5.0.5-1.el7.elrepo.x86_64 #1 >> [ 6598.179034] Hardware name: Google Google Compute Engine/Google Compute >> Engine, BIOS Google 01/01/2011 >> [ 6598.188417] RIP: 0010:fscache_enqueue_operation+0x1c3/0x240 [fscache] >> [ 6598.195006] Code: c0 e8 df 86 c1 e0 48 c7 c7 30 85 4e a0 31 c0 e8 d1 >> 86 c1 e0 8b 73 40 ba 05 00 00 00 48 c7 c7 20 75 4e a0 31 c0 e8 bb 86 c1 e0 >> <0f> 0b 48 c7 c7 22 85 4e a0 31 c0 e8 ab 86 c1 e0 48 c7 c7 30 85 4e >> [ 6598.214256] RSP: 0018:ffff88907db83c28 EFLAGS: 00010046 >> [ 6598.219628] RAX: 0000000000000019 RBX: ffff88810039b200 RCX: >> 0000000000000006 >> [ 6598.226910] RDX: 0000000000000000 RSI: 0000000000000082 RDI: >> ffff88907db96980 >> [ 6598.234187] RBP: ffff88907db83c40 R08: 0000000000000000 R09: >> 00000000000001d8 >> [ 6598.241464] R10: 0000000000000005 R11: 0000000000000000 R12: >> ffff88810039b200 >> [ 6598.248839] R13: ffff888e14d34000 R14: ffff888cc24353a0 R15: >> ffff888cc2435340 >> [ 6598.256125] FS: 0000000000000000(0000) GS:ffff88907db80000(0000) >> knlGS:0000000000000000 >> [ 6598.264364] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 6598.270255] CR2: 00007fdb83171000 CR3: 0000001057f9e005 CR4: >> 00000000001606e0 >> [ 6598.277538] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >> 0000000000000000 >> [ 6598.284821] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: >> 0000000000000400 >> [ 6598.292233] Call Trace: >> [ 6598.294846] <IRQ> >> [ 6598.297019] cachefiles_read_waiter+0x10b/0x170 [cachefiles] >> [ 6598.302839] __wake_up_common+0x8f/0x160 >> [ 6598.306910] __wake_up_locked_key_bookmark+0x1b/0x20 >> [ 6598.312044] wake_up_page_bit+0xab/0x100 >> [ 6598.316112] unlock_page+0x24/0x30 >> [ 6598.319678] mpage_end_io+0x29/0x70 [ext4] >> [ 6598.323924] bio_endio+0xe6/0x180 >> [ 6598.327397] blk_update_request+0x9e/0x300 >> [ 6598.331650] scsi_end_request+0x30/0x1a0 >> [ 6598.335724] scsi_io_completion+0x8e/0x5f0 >> [ 6598.339963] scsi_finish_command+0xdc/0x130 >> [ 6598.344315] scsi_softirq_done+0x145/0x170 >> [ 6598.348563] blk_done_softirq+0xa5/0xd0 >> [ 6598.352555] __do_softirq+0xd5/0x2a2 >> [ 6598.356301] irq_exit+0xe8/0x100 >> [ 6598.359683] do_IRQ+0x59/0xe0 >> [ 6598.362804] common_interrupt+0xf/0xf >> [ 6598.366624] </IRQ> >> [ 6598.368898] RIP: 0010:dd_insert_requests+0x6/0x210 >> [ 6598.373847] Code: 58 b8 75 91 31 db eb 99 31 c0 eb a3 0f 0b 31 c0 0f >> 1f 40 00 c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 >> <48> 89 e5 41 57 41 56 41 55 41 54 53 48 89 f3 48 83 ec 18 48 8b 87 >> [ 6598.393060] RSP: 0018:ffffc90007a4f2d8 EFLAGS: 00000246 ORIG_RAX: >> ffffffffffffffde >> [ 6598.400802] RAX: ffff889062c5f800 RBX: ffff889062c5e000 RCX: >> 0000000000000000 >> [ 6598.408098] RDX: 0000000000000000 RSI: ffffc90007a4f338 RDI: >> ffff889062c5e000 >> [ 6598.415376] RBP: ffffc90007a4f2f8 R08: ffffffff813edb10 R09: >> ffff889062cbe9c0 >> [ 6598.422675] R10: 0000000000000001 R11: 0000000000001000 R12: >> 0000000000000000 >> [ 6598.429963] R13: ffff889063114848 R14: ffffc90007a4f338 R15: >> ffff88906238d3c8 >> [ 6598.437253] ? deadline_fifo_request+0x110/0x110 >> [ 6598.442015] ? blk_mq_sched_insert_requests+0x3e/0x80 >> [ 6598.447210] blk_mq_flush_plug_list+0x177/0x280 >> [ 6598.452117] ? __blk_mq_get_tag+0x21/0x90 >> [ 6598.456273] blk_flush_plug_list+0xd1/0x100 >> [ 6598.460605] blk_mq_make_request+0x1f8/0x500 >> [ 6598.465025] generic_make_request+0x191/0x3c0 >> [ 6598.469531] submit_bio+0x75/0x140 >> [ 6598.473082] ? bio_add_page+0x4b/0x60 >> [ 6598.476910] ext4_mpage_readpages+0x41f/0x840 [ext4] >> [ 6598.482038] ? scan_shadow_nodes+0x30/0x30 >> [ 6598.486302] ext4_readpage+0x4d/0xb0 [ext4] >> [ 6598.490641] cachefiles_read_or_alloc_pages+0x69b/0xf10 [cachefiles] >> [ 6598.497145] ? __wait_on_bit+0x7d/0x90 >> [ 6598.501046] ? wake_up_bit+0x41/0x50 >> [ 6598.504779] ? fscache_run_op.isra.11+0x5c/0xf0 [fscache] >> [ 6598.510339] ? fscache_submit_op+0x264/0x3f0 [fscache] >> [ 6598.515634] __fscache_read_or_alloc_pages+0x239/0x380 [fscache] >> [ 6598.521836] __nfs_readpages_from_fscache+0x6b/0x190 [nfs] >> [ 6598.528319] nfs_readpages+0xbf/0x1c0 [nfs] >> [ 6598.532655] ? dev_hard_start_xmit+0x9e/0x220 >> [ 6598.537311] ? __alloc_pages_nodemask+0x133/0x2e0 >> [ 6598.542161] read_pages+0x67/0x1a0 >> [ 6598.545707] __do_page_cache_readahead+0x1c5/0x1e0 >> [ 6598.550645] ondemand_readahead+0x172/0x2b0 >> [ 6598.554979] page_cache_sync_readahead+0x85/0xc0 >> [ 6598.559753] generic_file_read_iter+0x942/0xdc0 >> [ 6598.564456] ? nfs_check_cache_invalid+0x38/0xa0 [nfs] >> [ 6598.569894] ? nfs_mapping_need_revalidate_inode+0x17/0x40 [nfs] >> [ 6598.576445] ? _cond_resched+0x19/0x30 >> [ 6598.580352] nfs_file_read+0x6d/0xc0 [nfs] >> [ 6598.584607] generic_file_splice_read+0xfd/0x1c0 >> [ 6598.589379] do_splice_to+0x76/0x90 >> [ 6598.593015] splice_direct_to_actor+0x10d/0x250 >> [ 6598.597706] ? fsid_source+0x60/0x60 [nfsd] >> [ 6598.602040] nfsd_splice_read+0x7f/0x110 [nfsd] >> [ 6598.606740] nfsd_read+0x1ac/0x1d0 [nfsd] >> [ 6598.610902] nfsd3_proc_read+0xb4/0x160 [nfsd] >> [ 6598.615503] nfsd_dispatch+0xc1/0x260 [nfsd] >> [ 6598.619951] svc_process_common+0x3e1/0x7f0 [sunrpc] >> [ 6598.625067] svc_process+0xfc/0x110 [sunrpc] >> [ 6598.629489] nfsd+0xe9/0x160 [nfsd] >> [ 6598.633132] kthread+0x105/0x140 >> [ 6598.636513] ? nfsd_destroy+0x60/0x60 [nfsd] >> [ 6598.640935] ? kthread_bind+0x20/0x20 >> [ 6598.644752] ret_from_fork+0x35/0x40 >> [ 6598.648478] Modules linked in: nfsv3 nfs cachefiles fscache ext4 >> mbcache jbd2 sg pcc_cpufreq sb_edac i2c_piix4 intel_rapl_perf input_leds >> nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc ip_tables xfs libcrc32c >> sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel 8021q garp mrp >> ghash_clmulni_intel virtio_net net_failover failover virtio_scsi >> aesni_intel crypto_simd cryptd virtio_pci glue_helper virtio_ring virtio >> scsi_transport_iscsi serio_raw sunrpc dm_mirror dm_region_hash dm_log dm_mod >> [ 6598.691798] ---[ end trace a77e949852c4460c ]--- >> [ 6598.696688] RIP: 0010:fscache_enqueue_operation+0x1c3/0x240 [fscache] >> [ 6598.703370] Code: c0 e8 df 86 c1 e0 48 c7 c7 30 85 4e a0 31 c0 e8 d1 >> 86 c1 e0 8b 73 40 ba 05 00 00 00 48 c7 c7 20 75 4e a0 31 c0 e8 bb 86 c1 e0 >> <0f> 0b 48 c7 c7 22 85 4e a0 31 c0 e8 ab 86 c1 e0 48 c7 c7 30 85 4e >> [ 6598.722331] RSP: 0018:ffff88907db83c28 EFLAGS: 00010046 >> [ 6598.727714] RAX: 0000000000000019 RBX: ffff88810039b200 RCX: >> 0000000000000006 >> [ 6598.734999] RDX: 0000000000000000 RSI: 0000000000000082 RDI: >> ffff88907db96980 >> [ 6598.742287] RBP: ffff88907db83c40 R08: 0000000000000000 R09: >> 00000000000001d8 >> [ 6598.749579] R10: 0000000000000005 R11: 0000000000000000 R12: >> ffff88810039b200 >> [ 6598.756867] R13: ffff888e14d34000 R14: ffff888cc24353a0 R15: >> ffff888cc2435340 >> [ 6598.764623] FS: 0000000000000000(0000) GS:ffff88907db80000(0000) >> knlGS:0000000000000000 >> [ 6598.772866] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 6598.778877] CR2: 00007fdb83171000 CR3: 0000001057f9e005 CR4: >> 00000000001606e0 >> [ 6598.786167] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >> 0000000000000000 >> [ 6598.793464] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: >> 0000000000000400 >> [ 6598.800886] Kernel panic - not syncing: Fatal exception in interrupt >> [ 6599.929552] Shutting down cpus with NMI >> [ 6599.934409] Kernel Offset: disabled >> [ 6599.938060] ---[ end Kernel panic - not syncing: Fatal exception in >> interrupt ]--- >> > -- Linux-cachefs mailing list Linux-cachefs@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cachefs