Our light customizations enable us to put a very heavy I/O load on the VM/kernel. If I remove them, we will not be able to generate such a high load on the kernel.

Eric, I moved the cond_resched() to the place you asked for; see below:

--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -35,11 +35,11 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
 		spin_unlock(&inode->i_lock);
 		spin_unlock(&sb->s_inode_list_lock);
+		cond_resched();
 		invalidate_mapping_pages(inode->i_mapping, 0, -1);
 		iput(toput_inode);
 		toput_inode = inode;

We got stuck again after a day and a half of running under the heavy load.

What we saw on the node, from the system log: you can see that the system was stuck for around one hour, until we rebooted it:

Mar 20 10:08:02 c-node04 kernel: [26251.637328] sh (48738): drop_caches: 3
Mar 20 10:08:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:08:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:09:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:10:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:11:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:12:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:13:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:14:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:15:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:16:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:17:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:18:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:19:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:20:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:21:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:22:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:23:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:24:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:25:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:26:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:27:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:28:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:29:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:30:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:31:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:32:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:33:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:34:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:35:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:36:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:37:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:38:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:39:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:40:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:45:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:46:49 c-node04 rsyslogd: -- MARK --
Mar 20 11:44:49 c-node04 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Mar 20 11:44:49 c-node04 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="11153" x-info="http://www.rsyslog.com"] start
Mar 20 11:44:49 c-node04 kernel: [ 0.000000] Linux version 5.4.80-KM8 (david.mozes@kbuilder64-tc8) (gcc version 8.3.1 20190311 (R

From the GCP console log we saw that the VM was at 100% CPU while it was stuck. Several times we got the following warning:

Mar 20 11:53:29 c-node04 kernel:
Mar 20 11:53:29 c-node04 kernel: [ 569.106644] virtio_scsi(OE) virtio_pci(OE) virtio_ring(OE) virtio(OE) [last unloaded: scst_local]
Mar 20 11:53:29 c-node04 kernel: [ 569.106649] CPU: 42 PID: 39649 Comm: kal_scsi_tgt_cr Kdump: loaded Tainted: G OE 5.4.80-KM8 #20
Mar 20 11:53:29 c-node04 kernel: [ 569.106650] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Mar 20 11:53:29 c-node04 kernel: [ 569.106652] RIP: 0010:usercopy_warn+0x7d/0xa0
Mar 20 11:53:29 c-node04 kernel: [ 569.106653] Code: 0e af 41 51 48 c7 c0 51 bd 0d af 49 89 f1 48 0f 45 c2 48 89 f9 4d 89 d8 4c 89 d2 48 c7 c7 c8 ff 0e af 48 89 c6 e8 8c 5e df ff <0f> 0b 48 83 c4 18 c3 48 c7 c6 9f f9 0f af 49 89 f1 49 89 f3 eb 96
Mar 20 11:53:29 c-node04 kernel: [ 569.106654] RSP: 0018:ffff9dbb7845bda0 EFLAGS: 00010286
Mar 20 11:53:29 c-node04 kernel: [ 569.106654] RAX: 0000000000000000 RBX: ffff9dca3ee24b88 RCX: 0000000000000006
Mar 20 11:53:29 c-node04 kernel: [ 569.106655] RDX: 0000000000000007 RSI: 0000000000000096
RDI: ffff9dca47ad74a0
Mar 20 11:53:29 c-node04 kernel: [ 569.106655] RBP: 0000000000000238 R08: 00000000000008ed R09: 0000000000000002
Mar 20 11:53:29 c-node04 kernel: [ 569.106655] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
Mar 20 11:53:29 c-node04 kernel: [ 569.106656] R13: ffff9dca3ee24dc0 R14: 0000000000000238 R15: ffff9dca3ee24b88
Mar 20 11:53:29 c-node04 kernel: [ 569.106657] FS: 00007fa42daf0700(0000) GS:ffff9dca47ac0000(0000) knlGS:0000000000000000
Mar 20 11:53:29 c-node04 kernel: [ 569.106657] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 20 11:53:29 c-node04 kernel: [ 569.106658] CR2: 00007fabfd7d2000 CR3: 0000003bbda9a001 CR4: 00000000003606e0
Mar 20 11:53:29 c-node04 kernel: [ 569.106661] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 20 11:53:29 c-node04 kernel: [ 569.106661] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 20 11:53:29 c-node04 kernel: [ 569.106662] Call Trace:
Mar 20 11:53:29 c-node04 kernel: [ 569.106669] __check_object_size+0x162/0x173
Mar 20 11:53:29 c-node04 kernel: [ 569.106676] dev_user_reply_get_cmd.isra.17+0x198/0x420 [scst_user]
Mar 20 11:53:29 c-node04 kernel: [ 569.106679] dev_user_ioctl+0x317/0x71a [scst_user]
Mar 20 11:53:29 c-node04 kernel: [ 569.106683] ? ep_poll+0x88/0x460
Mar 20 11:53:29 c-node04 kernel: [ 569.106688] do_vfs_ioctl+0xa2/0x600
Mar 20 11:53:29 c-node04 kernel: [ 569.106691] ? finish_wait+0x80/0x80
Mar 20 11:53:29 c-node04 kernel: [ 569.106692] ksys_ioctl+0x60/0x90
Mar 20 11:53:29 c-node04 kernel: [ 569.106693] __x64_sys_ioctl+0x16/0x20
Mar 20 11:53:29 c-node04 kernel: [ 569.106698] do_syscall_64+0x55/0x1a0
Mar 20 11:53:29 c-node04 kernel: [ 569.106703] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 20 11:53:29 c-node04 kernel: [ 569.106706] RIP: 0033:0x7fac04913b77
Mar 20 11:53:29 c-node04 kernel: [ 569.106707] Code: 90 90 90 48 8b 05 51 d4 2a 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 90 90 90 90 90 90 90 90 90 90 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 d4 2a 00 31 d2 48 29 c2 64
Mar 20 11:53:29 c-node04 kernel: [ 569.106708] RSP: 002b:00007fa42dae9788 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 20 11:53:29 c-node04 kernel: [ 569.106709] RAX: ffffffffffffffda RBX: 00007fa46c127140 RCX: 00007fac04913b77

Bottom line: the system looks more stable without the cond_resched(). We will try to run without this command that happens every hour (a legitimate Linux command):

echo 3 > /proc/sys/vm/drop_caches

Thx
David

-----Original Message-----
From: Eric Sandeen <sandeen@xxxxxxxxxx>
Sent: Wednesday, March 17, 2021 9:59 PM
To: David Mozes <david.mozes@xxxxxxx>; Eric Sandeen <sandeen@xxxxxxxxxxx>; linux-fsdevel@xxxxxxxxxxxxxxx
Subject: Re: fs: avoid softlockups in s_inodes iterators commit

On 3/17/21 11:45 AM, David Mozes wrote:
> Sending the stack of the first case on a different run
>
> Panic on 25.2.2021
> whatchg on server 4. the pmc was server w\2
> Feb 23 05:46:06 c-node04 kernel: [125259.990332] watchdog: BUG: soft lockup - CPU#41 stuck for 22s! [kuic_msg_domain:15790]
> Feb 23 05:46:06 c-node04 kernel: [125259.990333] Modules linked in: iscsi_scst(OE) crc32c_intel(O) scst_local(OE) scst_user(OE) scst(OE) drbd(O) lru_cache(O) 8021q(O) mrp(O) garp(O) netconsole(O) nfsd(O) nfs_acl(O) auth_rpcgss(O) lockd(O) sunrpc(O) grace(O) xt_MASQUERADE(O) xt_nat(O) xt_state(O) iptable_nat(O) xt_addrtype(O) xt_conntrack(O) nf_nat(O) nf_conntrack(O) nf_defrag_ipv4(O) nf_defrag_ipv6(O) libcrc32c(O) br_netfilter(O) bridge(O) stp(O) llc(O) overlay(O) be2iscsi(O) iscsi_boot_sysfs(O) bnx2i(O) cnic(O) uio(O) cxgb4i(O) cxgb4(O) cxgb3i(O) libcxgbi(O) cxgb3(O) mdio(O) libcxgb(O) ib_iser(OE) iscsi_tcp(O) libiscsi_tcp(O) libiscsi(O) scsi_transport_iscsi(O) dm_multipath(O) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_core(OE) mdev(OE) mlxfw(OE) ptp(O) pps_core(O) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) mlx_compat(OE) fuse(O) binfmt_misc(O) pvpanic(O) pcspkr(O) virtio_rng(O) virtio_net(O) net_failover(O) failover(O) i2c_piix4(
> Feb 23 05:46:06 c-node04 kernel: O) ext4(OE)
> Feb 23 05:46:06 c-node04 kernel: [125259.990368] jbd2(OE) mbcache(OE) virtio_scsi(OE) virtio_pci(OE) virtio_ring(OE) virtio(OE) [last unloaded: scst_local]

ok, you still haven't said what your "light custom" changes to this kernel are, and all of your modules are out of tree (O) and/or unsigned (E) so I would suggest first trying to reproduce this on something a lot less messy and closer to upstream.

-Eric
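[Editor's context note: the hourly trigger discussed in this thread is the standard drop_caches sysctl. A minimal sketch of such a periodic job follows; the DRY_RUN guard and the helper name are illustrative assumptions, not taken from the reporter's setup.]

```shell
# Sketch of an hourly cache-drop job (assumed setup, for illustration only).
# Writing /proc/sys/vm/drop_caches requires root, so with DRY_RUN set this
# prints the command instead of executing it.
DRY_RUN=1

drop_caches() {
    # Flush dirty pages first so that clean pagecache can actually be freed.
    sync
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: echo 3 > /proc/sys/vm/drop_caches"
    else
        # 3 = drop pagecache plus dentries and inodes.
        echo 3 > /proc/sys/vm/drop_caches
    fi
}

drop_caches
```

Writing 3 drops both the page cache and the dentry/inode caches, which is the path through drop_pagecache_sb() that the patch above touches.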