RE: fs: avoid softlockups in s_inodes iterators commit

Our light custom changes are what enable us to put a very high I/O load on the VM/kernel.
If I remove them, we will not be able to generate such a high load on the kernel.

Eric, after I moved the cond_resched() to the place you asked for, see below:

--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -35,11 +35,11 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
                spin_unlock(&inode->i_lock);
                spin_unlock(&sb->s_inode_list_lock);
 
+               cond_resched();
                invalidate_mapping_pages(inode->i_mapping, 0, -1);
                iput(toput_inode);
                toput_inode = inode;

We got stuck again after one and a half days of running under the heavy load.
Here is what we saw on the node:

From the system log you can see that the system was stuck for around one hour until we rebooted it:

Mar 20 10:08:02 c-node04 kernel: [26251.637328] sh (48738): drop_caches: 3
Mar 20 10:08:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:09:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:10:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:11:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:12:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:13:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:14:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:15:47 c-node04 rsyslogd: -- MARK --
Mar 20 10:16:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:17:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:18:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:19:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:20:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:21:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:22:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:23:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:24:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:25:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:26:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:27:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:28:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:29:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:30:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:31:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:32:48 c-node04 rsyslogd: -- MARK --
Mar 20 10:33:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:34:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:35:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:36:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:37:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:38:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:39:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:40:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:45:49 c-node04 rsyslogd: -- MARK --
Mar 20 10:46:49 c-node04 rsyslogd: -- MARK --
Mar 20 11:44:49 c-node04 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Mar 20 11:44:49 c-node04 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="11153" x-info="http://www.rsyslog.com";] start
Mar 20 11:44:49 c-node04 kernel: [    0.000000] Linux version 5.4.80-KM8 (david.mozes@kbuilder64-tc8) (gcc version 8.3.1 20190311 (R

From the GCP console log we saw that the VM was at 100% CPU while it was stuck.

Several times we got the following warning:

Mar 20 11:53:29 c-node04 kernel: 
Mar 20 11:53:29 c-node04 kernel: [  569.106644]  virtio_scsi(OE) virtio_pci(OE) virtio_ring(OE) virtio(OE) [last unloaded: scst_local]
Mar 20 11:53:29 c-node04 kernel: [  569.106649] CPU: 42 PID: 39649 Comm: kal_scsi_tgt_cr Kdump: loaded Tainted: G           OE     5.4.80-KM8 #20
Mar 20 11:53:29 c-node04 kernel: [  569.106650] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Mar 20 11:53:29 c-node04 kernel: [  569.106652] RIP: 0010:usercopy_warn+0x7d/0xa0
Mar 20 11:53:29 c-node04 kernel: [  569.106653] Code: 0e af 41 51 48 c7 c0 51 bd 0d af 49 89 f1 48 0f 45 c2 48 89 f9 4d 89 d8 4c 89 d2 48 c7 c7 c8 ff 0e af 48 89 c6 e8 8c 5e df ff <0f> 0b 48 83 c4 18 c3 48 c7 c6 9f f9 0f af 49 89 f1 49 89 f3 eb 96
Mar 20 11:53:29 c-node04 kernel: [  569.106654] RSP: 0018:ffff9dbb7845bda0 EFLAGS: 00010286
Mar 20 11:53:29 c-node04 kernel: [  569.106654] RAX: 0000000000000000 RBX: ffff9dca3ee24b88 RCX: 0000000000000006
Mar 20 11:53:29 c-node04 kernel: [  569.106655] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9dca47ad74a0
Mar 20 11:53:29 c-node04 kernel: [  569.106655] RBP: 0000000000000238 R08: 00000000000008ed R09: 0000000000000002
Mar 20 11:53:29 c-node04 kernel: [  569.106655] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
Mar 20 11:53:29 c-node04 kernel: [  569.106656] R13: ffff9dca3ee24dc0 R14: 0000000000000238 R15: ffff9dca3ee24b88
Mar 20 11:53:29 c-node04 kernel: [  569.106657] FS:  00007fa42daf0700(0000) GS:ffff9dca47ac0000(0000) knlGS:0000000000000000
Mar 20 11:53:29 c-node04 kernel: [  569.106657] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 20 11:53:29 c-node04 kernel: [  569.106658] CR2: 00007fabfd7d2000 CR3: 0000003bbda9a001 CR4: 00000000003606e0
Mar 20 11:53:29 c-node04 kernel: [  569.106661] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 20 11:53:29 c-node04 kernel: [  569.106661] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 20 11:53:29 c-node04 kernel: [  569.106662] Call Trace:
Mar 20 11:53:29 c-node04 kernel: [  569.106669]  __check_object_size+0x162/0x173
Mar 20 11:53:29 c-node04 kernel: [  569.106676]  dev_user_reply_get_cmd.isra.17+0x198/0x420 [scst_user]
Mar 20 11:53:29 c-node04 kernel: [  569.106679]  dev_user_ioctl+0x317/0x71a [scst_user]
Mar 20 11:53:29 c-node04 kernel: [  569.106683]  ? ep_poll+0x88/0x460
Mar 20 11:53:29 c-node04 kernel: [  569.106688]  do_vfs_ioctl+0xa2/0x600
Mar 20 11:53:29 c-node04 kernel: [  569.106691]  ? finish_wait+0x80/0x80
Mar 20 11:53:29 c-node04 kernel: [  569.106692]  ksys_ioctl+0x60/0x90
Mar 20 11:53:29 c-node04 kernel: [  569.106693]  __x64_sys_ioctl+0x16/0x20
Mar 20 11:53:29 c-node04 kernel: [  569.106698]  do_syscall_64+0x55/0x1a0
Mar 20 11:53:29 c-node04 kernel: [  569.106703]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 20 11:53:29 c-node04 kernel: [  569.106706] RIP: 0033:0x7fac04913b77
Mar 20 11:53:29 c-node04 kernel: [  569.106707] Code: 90 90 90 48 8b 05 51 d4 2a 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 90 90 90 90 90 90 90 90 90 90 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 d4 2a 00 31 d2 48 29 c2 64
Mar 20 11:53:29 c-node04 kernel: [  569.106708] RSP: 002b:00007fa42dae9788 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 20 11:53:29 c-node04 kernel: [  569.106709] RAX: ffffffffffffffda RBX: 00007fa46c127140 RCX: 00007fac04913b77

The bottom line: the system looks more stable without the cond_resched().

We will also try running without this command, which we run every hour (a legitimate Linux command):
echo 3 > /proc/sys/vm/drop_caches

Thx
David

-----Original Message-----
From: Eric Sandeen <sandeen@xxxxxxxxxx> 
Sent: Wednesday, March 17, 2021 9:59 PM
To: David Mozes <david.mozes@xxxxxxx>; Eric Sandeen <sandeen@xxxxxxxxxxx>; linux-fsdevel@xxxxxxxxxxxxxxx
Subject: Re: fs: avoid softlockups in s_inodes iterators commit

On 3/17/21 11:45 AM, David Mozes wrote:
> Sending the stack of the first case from a different run
> 
> panic on 25.2.2021
> watchdog on server 4. the pmc was server w\2
> Feb 23 05:46:06 c-node04 kernel: [125259.990332] watchdog: BUG: soft lockup - CPU#41 stuck for 22s! [kuic_msg_domain:15790]
> Feb 23 05:46:06 c-node04 kernel: [125259.990333] Modules linked in: iscsi_scst(OE) crc32c_intel(O) scst_local(OE) scst_user(OE) scst(OE) drbd(O) lru_cache(O) 8021q(O) mrp(O) garp(O) netconsole(O) nfsd(O) nfs_acl(O) auth_rpcgss(O) lockd(O) sunrpc(O) grace(O) xt_MASQUERADE(O) xt_nat(O) xt_state(O) iptable_nat(O) xt_addrtype(O) xt_conntrack(O) nf_nat(O) nf_conntrack(O) nf_defrag_ipv4(O) nf_defrag_ipv6(O) libcrc32c(O) br_netfilter(O) bridge(O) stp(O) llc(O) overlay(O) be2iscsi(O) iscsi_boot_sysfs(O) bnx2i(O) cnic(O) uio(O) cxgb4i(O) cxgb4(O) cxgb3i(O) libcxgbi(O) cxgb3(O) mdio(O) libcxgb(O) ib_iser(OE) iscsi_tcp(O) libiscsi_tcp(O) libiscsi(O) scsi_transport_iscsi(O) dm_multipath(O) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_core(OE) mdev(OE) mlxfw(OE) ptp(O) pps_core(O) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) mlx_compat(OE) fuse(O) binfmt_misc(O) pvpanic(O) pcspkr(O) virtio_rng(O) virtio_net(O) net_failover(O) failover(O) i2c_piix4(
> Feb 23 05:46:06 c-node04 kernel: O) ext4(OE)
> Feb 23 05:46:06 c-node04 kernel: [125259.990368]  jbd2(OE) mbcache(OE) virtio_scsi(OE) virtio_pci(OE) virtio_ring(OE) virtio(OE) [last unloaded: scst_local]

ok, you still haven't said what your "light custom" changes to this kernel are, and all of your modules are out of tree (O) and/or unsigned (E) so I would suggest first trying to reproduce this on something a lot less messy and closer to upstream.

-Eric



