On 3/21/21 6:47 AM, David Mozes wrote:
>
> Our light custom is enabled us to load a very high load IO on the VM/kernel.

If you can't explain what this change is, I'm afraid the default assumption
will be that your <unspecified> changes have contributed to the problem.

> In case I remove them, we will not be able to generate such a high load on the Kernel.
>
> Eric, after I moved the cond_resched to the place you asked for
> See below:
>
> --- a/fs/drop_caches.c
> +++ b/fs/drop_caches.c
> @@ -35,11 +35,11 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
>                 spin_unlock(&inode->i_lock);
>                 spin_unlock(&sb->s_inode_list_lock);
>
> +               cond_resched();
>                 invalidate_mapping_pages(inode->i_mapping, 0, -1);
>                 iput(toput_inode);
>                 toput_inode = inode;
>
> We got stuck again after one and a half-day of running under the heavy load:
> What we saw on the node is:

<a different backtrace than before with the actual warning omitted>

Ok, then the change you flagged from my commit is not the root cause of your
problem.

At this point, I can only presume that your "light custom" is the root cause.
If you can't reproduce the problem on a stock kernel, then I don't think we
can proceed further.

-Eric