RE: Tasks stuck jbd2 for a long time


> Thanks for the details.  This is something I am interested in potentially merging, since for a sufficiently conversion-heavy workload (assuming the conversion is happening
> across multiple inodes, and not just a huge number of random writes into a single fallocated file), limiting the conversion work to a single kernel thread isn't always going to be the right thing.
> The reason we had done it this way was that, at the time, the only choices we had were a single kernel thread or spawning a kernel thread for every single CPU --
> which, on a very high-core-count system, consumed a huge amount of system resources.  This is no longer the case with the new Concurrency Managed Workqueue (cmwq), but we never
> did the experiment to make sure cmwq didn't have surprising gotchas.

Thank you for the detailed explanation. 


> I won't have time to look at this before the next merge window, but what I'm hoping to look at is your patch at [2], with two changes:
> a) Drop the _WQ_ORDERED flag, since it is an internal flag.
> b) Just pass in 0 for max_active instead of "num_active_cpus() > 1 ?
>    num_active_cpus() : 1", for two reasons.  num_active_cpus() doesn't
>    take into account CPU hotplug (for example, if you have a
>    dynamically adjustable VM shape where the number of active CPUs
>    might change over time).  Is there a reason why we need to set that
>    limit?

> Do you see any potential problem with these changes?

Sorry for the late response. After the internal discussion, I can continue with this patch. These two points are easy to change; I will also run xfstests for ext4 and run BMS in the RDS environment as a quick verification. We can change num_active_cpus() to 0. The reason for adding that limit was simply that during fio testing the number of active work items went up to ~50 and we did not see this issue, but it is not strictly necessary. I will check Oleg's opinion offline later.
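
For reference, here is a rough sketch of what the allocation in ext4_fill_super() could look like with both changes applied. It follows the existing rsv_conversion_wq setup in fs/ext4/super.c; the error path is only indicative and this is not the final patch:

	/*
	 * Sketch only: use an unbound cmwq workqueue for unwritten-extent
	 * conversion instead of a single ordered worker.  Passing 0 for
	 * max_active lets the workqueue core apply its default limit
	 * (WQ_DFL_ACTIVE), so the value does not go stale when CPUs are
	 * hot-plugged in a resizable VM shape.
	 */
	EXT4_SB(sb)->rsv_conversion_wq =
		alloc_workqueue("ext4-rsv-conversion",
				WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
	if (!EXT4_SB(sb)->rsv_conversion_wq) {
		printk(KERN_ERR "EXT4-fs: failed to create workqueue\n");
		ret = -ENOMEM;
		goto failed_mount4;
	}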

Thanks,
Davina



