On Mon, Aug 21, 2023 at 01:10:58AM +0000, Lu, Davina wrote:
>
> > [2] https://lore.kernel.org/r/53153bdf0cce4675b09bc2ee6483409f@xxxxxxxxxx
>
> Thanks for pointed out, I almost forget I did this version 2.  How
> to replicate this issue : CPU is X86_64, 64 cores, 2.50GHZ, MEM is
> 256GB (it is VM though). Attached with one NVME device (no lvm, drbd
> etc) with IOPS 64000 and 16GiB. I can also replicate with 10000 IOPS
> 1000GiB NVME volume....

Thanks for the details.  This is something that I am interested in
potentially merging, since for a sufficiently conversion-heavy
workload (assuming the conversion is happening across multiple inodes,
and not just a huge number of random writes into a single fallocated
file), limiting the number of kernel threads to one CPU isn't always
going to be the right thing.

The reason why we did it this way was that at the time, the only
choices we had were between a single kernel thread, or spawning a
kernel thread for every single CPU, which for a very high-core-count
system consumed a huge amount of system resources.  This is no longer
the case with the new Concurrency Managed Workqueue (cmwq), but we
never did the experiment to make sure cmwq didn't have surprising
gotchas.

> > Finally, I'm a bit nervous about setting the internal __WQ_ORDERED
> > flag with max_active > 1.  What was that all about, anyway?
>
> Yes, you are correct. I didn't use "__WQ_ORDERED" carefully, it
> better not use with max_active > 1 . My purpose was try to guarantee
> the work queue can be sequentially implemented on each core.

I won't have time to look at this before the next merge window, but
what I'm hoping to look at is your patch at [2], with two changes:

a)  Drop the __WQ_ORDERED flag, since it is an internal flag.

b)  Just pass in 0 for max_active instead of "num_active_cpus() > 1 ?
    num_active_cpus() : 1", for two reasons.  num_active_cpus()
    doesn't take into account CPU hotplugs (for example, if you have
    a dynamically adjustable VM shape where the number of active CPUs
    might change over time).  Is there a reason why we need to set
    that limit?

Do you see any potential problem with these changes?

Thanks,

					- Ted