On Sun, Sep 12, 2021 at 6:29 AM Feng Tang <feng.tang@xxxxxxxxx> wrote:
>
> On Sun, Sep 12, 2021 at 07:17:56PM +0800, Hillf Danton wrote:
> [...]
> > > +//	if (!(__this_cpu_inc_return(stats_flush_threshold) % MEMCG_CHARGE_BATCH))
> > > +	if (!(__this_cpu_inc_return(stats_flush_threshold) % 128))
> > >  		queue_work(system_unbound_wq, &stats_flush_work);
> > >  }
> >
> > Hi Feng,
> >
> > Would you please check if it helps fix the regression to avoid
> > re-queuing an already-queued work by adding and checking an atomic
> > counter?
>
> Hi Hillf,
>
> I just tested your patch; it didn't resolve the regression, only
> reduced it from -14% to around -13%, similar to the patch that
> increases the charge batch size.
>

Thanks Hillf for taking a look, and Feng for running the test.

This shows that parallel calls to queue_work() are not the issue (there
is already a test-and-set at the start of queue_work()); rather, the
actual work done by queue_work() is what is costly for this code path.

I wrote a simple anonymous-page-fault (no-hugepage) C program and
profiled it; queue_work() shows up as a significant cost:

- 51.00% do_anonymous_page
   + 16.68% alloc_pages_vma
     11.61% _raw_spin_lock
   + 10.26% mem_cgroup_charge
   - 5.25% lru_cache_add_inactive_or_unevictable
      - 4.48% __pagevec_lru_add
         - 3.71% __pagevec_lru_add_fn
            - 1.74% __mod_lruvec_state
               - 1.60% __mod_memcg_lruvec_state
                  - 1.35% queue_work_on
                     - __queue_work
                        - 0.93% wake_up_process
                           - try_to_wake_up
                              - 0.82% ttwu_queue
                                   0.61% ttwu_do_activate
   - 2.97% page_add_new_anon_rmap
      - 2.68% __mod_lruvec_page_state
         - 2.48% __mod_memcg_lruvec_state
            - 1.67% queue_work_on
               - 1.53% __queue_work
                  - 1.25% wake_up_process
                     - try_to_wake_up
                        - 0.94% ttwu_queue
                           + 0.70% ttwu_do_activate
     0.61% cgroup_rstat_updated
     2.10% rcu_read_unlock_strict
     1.40% cgroup_throttle_swaprate

However, when I switch the batch size to 128, the queue_work() overhead
goes away.

Feng, do you see any change with a batch size of 128 for the aim7
benchmark?
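
[Editor's note: for context, here is a minimal sketch of the batched-flush
pattern that the quoted diff modifies. The counter and work-item names
follow the quoted code, but the surrounding function and includes are
illustrative and are not the actual mm/memcontrol.c source.]

#include <linux/percpu.h>
#include <linux/workqueue.h>

static DEFINE_PER_CPU(unsigned int, stats_flush_threshold);
static struct work_struct stats_flush_work;

static inline void memcg_stats_updated_sketch(void)
{
	/*
	 * Every stat update increments a per-CPU counter, but only
	 * every Nth update (MEMCG_CHARGE_BATCH upstream, 128 in the
	 * experiment above) pays for queue_work(), bounding how often
	 * the async flush worker is kicked from the fault path.
	 */
	if (!(__this_cpu_inc_return(stats_flush_threshold) % 128))
		queue_work(system_unbound_wq, &stats_flush_work);
}

Raising the modulus from MEMCG_CHARGE_BATCH to 128 simply makes the
queue_work() call 128x rarer per CPU, which is why the profile entries
under __mod_memcg_lruvec_state shrink.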
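
[Editor's note: and a rough sketch of what Hillf's suggestion appears to
be, as a variant of the sketch above: gate queue_work() behind an atomic
flag so only the first updater in a flush window queues the work, with
the worker clearing the flag. The flag name, the atomic_xchg() usage,
and the worker body are assumptions; Hillf's actual patch may differ.]

#include <linux/atomic.h>
#include <linux/percpu.h>
#include <linux/workqueue.h>

static atomic_t flush_queued = ATOMIC_INIT(0);

static void stats_flush_func(struct work_struct *work)
{
	/* Let the next batch queue us again before we start flushing. */
	atomic_set(&flush_queued, 0);
	/* ... perform the actual stats flush here ... */
}
static DECLARE_WORK(stats_flush_work, stats_flush_func);

static inline void memcg_stats_updated_guarded(void)
{
	/* Only the updater that wins the xchg calls queue_work(). */
	if (!(__this_cpu_inc_return(stats_flush_threshold) % 128) &&
	    !atomic_xchg(&flush_queued, 1))
		queue_work(system_unbound_wq, &stats_flush_work);
}

Per the numbers above this only shaved about a point off the regression,
which is consistent with queue_work() itself already short-circuiting via
its own test-and-set on the work's pending bit; the guard avoids the call,
not the underlying cost of the flushes that do get queued.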