Re: Regression in workingset_refault latency on 5.15

On Wed, Feb 23, 2022 at 4:00 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:

> > Can you share a bit more detail on your hardware configuration (num of
> > cpus) and if possible the flamegraph?
> >

We have a mix of machines with 96 and 128 CPUs. I'm not yet sure whether we
can share the flamegraphs; we may have to come back to that later if necessary.

>
> Also if you can reproduce the issue, can you try the patch at
> https://lore.kernel.org/all/20210929235936.2859271-1-shakeelb@xxxxxxxxxx/
> ?

We can give it a try. I also wrote a bpftrace script to capture the kernel
stack whenever we encounter a slow mem_cgroup_flush_stats() call (using
10 ms as the threshold):

// Record entry time and the calling kernel stack for each
// mem_cgroup_flush_stats() invocation, keyed by thread id.
kprobe:mem_cgroup_flush_stats
{
  @start[tid] = nsecs;
  @stack[tid] = kstack;
}

// On return, report any call that took at least 10 ms (10000 us),
// along with the stack captured at entry.
kretprobe:mem_cgroup_flush_stats
/@start[tid]/
{
  $usecs = (nsecs - @start[tid]) / 1000;
  if ($usecs >= 10000) {
    printf("mem_cgroup_flush_stats: %d us\n", $usecs);
    printf("stack: %s\n", @stack[tid]);
  }
  delete(@start[tid]);
  delete(@stack[tid]);
}

// Drop the bookkeeping maps on exit.
END
{
  clear(@start);
  clear(@stack);
}
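
For reference, we run it as root with something like the following
(assuming the script is saved as flush_lat.bt; the file name here is
just for illustration):

  # bpftrace flush_lat.bt

A histogram variant is also handy for seeing the overall latency
distribution instead of individual stacks, roughly:

  # bpftrace -e '
    kprobe:mem_cgroup_flush_stats { @s[tid] = nsecs; }
    kretprobe:mem_cgroup_flush_stats /@s[tid]/ {
      @usecs = hist((nsecs - @s[tid]) / 1000);
      delete(@s[tid]);
    }'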

Running the stack-capture script on a production node yields output like:

mem_cgroup_flush_stats: 10697 us
stack:
        mem_cgroup_flush_stats+1
        workingset_refault+296
        add_to_page_cache_lru+159
        page_cache_ra_unbounded+340
        force_page_cache_ra+226
        filemap_get_pages+233
        filemap_read+164
        xfs_file_buffered_read+152
        xfs_file_read_iter+106
        new_sync_read+277
        vfs_read+242
        __x64_sys_pread64+137
        do_syscall_64+56
        entry_SYSCALL_64_after_hwframe+68

As the stack above shows, a single pread64() can stall for over 10 ms in
the flush; I think adding that many milliseconds to workingset_refault()
is too high a cost.



