Re: Regression in workingset_refault latency on 5.15

On Wed, Feb 23, 2022 at 9:16 PM Ivan Babrou <ivan@xxxxxxxxxxxxxx> wrote:
>
> Thanks for the info. Is it possible to test
> https://lore.kernel.org/all/20210929235936.2859271-1-shakeelb@xxxxxxxxxx/

We built the patch on top of 5.15.25 and deployed it to some nodes running
swap-on-zram. The test data are below:

1) Latency of workingset_refault over 60 seconds

$ sudo /usr/share/bcc/tools/funclatency -d 60 workingset_refault
Tracing 1 functions for "workingset_refault"... Hit Ctrl-C to end.

     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 4643     |**********************                  |
       512 -> 1023       : 4259     |********************                    |
      1024 -> 2047       : 1178     |*****                                   |
      2048 -> 4095       : 4044     |*******************                     |
      4096 -> 8191       : 1804     |********                                |
      8192 -> 16383      : 109      |                                        |
     16384 -> 32767      : 52       |                                        |
     32768 -> 65535      : 45       |                                        |
     65536 -> 131071     : 15       |                                        |
    131072 -> 262143     : 2        |                                        |
    262144 -> 524287     : 0        |                                        |
    524288 -> 1048575    : 7        |                                        |
   1048576 -> 2097151    : 900      |****                                    |
   2097152 -> 4194303    : 8218     |****************************************|
   4194304 -> 8388607    : 967      |****                                    |
   8388608 -> 16777215   : 89       |                                        |

avg = 1229607 nsecs, total: 32378027676 nsecs, count: 26332

2) Latency of cgroup_rstat_flush_locked over 60 seconds

$ sudo /usr/share/bcc/tools/funclatency -d 60 cgroup_rstat_flush_locked
Tracing 1 functions for "cgroup_rstat_flush_locked"... Hit Ctrl-C to end.

     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 36       |                                        |
      2048 -> 4095       : 73       |                                        |
      4096 -> 8191       : 151      |                                        |
      8192 -> 16383      : 606      |**                                      |
     16384 -> 32767      : 387      |*                                       |
     32768 -> 65535      : 49       |                                        |
     65536 -> 131071     : 26       |                                        |
    131072 -> 262143     : 6        |                                        |
    262144 -> 524287     : 3        |                                        |
    524288 -> 1048575    : 14       |                                        |
   1048576 -> 2097151    : 2085     |*********                               |
   2097152 -> 4194303    : 8580     |****************************************|
   4194304 -> 8388607    : 772      |***                                     |
   8388608 -> 16777215   : 117      |                                        |
  16777216 -> 33554431   : 1        |                                        |

avg = 2592894 nsecs, total: 33463898782 nsecs, count: 12906
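As an aside on reading these numbers: funclatency buckets samples into
power-of-two nanosecond ranges, so the bimodal shape above is easy to spot.
A minimal Python sketch of that bucketing (an illustration of the histogram
format only, not the actual bcc implementation):

```python
# Simplified sketch of funclatency-style log2 bucketing (illustration only,
# not the bcc implementation).

def log2_bucket(nsecs: int) -> tuple:
    """Return the (low, high) bounds of the power-of-two bucket for nsecs."""
    if nsecs <= 1:
        return (0, 1)
    b = nsecs.bit_length() - 1          # floor(log2(nsecs))
    return (1 << b, (1 << (b + 1)) - 1)

def histogram(samples):
    """Count samples per log2 bucket, the way funclatency prints them."""
    hist = {}
    for ns in samples:
        bucket = log2_bucket(ns)
        hist[bucket] = hist.get(bucket, 0) + 1
    return hist
```

So a 300 ns call lands in the 256 -> 511 row, while a 2.5 ms call lands in
the 2097152 -> 4194303 row, which is where most samples sit in both
histograms above.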

3) Summary of stack traces where mem_cgroup_flush_stats took over 5 ms

@stackcount[
    mem_cgroup_flush_stats+1
    workingset_refault+296
    __read_swap_cache_async+512
    swap_cluster_readahead+440
    shmem_swapin+164
    shmem_swapin_page+396
    shmem_getpage_gfp+1228
    shmem_fault+104
    __do_fault+55
    __handle_mm_fault+3102
    handle_mm_fault+191
    do_user_addr_fault+428
    exc_page_fault+103
    asm_exc_page_fault+30
]: 1
@stackcount[
    mem_cgroup_flush_stats+1
    workingset_refault+296
    __read_swap_cache_async+512
    swap_cluster_readahead+333
    shmem_swapin+164
    shmem_swapin_page+396
    shmem_getpage_gfp+1228
    shmem_fault+104
    __do_fault+55
    __handle_mm_fault+3649
    handle_mm_fault+191
    do_user_addr_fault+428
    exc_page_fault+103
    asm_exc_page_fault+30
]: 1
@stackcount[
    mem_cgroup_flush_stats+1
    workingset_refault+296
    __read_swap_cache_async+512
    swap_cluster_readahead+333
    shmem_swapin+164
    shmem_swapin_page+396
    shmem_getpage_gfp+1228
    shmem_fault+104
    __do_fault+55
    __handle_mm_fault+3102
    handle_mm_fault+191
    do_user_addr_fault+428
    exc_page_fault+103
    asm_exc_page_fault+30
]: 4
@stackcount[
    mem_cgroup_flush_stats+1
    shrink_node+163
    balance_pgdat+791
    kswapd+508
    kthread+295
    ret_from_fork+34
]: 7
@stackcount[
    mem_cgroup_flush_stats+1
    workingset_refault+296
    add_to_page_cache_lru+159
    page_cache_ra_unbounded+340
    filemap_get_pages+569
    filemap_read+164
    xfs_file_buffered_read+152
    xfs_file_read_iter+106
    new_sync_read+280
    vfs_read+242
    __x64_sys_pread64+137
    __noinstr_text_start+59
    entry_SYSCALL_64_after_hwframe+68
]: 10
@stackcount[
    mem_cgroup_flush_stats+1
    workingset_refault+296
    add_to_page_cache_lru+159
    page_cache_ra_unbounded+340
    filemap_get_pages+233
    filemap_read+164
    xfs_file_buffered_read+152
    xfs_file_read_iter+106
    new_sync_read+280
    vfs_read+242
    __x64_sys_pread64+137
    __noinstr_text_start+59
    entry_SYSCALL_64_after_hwframe+68
]: 111
@stackcount[
    mem_cgroup_flush_stats+1
    workingset_refault+296
    add_to_page_cache_lru+159
    page_cache_ra_unbounded+340
    force_page_cache_ra+226
    filemap_get_pages+233
    filemap_read+164
    xfs_file_buffered_read+152
    xfs_file_read_iter+106
    new_sync_read+280
    vfs_read+242
    __x64_sys_pread64+137
    __noinstr_text_start+59
    entry_SYSCALL_64_after_hwframe+68
]: 225

The stack counts above were collected with the following bpftrace script:

// Record entry timestamp and kernel stack on each mem_cgroup_flush_stats call
kprobe:mem_cgroup_flush_stats
{
  @start[tid] = nsecs;
  @stack[tid] = kstack;
}

// On return, count the saved stack if the call took at least 5 ms (5000 us)
kretprobe:mem_cgroup_flush_stats
/@start[tid]/
{
  $usecs = (nsecs - @start[tid]) / 1000;
  if ($usecs >= 5000) {
    @stackcount[@stack[tid]]++;
  }
  delete(@start[tid]);
  delete(@stack[tid]);
}

// Print the accumulated stack counts after 60 seconds and exit
interval:s:60
{
  print(@stackcount);
  exit();
}

>
> If that patch did not help then we either have to optimize rstat
> flushing or further increase the update buffer which is nr_cpus * 32.

Overall, I don't think we see any significant improvement: the expensive
mem_cgroup_flush_stats calls still show up in both the latency histograms
and the stack traces.
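
To make sure I understand the "update buffer" option: as I read it, stat
updates accumulate in a shared counter and a flush is only triggered once
the pending count crosses nr_cpus * 32, so raising the batch factor trades
staleness for fewer expensive flushes. A toy model of that thresholding
(my simplified reading, with hypothetical numbers, not the kernel code):

```python
# Toy model of the rstat update batching discussed above: a flush fires only
# once buffered updates exceed nr_cpus * batch. Simplified illustration,
# not the kernel implementation.

NR_CPUS = 96   # hypothetical machine size
BATCH = 32     # per-CPU batch factor mentioned in the thread

class StatsBatcher:
    def __init__(self, nr_cpus=NR_CPUS, batch=BATCH):
        self.threshold = nr_cpus * batch
        self.pending = 0
        self.flushes = 0

    def update(self, count=1):
        """Record stat updates; flush once the buffered total crosses the threshold."""
        self.pending += count
        if self.pending > self.threshold:
            self.flush()

    def flush(self):
        """Stand-in for the (expensive) rstat flush: reset the pending counter."""
        self.pending = 0
        self.flushes += 1
```

In this model, doubling the batch factor halves the flush frequency for the
same update rate, which seems to be the "further increase the update buffer"
option, at the cost of staler stats between flushes.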



