Re: perf loss on parallel compile due to contention on the buf semaphore

On Thu, Aug 15, 2024 at 2:25 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> I have an ext4-based system where xfs got mounted on tmpfs for testing

erm, I mean on /tmp :)

I also used noatime.

> purposes. The directory is being used a lot by gcc when compiling.
>
> I'm testing with 24 compilers running in parallel, each operating on
> their own hello world source file, listed at the end for reference.
>
> Both ext4 and btrfs backing the directory result in 100% cpu
> utilization and about 1500 compiles/second. With xfs I see about 20%
> idle(!) and about 1100 compiles/second.
>
> According to offcputime-bpfcc -K the time is spent waiting on the buf
> semaphore, sample traces:
>
>    finish_task_switch.isra.0
>     __schedule
>     schedule
>     schedule_timeout
>     __down_common
>     down
>     xfs_buf_lock
>     xfs_buf_find_lock
>     xfs_buf_get_map
>     xfs_buf_read_map
>     xfs_trans_read_buf_map
>     xfs_read_agi
>     xfs_ialloc_read_agi
>     xfs_dialloc
>     xfs_create
>     xfs_generic_create
>     path_openat
>     do_filp_open
>     do_sys_openat2
>     __x64_sys_openat
>     do_syscall_64
>     entry_SYSCALL_64_after_hwframe
>     -                cc (602142)
>         10639
>
>     finish_task_switch.isra.0
>     __schedule
>     schedule
>     schedule_timeout
>     __down_common
>     down
>     xfs_buf_lock
>     xfs_buf_find_lock
>     xfs_buf_get_map
>     xfs_buf_read_map
>     xfs_trans_read_buf_map
>     xfs_read_agi
>     xfs_iunlink
>     xfs_dir_remove_child
>     xfs_remove
>     xfs_vn_unlink
>     vfs_unlink
>     do_unlinkat
>     __x64_sys_unlink
>     do_syscall_64
>     entry_SYSCALL_64_after_hwframe
>     -                as (598688)
>         12050
>
> Contention aside, I'll note the stock semaphore code does not do
> adaptive spinning, which significantly (and avoidably) worsens the
> impact. You can probably convert this to an rw semaphore and only
> ever take it for writing, which should sort out this aspect. I did
> not check what could be done to contend less to begin with.
>
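FWIW, a minimal sketch of that conversion (untested; b_sema and the
lock routine names are taken from the traces above and fs/xfs/xfs_buf.c,
the rest is hand-waving -- note buffers are allocated with the semaphore
already held via sema_init(&bp->b_sema, 0), so the allocation side would
need init_rwsem() followed by down_write()):

	/* in struct xfs_buf, replacing struct semaphore b_sema: */
	struct rw_semaphore	b_sema;

	void
	xfs_buf_lock(
		struct xfs_buf	*bp)
	{
		/*
		 * Everyone takes the lock exclusive, so semantics are
		 * unchanged, but rwsems spin optimistically while the
		 * owner is on cpu instead of going straight to sleep.
		 */
		down_write(&bp->b_sema);
	}

	int
	xfs_buf_trylock(
		struct xfs_buf	*bp)
	{
		return down_write_trylock(&bp->b_sema);
	}

	void
	xfs_buf_unlock(
		struct xfs_buf	*bp)
	{
		up_write(&bp->b_sema);
	}
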
> reproducing:
> create a hello world .c file (say /tmp/src.c) and plop copies of it into /src:
> for i in $(seq 0 23); do cp /tmp/src.c /src/src${i}.c; done
>
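Any trivial translation unit will do for the hello world; something like:

	#include <stdio.h>

	int main(void)
	{
		printf("hello world\n");
		return 0;
	}
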
> plop the following into will-it-scale/tests/cc.c && ./cc_processes -t 24
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <unistd.h>
>
> char *testcase_description = "compile";
>
> void testcase(unsigned long long *iterations, unsigned long nr)
> {
>         char cmd[1024];
>
>         /* each worker compiles its own source file to its own object file */
>         snprintf(cmd, sizeof(cmd), "cc -c -o /tmp/out.%lu /src/src%lu.c", nr, nr);
>
>         while (1) {
>                 system(cmd);
>
>                 (*iterations)++;
>         }
> }
>
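For anyone wanting to recollect the off-cpu profile: with the bcc tools
installed, something like

offcputime-bpfcc -K 10 > offcpu.txt

run while the benchmark is going (10 second capture, kernel stacks only)
should show the same xfs_buf_lock waits.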
> --
> Mateusz Guzik <mjguzik gmail.com>



-- 
Mateusz Guzik <mjguzik gmail.com>




