perf loss on parallel compile due to contention on the buf semaphore

I have an ext4-based system where, for testing purposes, xfs got mounted
on tmpfs. gcc uses the directory heavily when compiling.

I'm testing with 24 compilers running in parallel, each operating on
their own hello world source file, listed at the end for reference.

With either ext4 or btrfs backing the directory I get 100% CPU
utilization and about 1500 compiles/second. With xfs I see about 20%
idle(!) and only about 1100 compiles/second.

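According to offcputime-bpfcc -K, the time is spent waiting on the
xfs_buf lock. In both sample traces it is the AGI buffer, read via
xfs_read_agi (from xfs_dialloc on create and from xfs_iunlink on
unlink):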

    finish_task_switch.isra.0
    __schedule
    schedule
    schedule_timeout
    __down_common
    down
    xfs_buf_lock
    xfs_buf_find_lock
    xfs_buf_get_map
    xfs_buf_read_map
    xfs_trans_read_buf_map
    xfs_read_agi
    xfs_ialloc_read_agi
    xfs_dialloc
    xfs_create
    xfs_generic_create
    path_openat
    do_filp_open
    do_sys_openat2
    __x64_sys_openat
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    -                cc (602142)
        10639

    finish_task_switch.isra.0
    __schedule
    schedule
    schedule_timeout
    __down_common
    down
    xfs_buf_lock
    xfs_buf_find_lock
    xfs_buf_get_map
    xfs_buf_read_map
    xfs_trans_read_buf_map
    xfs_read_agi
    xfs_iunlink
    xfs_dir_remove_child
    xfs_remove
    xfs_vn_unlink
    vfs_unlink
    do_unlinkat
    __x64_sys_unlink
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    -                as (598688)
        12050

Contention itself aside, note that the stock semaphore code does not do
adaptive spinning, which needlessly and significantly worsens the
impact. You can probably convert this to a rw semaphore and only ever
take it for writing, which should sort out this aspect. I did not check
what can be done to reduce the contention in the first place.
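To illustrate what adaptive spinning buys, here is a minimal userspace
sketch, assuming POSIX threads. It is not the kernel implementation:
spin_then_block_lock(), SPIN_TRIES and run_demo() are hypothetical
names, and the bounded trylock loop is a simplification of the kernel's
optimistic spinning, which keeps spinning only while the lock owner is
running on a CPU. The idea is the same: with a short critical section, a
waiter that spins briefly often gets the lock without paying for a sleep
and a wakeup.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical spin-then-block lock, for illustration only: spin on
 * pthread_mutex_trylock() a bounded number of times, then fall back to
 * a sleeping pthread_mutex_lock(). The kernel's optimistic spinning is
 * smarter and spins only while the lock owner is on a CPU.
 */
#define SPIN_TRIES 1000 /* hypothetical tuning knob */

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static long demo_counter;

static void spin_then_block_lock(pthread_mutex_t *m)
{
	for (int i = 0; i < SPIN_TRIES; i++) {
		/* Fast path: the holder may drop the lock before we
		 * would even finish going to sleep. */
		if (pthread_mutex_trylock(m) == 0)
			return;
	}
	/* Slow path: stop burning CPU and sleep until the lock is free. */
	pthread_mutex_lock(m);
}

static void *worker(void *arg)
{
	long iters = *(const long *)arg;

	for (long i = 0; i < iters; i++) {
		spin_then_block_lock(&demo_lock);
		demo_counter++;		/* short critical section */
		pthread_mutex_unlock(&demo_lock);
	}
	return NULL;
}

/* Run nthreads (1..64) workers contending on demo_lock; return the total. */
static long run_demo(int nthreads, long iters)
{
	pthread_t tids[64];

	assert(nthreads > 0 && nthreads <= 64);
	demo_counter = 0;
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, worker, &iters);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_join(tids[i], NULL);
		assert(rc == 0);
		(void)rc;
	}
	return demo_counter;
}
```

For example, run_demo(4, 100000) returns 400000, since every increment
happens under the lock regardless of whether a waiter acquired it while
spinning or after sleeping.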

Reproducing: create a hello world .c file (say /tmp/src.c) and copy it
into /src, once per process:
for i in $(seq 0 23); do cp /tmp/src.c /src/src${i}.c; done

Put the following into will-it-scale/tests/cc.c, then build and run
./cc_processes -t 24:

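#include <stdio.h>          /* snprintf() */
#include <stdlib.h>         /* system() */
#include <sys/types.h>
#include <unistd.h>

char *testcase_description = "compile";

void testcase(unsigned long long *iterations, unsigned long nr)
{
        char cmd[1024];

        /* each process compiles its own source file to its own object */
        snprintf(cmd, sizeof(cmd), "cc -c -o /tmp/out.%lu /src/src%lu.c",
            nr, nr);

        while (1) {
                system(cmd);

                (*iterations)++;
        }
}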

-- 
Mateusz Guzik <mjguzik gmail.com>