Re: Poor performance on a server-class system vs. desktop

Hi Dmitry,

On Thu, Nov 26, 2020 at 10:44 AM Dmitry Antipov <dmantipov@xxxxxxxxx> wrote:
BTW, did anyone try to profile the brick process? I did, and got this
for the default replica 3 volume ('perf record -F 2500 -g -p [PID]'):

+    3.29%     0.02%  glfs_epoll001    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    3.17%     0.01%  glfs_epoll001    [kernel.kallsyms]      [k] do_syscall_64
+    3.17%     0.02%  glfs_epoll000    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    3.06%     0.02%  glfs_epoll000    [kernel.kallsyms]      [k] do_syscall_64
+    2.75%     0.01%  glfs_iotwr00f    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.74%     0.01%  glfs_iotwr00b    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.74%     0.01%  glfs_iotwr001    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.73%     0.00%  glfs_iotwr003    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.72%     0.00%  glfs_iotwr000    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.72%     0.01%  glfs_iotwr00c    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.70%     0.01%  glfs_iotwr003    [kernel.kallsyms]      [k] do_syscall_64
+    2.69%     0.00%  glfs_iotwr001    [kernel.kallsyms]      [k] do_syscall_64
+    2.69%     0.01%  glfs_iotwr008    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.68%     0.00%  glfs_iotwr00b    [kernel.kallsyms]      [k] do_syscall_64
+    2.68%     0.00%  glfs_iotwr00c    [kernel.kallsyms]      [k] do_syscall_64
+    2.68%     0.00%  glfs_iotwr00f    [kernel.kallsyms]      [k] do_syscall_64
+    2.68%     0.01%  glfs_iotwr000    [kernel.kallsyms]      [k] do_syscall_64
+    2.67%     0.00%  glfs_iotwr00a    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.65%     0.00%  glfs_iotwr008    [kernel.kallsyms]      [k] do_syscall_64
+    2.64%     0.00%  glfs_iotwr00e    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.64%     0.01%  glfs_iotwr00d    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.63%     0.01%  glfs_iotwr00a    [kernel.kallsyms]      [k] do_syscall_64
+    2.63%     0.01%  glfs_iotwr007    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.63%     0.00%  glfs_iotwr005    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.63%     0.01%  glfs_iotwr006    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.63%     0.00%  glfs_iotwr009    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.61%     0.01%  glfs_iotwr004    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.61%     0.01%  glfs_iotwr00e    [kernel.kallsyms]      [k] do_syscall_64
+    2.60%     0.00%  glfs_iotwr006    [kernel.kallsyms]      [k] do_syscall_64
+    2.59%     0.00%  glfs_iotwr005    [kernel.kallsyms]      [k] do_syscall_64
+    2.59%     0.00%  glfs_iotwr00d    [kernel.kallsyms]      [k] do_syscall_64
+    2.58%     0.00%  glfs_iotwr002    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.58%     0.01%  glfs_iotwr007    [kernel.kallsyms]      [k] do_syscall_64
+    2.58%     0.00%  glfs_iotwr004    [kernel.kallsyms]      [k] do_syscall_64
+    2.57%     0.00%  glfs_iotwr009    [kernel.kallsyms]      [k] do_syscall_64
+    2.54%     0.00%  glfs_iotwr002    [kernel.kallsyms]      [k] do_syscall_64
+    1.65%     0.00%  glfs_epoll000    [unknown]              [k] 0x0000000000000001
+    1.65%     0.00%  glfs_epoll001    [unknown]              [k] 0x0000000000000001
+    1.48%     0.01%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    1.44%     0.08%  glfs_rpcrqhnd    libpthread-2.32.so     [.] pthread_cond_wait@@GLIBC_2.3.2
+    1.40%     0.01%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] do_syscall_64
+    1.36%     0.01%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] __x64_sys_futex
+    1.35%     0.03%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] do_futex
+    1.34%     0.01%  glfs_iotwr00a    libpthread-2.32.so     [.] __libc_pwrite64
+    1.32%     0.00%  glfs_iotwr00a    [kernel.kallsyms]      [k] __x64_sys_pwrite64
+    1.32%     0.00%  glfs_iotwr001    libpthread-2.32.so     [.] __libc_pwrite64
+    1.31%     0.01%  glfs_iotwr002    libpthread-2.32.so     [.] __libc_pwrite64
+    1.31%     0.00%  glfs_iotwr00b    libpthread-2.32.so     [.] __libc_pwrite64
+    1.31%     0.01%  glfs_iotwr00a    [kernel.kallsyms]      [k] vfs_write
+    1.30%     0.00%  glfs_iotwr001    [kernel.kallsyms]      [k] __x64_sys_pwrite64
+    1.30%     0.00%  glfs_iotwr008    libpthread-2.32.so     [.] __libc_pwrite64
+    1.30%     0.00%  glfs_iotwr00a    [kernel.kallsyms]      [k] new_sync_write
+    1.30%     0.00%  glfs_iotwr00c    libpthread-2.32.so     [.] __libc_pwrite64
+    1.29%     0.00%  glfs_iotwr00a    [kernel.kallsyms]      [k] xfs_file_write_iter
+    1.29%     0.01%  glfs_iotwr00a    [kernel.kallsyms]      [k] xfs_file_dio_aio_write

And on replica 3 with storage.linux-aio enabled:

+   11.76%     0.05%  glfs_posixaio    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+   11.42%     0.01%  glfs_posixaio    [kernel.kallsyms]      [k] do_syscall_64
+    8.81%     0.00%  glfs_posixaio    [unknown]              [k] 0x00000000baadf00d
+    8.81%     0.00%  glfs_posixaio    [unknown]              [k] 0x0000000000000004
+    8.74%     0.06%  glfs_posixaio    libc-2.32.so           [.] __GI___writev
+    8.33%     0.02%  glfs_posixaio    [kernel.kallsyms]      [k] do_writev
+    8.23%     0.03%  glfs_posixaio    [kernel.kallsyms]      [k] vfs_writev
+    8.12%     0.05%  glfs_posixaio    [kernel.kallsyms]      [k] do_iter_write
+    8.02%     0.05%  glfs_posixaio    [kernel.kallsyms]      [k] do_iter_readv_writev
+    7.96%     0.04%  glfs_posixaio    [kernel.kallsyms]      [k] sock_write_iter
+    7.92%     0.01%  glfs_posixaio    [kernel.kallsyms]      [k] sock_sendmsg
+    7.86%     0.01%  glfs_posixaio    [kernel.kallsyms]      [k] tcp_sendmsg
+    7.28%     0.15%  glfs_posixaio    [kernel.kallsyms]      [k] tcp_sendmsg_locked
+    6.49%     0.01%  glfs_posixaio    [kernel.kallsyms]      [k] __tcp_push_pending_frames
+    6.48%     0.10%  glfs_posixaio    [kernel.kallsyms]      [k] tcp_write_xmit
+    6.31%     0.02%  glfs_posixaio    [unknown]              [k] 0000000000000000
+    6.05%     0.13%  glfs_posixaio    [kernel.kallsyms]      [k] __tcp_transmit_skb
+    5.71%     0.06%  glfs_posixaio    [kernel.kallsyms]      [k] __ip_queue_xmit
+    4.15%     0.03%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    4.07%     0.08%  glfs_posixaio    [kernel.kallsyms]      [k] ip_finish_output2
+    3.75%     0.02%  glfs_posixaio    [kernel.kallsyms]      [k] asm_call_sysvec_on_stack
+    3.75%     0.01%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] do_syscall_64
+    3.70%     0.03%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] __x64_sys_futex
+    3.68%     0.06%  glfs_posixaio    [kernel.kallsyms]      [k] __local_bh_enable_ip
+    3.67%     0.07%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] do_futex
+    3.62%     0.05%  glfs_posixaio    [kernel.kallsyms]      [k] do_softirq
+    3.61%     0.01%  glfs_posixaio    [kernel.kallsyms]      [k] do_softirq_own_stack
+    3.59%     0.06%  glfs_posixaio    [kernel.kallsyms]      [k] __softirqentry_text_start
+    3.44%     0.06%  glfs_posixaio    [kernel.kallsyms]      [k] net_rx_action
+    3.34%     0.04%  glfs_posixaio    [kernel.kallsyms]      [k] process_backlog
+    3.28%     0.02%  glfs_posixaio    [kernel.kallsyms]      [k] __netif_receive_skb_one_core
+    3.08%     0.02%  glfs_epoll000    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    3.02%     0.03%  glfs_epoll001    [kernel.kallsyms]      [k] entry_SYSCALL_64_after_hwframe
+    2.97%     0.01%  glfs_epoll000    [kernel.kallsyms]      [k] do_syscall_64
+    2.89%     0.01%  glfs_epoll001    [kernel.kallsyms]      [k] do_syscall_64
+    2.73%     0.08%  glfs_posixaio    [kernel.kallsyms]      [k] nf_hook_slow
+    2.25%     0.04%  glfs_posixaio    libc-2.32.so           [.] fgetxattr
+    2.16%     0.14%  glfs_rpcrqhnd    [kernel.kallsyms]      [k] futex_wake

According to these tables, the brick process is essentially a thin wrapper around the system calls
and the kernel network subsystem behind them.

Mostly. However, there's one issue that isn't obvious in the perf capture but that we have identified in other setups: when the system calls are processed very fast (as should be the case with NVMe), the io-threads thread pool is constantly pulling requests from its request queue. This queue is currently synchronized with a mutex, and the small per-request latency makes contention on that mutex quite high. As a result, the thread pool tends to be serialized by the lock, which kills most of the parallelism and also generates a lot of additional system calls (the futex wake/wait activity visible above), increasing CPU utilization and latency.
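
To illustrate the pattern (a minimal, self-contained sketch, not GlusterFS's actual io-threads code), consider a worker pool that pulls requests from a single mutex-protected queue. When the per-request work is very cheap, most of the workers' time goes into pthread_mutex_lock() and pthread_cond_wait(), which is also where the futex calls in the profiles come from:

/* contention.c: simplified model of a thread pool serialized by its queue lock.
 * Build with: gcc -O2 -pthread contention.c -o contention */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct request {
    struct request *next;
    int id;
};

static struct request *queue_head;                  /* shared request queue */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_cond = PTHREAD_COND_INITIALIZER;
static int done;

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&queue_lock);            /* every worker contends here */
        while (queue_head == NULL && !done)
            pthread_cond_wait(&queue_cond, &queue_lock);   /* -> futex syscalls */
        if (queue_head == NULL && done) {
            pthread_mutex_unlock(&queue_lock);
            break;
        }
        struct request *req = queue_head;
        queue_head = req->next;
        pthread_mutex_unlock(&queue_lock);

        /* "Process" the request.  When the real work (e.g. a pwrite() that an
         * NVMe device completes almost instantly) is very cheap, the worker is
         * back at the lock right away, so the pool is effectively serialized. */
        free(req);
    }
    return NULL;
}

int main(void)
{
    enum { NTHREADS = 16, NREQS = 100000 };
    pthread_t tids[NTHREADS];

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);

    for (int i = 0; i < NREQS; i++) {               /* producer: enqueue requests */
        struct request *req = malloc(sizeof(*req));
        req->id = i;
        pthread_mutex_lock(&queue_lock);
        req->next = queue_head;
        queue_head = req;
        pthread_cond_signal(&queue_cond);
        pthread_mutex_unlock(&queue_lock);
    }

    pthread_mutex_lock(&queue_lock);                /* tell workers to drain and exit */
    done = 1;
    pthread_cond_broadcast(&queue_cond);
    pthread_mutex_unlock(&queue_lock);

    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);

    printf("enqueued %d requests for %d workers\n", NREQS, NTHREADS);
    return 0;
}

Running something like this under 'perf record -g' should show a similarly futex-heavy profile once NTHREADS is large compared to the amount of work done per request.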

For now, the only way I know of to mitigate this effect is to reduce the number of threads in the io-threads pool. It's hard to say what a good number would be; it depends on many things. But you can run tests with different values to find the best one (after changing the number of threads, it's better to restart the volume).

Reducing the number of threads reduces the CPU power that Gluster can use, but it also reduces the contention, so it's expected (though not guaranteed) that at some point performance will actually be a bit better with fewer threads.
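
As a sketch of what such a test could look like (the volume name "myvol" is just a placeholder; performance.io-thread-count is the standard io-threads option, with a default of 16):

# try progressively smaller thread counts and benchmark after each change
gluster volume set myvol performance.io-thread-count 8
gluster volume stop myvol
gluster volume start myvol
# re-run the same fio/gfapi workload and compare with the previous runs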

Regards,

Xavi

 

For anyone who may be interested, the following replica 3 volume options:

performance.io-cache-pass-through: on
performance.iot-pass-through: on
performance.md-cache-pass-through: on
performance.nl-cache-pass-through: on
performance.open-behind-pass-through: on
performance.read-ahead-pass-through: on
performance.readdir-ahead-pass-through: on
performance.strict-o-direct: on
features.ctime: off
features.selinux: off
performance.write-behind: off
performance.open-behind: off
performance.quick-read: off
storage.linux-aio: on
storage.fips-mode-rchecksum: off

are likely to improve the I/O performance of GFAPI clients (fio with the gfapi and gfapi_async
engines, qemu -drive file=gluster://XXX, etc.) by ~20%. But beware: they may badly hurt the I/O
performance of FUSE clients.
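
For anyone who wants to try this, applying the list above boils down to a series of 'gluster volume set' calls (the volume name "myvol" below is just a placeholder); restarting the volume afterwards may be needed for the brick-side storage.* options to take effect:

VOL=myvol    # placeholder volume name
gluster volume set $VOL performance.io-cache-pass-through on
gluster volume set $VOL performance.iot-pass-through on
gluster volume set $VOL performance.md-cache-pass-through on
gluster volume set $VOL performance.nl-cache-pass-through on
gluster volume set $VOL performance.open-behind-pass-through on
gluster volume set $VOL performance.read-ahead-pass-through on
gluster volume set $VOL performance.readdir-ahead-pass-through on
gluster volume set $VOL performance.strict-o-direct on
gluster volume set $VOL features.ctime off
gluster volume set $VOL features.selinux off
gluster volume set $VOL performance.write-behind off
gluster volume set $VOL performance.open-behind off
gluster volume set $VOL performance.quick-read off
gluster volume set $VOL storage.linux-aio on
gluster volume set $VOL storage.fips-mode-rchecksum off
gluster volume stop $VOL
gluster volume start $VOL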

Dmitry
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
