Hi,

We have recently deployed a Ceph cluster with:

- 12 OSD nodes (16 cores + 200 GB RAM + 30 disks of 14 TB each), running CentOS 8
- 3 monitor nodes (8 cores + 16 GB RAM), running CentOS 8

We are running Ceph Octopus and using RBD block devices. We have three Ceph client nodes (16 cores + 30 GB RAM, running CentOS 8) across which the RBDs are mapped and mounted, 25 RBDs per client node. Each RBD is 10 TB in size and is formatted with an ext4 file system.

On the network side, we have a 10 Gbps active/passive bond on all the Ceph cluster nodes, including the clients. Jumbo frames are enabled and the MTU is 9000.

This is a new cluster and its health reports OK, but we see high I/O wait during writes.

From one of the clients:

15:14:30     CPU    %user   %nice  %system  %iowait   %steal    %idle
15:14:31     all     0.06    0.00     1.00    45.03     0.00    53.91
15:14:32     all     0.06    0.00     0.94    41.28     0.00    57.72
15:14:33     all     0.06    0.00     1.25    45.78     0.00    52.91
15:14:34     all     0.00    0.00     1.06    40.07     0.00    58.86
15:14:35     all     0.19    0.00     1.38    41.04     0.00    57.39
Average:     all     0.08    0.00     1.13    42.64     0.00    56.16

The system load is also very high:

top - 15:19:15 up 34 days, 41 min, 2 users, load average: 13.49, 13.62, 13.83

From 'atop', one of the CPUs shows this:

CPU | sys 7% | user 1% | irq 2% | idle 1394% | wait 195% | steal 0% | guest 0% | ipc initial | cycl initial | curf 806MHz | curscal ?%

On the OSD nodes we don't see much %utilization of the disks. The RBD caching values are at their defaults.

Are we overlooking some configuration item?

Thanks and Regards,
At
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
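
P.S. For anyone wanting to reproduce the numbers: the 42.64 %iowait average quoted in the sar output can be recomputed from the five samples with a short awk one-liner. This is just a sketch; it assumes the sample lines are saved to a local file (sar.txt here is an arbitrary name) and that %iowait is the sixth whitespace-separated field, as in the sar output above.

```shell
# Save the five per-second sar samples shown above to a scratch file.
cat > sar.txt <<'EOF'
15:14:31 all 0.06 0.00 1.00 45.03 0.00 53.91
15:14:32 all 0.06 0.00 0.94 41.28 0.00 57.72
15:14:33 all 0.06 0.00 1.25 45.78 0.00 52.91
15:14:34 all 0.00 0.00 1.06 40.07 0.00 58.86
15:14:35 all 0.19 0.00 1.38 41.04 0.00 57.39
EOF

# Average the %iowait column (6th whitespace-separated field).
awk '{ sum += $6; n++ } END { printf "avg iowait: %.2f%%\n", sum / n }' sar.txt
# prints: avg iowait: 42.64%
```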