ceph-mon rocksdb write latency

Hi all,

I am troubleshooting an issue that I am not really sure how to deal with. We have set up a ceph cluster (version 16.2.6) with cephadm, running in podman containers.
Our hosts run both ceph and kubernetes.
Each host is all NVMe, with 512 GB of memory and a single AMD EPYC 7702P CPU.

We run bare metal and decided to run the mds, mgr, mon, and rgw containers on our kubernetes control-plane nodes. Our kubernetes etcd monitoring showed that we have some problems: etcd could not send its heartbeat in time, and read-only requests were seeing high latency.
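
For anyone wanting to check the same correlation: etcd exports a WAL fsync latency histogram, and a Prometheus query roughly like the one below (a sketch; the metric is etcd's standard fsync histogram, but the exact query is my own) should show whether the heartbeat warnings line up with disk fsync spikes.

    # p99 of etcd WAL fsync latency over 5m windows (sketch, standard etcd metric)
    histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))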

This problem led me to dig deeper into our node performance, and it revealed information that was new, to me at least. I installed the bcc tools, ran biolatency first, and found some pretty high write latency numbers.
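
For reference, this is roughly how I collected the histogram below; the install path and flags are from memory, so treat it as a sketch (-D prints a separate histogram per disk):

    # per-disk block I/O latency histograms from bcc (path may differ per distro)
    sudo /usr/share/bcc/tools/biolatency -D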
disk = b'nvme0c0n1'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 1476     |****************************************|
        32 -> 63         : 1072     |*****************************           |
        64 -> 127        : 543      |**************                          |
       128 -> 255        : 261      |*******                                 |
       256 -> 511        : 261      |*******                                 |
       512 -> 1023       : 254      |******                                  |
      1024 -> 2047       : 493      |*************                           |
      2048 -> 4095       : 808      |*********************                   |
      4096 -> 8191       : 244      |******                                  |
      8192 -> 16383      : 42       |*                                       |
     16384 -> 32767      : 88       |**                                      |
     32768 -> 65535      : 167      |****                                    |
     65536 -> 131071     : 313      |********                                |
    131072 -> 262143     : 463      |************                            |
    262144 -> 524287     : 77       |**                                      |

While digging a little deeper with biosnoop, I found that whenever we get the etcd errors, rocksdb is also writing, every single time:

TIME(s)     COMM          PID      DISK       T  SECTOR     BYTES   QUE(ms)  LAT(ms)
141.564655  rocksdb:low0  1493922  nvme0c0n1  W  84362696   131072  0.01     214.38
141.565270  rocksdb:low0  1493922  nvme0c0n1  W  84362952   131072  0.01     214.99
141.565898  rocksdb:low0  1493922  nvme0c0n1  W  84363208   131072  0.01     215.61
141.566424  rocksdb:low0  1493922  nvme0c0n1  W  84363464   131072  0.01     216.13
141.566931  rocksdb:low0  1493922  nvme0c0n1  W  84363720   131072  0.01     216.63
141.567488  rocksdb:low0  1493922  nvme0c0n1  W  84363976   131072  0.01     217.18
141.568097  rocksdb:low0  1493922  nvme0c0n1  W  84364232   131072  0.01     217.79
141.568707  rocksdb:low0  1493922  nvme0c0n1  W  84364488   131072  0.05     218.35
141.569332  rocksdb:low0  1493922  nvme0c0n1  W  84364744   131072  0.01     218.97
141.569881  rocksdb:low0  1493922  nvme0c0n1  W  84365000   131072  0.01     219.51
141.570372  rocksdb:low0  1493922  nvme0c0n1  W  84365256   131072  0.01     220.00
141.570893  rocksdb:low0  1493922  nvme0c0n1  W  84365512   131072  0.01     220.51
141.571462  rocksdb:low0  1493922  nvme0c0n1  W  84365768   131072  0.01     221.07
141.572079  rocksdb:low0  1493922  nvme0c0n1  W  84366024   131072  0.01     221.68
141.572726  rocksdb:low0  1493922  nvme0c0n1  W  84366280   131072  0.01     222.32
141.573394  rocksdb:low0  1493922  nvme0c0n1  W  84366536   131072  0.06     222.94
141.573936  rocksdb:low0  1493922  nvme0c0n1  W  84366792   131072  0.01     223.48
141.574497  rocksdb:low0  1493922  nvme0c0n1  W  84367048   131072  0.01     224.03
141.574634  rocksdb:low0  1493922  nvme0c0n1  W  84367304   28672   2.28     221.89
141.574773  rocksdb:low0  1493922  nvme0c0n1  W  84348104   28672   2.20     222.03
141.574909  rocksdb:low0  1493922  nvme0c0n1  W  84348168   28672   2.12     222.16
141.575050  rocksdb:low0  1493922  nvme0c0n1  W  84348232   28672   2.05     222.30
141.575194  rocksdb:low0  1493922  nvme0c0n1  W  84348296   28672   1.97     222.44
141.575337  rocksdb:low0  1493922  nvme0c0n1  W  84348360   28672   1.90     222.59
141.575478  rocksdb:low0  1493922  nvme0c0n1  W  84348424   28672   1.83     222.72
141.575621  rocksdb:low0  1493922  nvme0c0n1  W  84348488   28672   1.76     222.86
141.575769  rocksdb:low0  1493922  nvme0c0n1  W  84348552   28672   1.69     223.01
141.575919  rocksdb:low0  1493922  nvme0c0n1  W  84348616   28672   1.61     223.16
141.576049  rocksdb:low0  1493922  nvme0c0n1  W  84348680   28672   1.54     223.29
141.576168  rocksdb:low0  1493922  nvme0c0n1  W  84348744   28672   1.47     223.40
141.576289  rocksdb:low0  1493922  nvme0c0n1  W  84348808   28672   1.40     223.52
141.576415  rocksdb:low0  1493922  nvme0c0n1  W  84348872   28672   1.33     223.65
141.576542  rocksdb:low0  1493922  nvme0c0n1  W  84348936   28672   1.26     223.77
141.577018  rocksdb:low0  1493922  nvme0c0n1  W  84308992   131072  0.01     239.97
141.577520  rocksdb:low0  1493922  nvme0c0n1  W  84309248   131072  0.01     240.46
141.578133  rocksdb:low0  1493922  nvme0c0n1  W  84309504   131072  0.01     241.07
141.578790  rocksdb:low0  1493922  nvme0c0n1  W  84310016   131072  0.01     241.38
141.578801  etcd          1244092  nvme0c0n1  W  121233568  4096    0.01     213.58
141.579461  rocksdb:low0  1493922  nvme0c0n1  W  84326088   131072  0.01     241.14
141.579966  rocksdb:low0  1493922  nvme0c0n1  W  84326344   131072  0.01     241.64
141.580438  rocksdb:low0  1493922  nvme0c0n1  W  84326600   131072  0.06     242.06
141.580953  rocksdb:low0  1493922  nvme0c0n1  W  84326856   131072  0.01     242.56
141.581520  rocksdb:low0  1493922  nvme0c0n1  W  84327112   131072  0.01     243.12
141.582137  rocksdb:low0  1493922  nvme0c0n1  W  84327368   131072  0.02     243.73
141.582453  etcd          1244092  nvme0c0n1  W  648413832  4096    0.00     54.11
141.582726  rocksdb:low0  1493922  nvme0c0n1  W  84327624   131072  0.02     244.31
141.583246  rocksdb:low0  1493922  nvme0c0n1  W  84327880   131072  0.01     244.83
141.583741  rocksdb:low0  1493922  nvme0c0n1  W  84328136   131072  0.01     245.32
141.584309  rocksdb:low0  1493922  nvme0c0n1  W  84328392   131072  0.01     245.88
141.584601  etcd          1244092  nvme0c0n1  W  121233600  8192    0.01     219.37
141.584666  etcd          1244092  nvme0c0n1  W  121233736  4096    0.01     219.44
141.584918  rocksdb:low0  1493922  nvme0c0n1  W  84328648   131072  0.06     246.44
141.585555  rocksdb:low0  1493922  nvme0c0n1  W  84328904   131072  0.01     247.07
141.585757  rocksdb:low0  1493922  nvme0c0n1  W  84350024   28672   0.13     232.95
141.585911  rocksdb:low0  1493922  nvme0c0n1  W  84350088   28672   1.05     232.12
141.586073  rocksdb:low0  1493922  nvme0c0n1  W  84369416   32768   0.93     232.28
141.586201  rocksdb:low0  1493922  nvme0c0n1  W  84350152   28672   0.87     232.40
141.586329  rocksdb:low0  1493922  nvme0c0n1  W  84350216   28672   0.81     232.53
141.586458  rocksdb:low0  1493922  nvme0c0n1  W  84350280   28672   0.76     232.65
141.586591  rocksdb:low0  1493922  nvme0c0n1  W  84350344   28672   0.70     232.78
141.586711  rocksdb:low0  1493922  nvme0c0n1  W  84350408   28672   0.64     232.90
141.586832  rocksdb:low0  1493922  nvme0c0n1  W  84350472   28672   0.59     233.02
141.586958  rocksdb:low0  1493922  nvme0c0n1  W  84350536   28672   0.52     233.14
141.587086  rocksdb:low0  1493922  nvme0c0n1  W  84350600   28672   0.46     233.26
141.587215  rocksdb:low0  1493922  nvme0c0n1  W  84350664   28672   0.41     233.39
141.587339  rocksdb:low0  1493922  nvme0c0n1  W  84350728   28672   0.35     233.51
141.587466  rocksdb:low0  1493922  nvme0c0n1  W  84350792   28672   0.30     233.64
141.587596  rocksdb:low0  1493922  nvme0c0n1  W  84350856   28672   0.24     233.76
141.587729  rocksdb:low0  1493922  nvme0c0n1  W  84350920   28672   0.19     233.89
141.587867  rocksdb:low0  1493922  nvme0c0n1  W  84350984   28672   0.13     234.03
141.588003  rocksdb:low0  1493922  nvme0c0n1  W  84351048   28672   0.07     234.16
141.588073  etcd          1244092  nvme0c0n1  W  121233624  4096    0.01     222.85
141.588213  rocksdb:low0  1493922  nvme0c0n1  W  84351112   28672   0.02     214.04
141.588358  rocksdb:low0  1493922  nvme0c0n1  W  84351176   28672   0.01     174.87
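
For completeness, the trace above came from something like the following; again a sketch from memory, where -Q makes biosnoop report the OS queued time (the QUE(ms) column):

    # per-I/O block trace including queue time (path may differ per distro)
    sudo /usr/share/bcc/tools/biosnoop -Q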

Is it expected that rocksdb will cause 200+ ms write latency on NVMe drives?

I am not sure how to mitigate this.
* Do I need to run my podman containers on a completely separate disk?
* Is there some setting in ceph that I missed that would remove this problem? (One idea I am considering is sketched below.)
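
The idea, if I am reading the docs correctly, is compacting the monitor's rocksdb store to see whether the write bursts shrink. Something like the following (the mon id is hypothetical, and I have not verified this actually helps):

    # ask one monitor to compact its store online (mon id is made up here)
    ceph tell mon.cp01 compact
    # or have monitors compact their store on startup
    ceph config set mon mon_compact_on_start true

Would that be expected to make any difference here?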

Any help toward a better understanding of this is very much appreciated.
- Karsten
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


