Hello everybody,
I have issues with very slow requests on a simple three-node cluster here:
four WDC enterprise disks and an Intel Optane NVMe journal on each of the
identical high-memory nodes, with 10Gb networking.
Everything was working well with Ceph Hammer on Debian Wheezy, but I
wanted to upgrade to a supported version and try out BlueStore as well. So
I upgraded to Luminous on Debian Stretch and used ceph-volume to create
BlueStore OSDs; everything went downhill from there.
I went back to FileStore on all nodes, but I still have slow requests and
cannot pinpoint a good reason. I tried to debug and gathered some
information to look at:
https://paste.debian.net/hidden/acc5d204/
First I thought it was the balancing that was making things slow. Then I
thought it might be the LVM layer, so I recreated the nodes without LVM
by switching from ceph-volume to ceph-disk; no difference, still slow
requests. Then I changed back from BlueStore to FileStore, but the cluster
is still very slow. Next I suspected a CPU scheduling issue and downgraded
from the 5.x kernel, so CPU performance is at full speed again. I also
thought something might be wrong with one of the OSDs and took them out
one by one, but slow requests still show up and client performance from
the VMs is really poor.
It feels like bursts of small requests keep blocking for a while and then
recover again.
Many thanks for taking a look at the URL.
If there are options I should tune for an HDD with NVMe journal setup,
please share them.
Jelle
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com