Hi all,

I'm not getting very far with this query internally (RH), so I'm hoping someone familiar with the code can spare me the C++ pain...

We've hit soft thread-count ulimits a couple of times now with different Ceph clusters. The clients (Qemu/KVM guests on both Ubuntu and RHEL hosts) hit the limit thanks to the many socket fds they hold open to the Ceph cluster, and then experience weird (at least the first time around) and hard-to-debug issues (nothing in the qemu or libvirt logs). The primary symptom is an apparent IO hang in the guest with no well-defined trigger: the Ceph volumes work fine initially, but at some point the process crosses the ulimit and no further guest IOs make progress (iostat shows the devices at 100% util but zero IOPS).

qemu.conf has a max_files setting for tuning the relevant system-default ulimit for guests, but we have no idea what it actually needs to be, so for now we've just set it very large.

So: how many threads does librbd need? It seems to scale with the size of the cluster (#OSDs and/or #PGs). In one case the issue only showed up for a user with 10 RBD volumes attached to an OpenStack instance after we added a handful of OSDs to expand the cluster, which pushed their qemu/kvm processes' steady-state fd usage from ~900 to ~1100, past the 1024 default.

--
Cheers,
~Blairo
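
P.S. For anyone who wants to check their own guests, a quick way to compare a qemu process's fd and thread usage against its soft limits is to read /proc directly. A minimal sketch (Python, stdlib only, Linux-specific; pass the qemu-kvm pid on the command line, and note you'll generally need root to read another process's /proc entries):

import sys
from pathlib import Path

def count_entries(path):
    # Number of entries under e.g. /proc/<pid>/fd or /proc/<pid>/task.
    return sum(1 for _ in Path(path).iterdir())

def soft_limit(pid, name):
    # Pull the soft limit from the matching row of /proc/<pid>/limits,
    # e.g. "Max open files" or "Max processes". Returned as a string
    # because it can be "unlimited".
    for line in Path(f"/proc/{pid}/limits").read_text().splitlines():
        if line.startswith(name):
            return line[len(name):].split()[0]
    return "unknown"

if __name__ == "__main__":
    pid = sys.argv[1]  # pid of the qemu-kvm process for the guest
    print("open fds :", count_entries(f"/proc/{pid}/fd"),
          "(soft limit:", soft_limit(pid, "Max open files") + ")")
    # Note: "Max processes" (RLIMIT_NPROC) is a per-user limit, so this
    # process's thread count is only part of what counts against it.
    print("threads  :", count_entries(f"/proc/{pid}/task"),
          "(soft limit:", soft_limit(pid, "Max processes") + ")")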
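
P.P.S. For completeness, the back-of-envelope model I'm implicitly using when I say it "seems to scale with the size of the cluster". Every constant in it is a guess (that's the question I'm asking), and the cluster sizes are hypothetical; I'm including it only so someone who knows the messenger code can correct the per-OSD and per-volume terms. Qemu's own fds (disk images, tap devices, vnc, eventfds, ...) come on top of whatever this yields.

# All constants are guesses/knobs, not documented librbd behaviour.
def estimate_client_fds(num_osds, num_mons, num_volumes,
                        sockets_per_osd=1,    # guess: socket(s) per OSD session
                        misc_per_volume=5):   # guess: admin socket, logs, etc.
    # Assumes each attached RBD volume gets its own librados instance and
    # therefore its own full set of mon/OSD sessions -- an assumption I
    # haven't confirmed in the code.
    per_volume = num_mons + num_osds * sockets_per_osd + misc_per_volume
    return num_volumes * per_volume

# Hypothetical example: 3 mons, 72 OSDs, 10 attached volumes.
for sockets in (1, 2):
    print(sockets, "socket(s)/OSD ->",
          estimate_client_fds(num_osds=72, num_mons=3,
                              num_volumes=10, sockets_per_osd=sockets), "fds")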