thread bstore_kv_sync - high disk utilization

Hello,

I couldn't find anything satisfying that clearly
describes what this thread does, or whether an average
IO wait of around 60% on the block device is normal
for an SSD even when there is little or no client
workload.

Output from iotop:

---

9890 be/4 ceph        0.00 B/s  817.11 K/s  0.00 % 71.91 % ceph-osd -f
--cluster ceph --id~--setgroup ceph [bstore_kv_sync]

---
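
If it helps, this is roughly how I'm trying to see what
bstore_kv_sync actually writes. I assume the "bluefs"
counters from the OSD's admin socket are the relevant
ones (osd.2 is just an example id, run on the host
carrying that OSD; counter names may differ between
versions):

---

# RocksDB WAL/SST bytes written by this OSD so far
ceph daemon osd.2 perf dump | grep -E 'bytes_written_wal|bytes_written_sst'

---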


I'm using Ceph block storage for an OpenStack
environment. The Ceph cluster consists of three
"identical" machines with 12 OSDs in total:


ID CLASS WEIGHT   TYPE NAME         STATUS REWEIGHT PRI-AFF
-1       41.62500 root default
-7       13.87500     host storage1
 7   hdd  5.26299         osd.7         up  1.00000 1.00000 (6TB)
 9   hdd  5.45799         osd.9         up  1.00000 1.00000 (6TB)
 2   ssd  1.81898         osd.2         up  1.00000 1.00000 (2TB)
 4   ssd  1.33499         osd.4         up  1.00000 1.00000 (2TB)
-3       13.87500     host storage2
 8   hdd  5.26299         osd.8         up  1.00000 1.00000 (6TB)
10   hdd  5.45799         osd.10        up  1.00000 1.00000 (6TB)
 1   ssd  1.81898         osd.1         up  1.00000 1.00000 (2TB)
 5   ssd  1.33499         osd.5         up  1.00000 1.00000 (2TB)
-5       13.87500     host storage3
 6   hdd  5.26299         osd.6         up  1.00000 1.00000 (6TB)
11   hdd  5.45799         osd.11        up  1.00000 1.00000 (6TB)
 0   ssd  1.81898         osd.0         up  1.00000 1.00000 (2TB)
 3   ssd  1.33499         osd.3         up  1.00000 1.00000 (2TB)


The network configuration looks like this:

2x 10Gbit (eth1/eth2) -> Bond0 -> cluster network/backend
2x 10Gbit (eth3/eth4) -> Bond1 -> public network/mons and OpenStack

The virtual machines are reported to have poor performance,
so I started investigating. Using "top" I saw that IO wait
was very high (wa above 50.0).
To make sure the load on the cluster was as low as
possible, I did my testing in the evening when all of
the VMs were idle (but still up and running).
Even then, the disk utilization of the first SSD OSD on
all hosts did not drop as I would have expected.

Even with "ceph -s" telling me that there are just <2Mbit/s RW
and < 1k IOPS client traffic the utilization on the SSD was
very high.
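
For reference, I'm watching the device utilization with
plain iostat (from sysstat) on the OSD hosts, looking at
the %util column of the SSD that backs the busy OSD:

---

# extended per-device statistics, refreshed every second
iostat -x 1

---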

For performance reasons I created a replicated CRUSH rule
that only uses SSDs for the vms pool, and to save some
space I reduced the replication size from 3 to 2.
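
This is roughly how I set that up (the rule name
"ssd-only" is just what I called it, "vms" is the pool
holding the VM images):

---

# replicated rule restricted to the ssd device class
ceph osd crush rule create-replicated ssd-only default host ssd

# let the vms pool use it and reduce replication to 2
ceph osd pool set vms crush_rule ssd-only
ceph osd pool set vms size 2

---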

Without that rule I could see the same behaviour; the
only difference was that the HDDs were being utilised
instead of the SSDs.

One possibly interesting detail is that the monitors run
on the same hosts as the OSDs.

In general the SSD OSDs don't use a separate block.wal or
block.db. The HDD OSDs do have a separate block.db
partition on SSD (250GB - but block.db only contains about
5GB; good question why the rest is not used ;-) ).
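
This is how I checked the block.db usage (osd.7 is just an
example of one of the HDD OSDs; the perf dump has to be
run on the host carrying that OSD):

---

# where the db partition of the OSD lives
ceph osd metadata 7 | grep bluefs

# how much of the separate db partition is actually used
ceph daemon osd.7 perf dump | grep -E 'db_total_bytes|db_used_bytes|slow_used_bytes'

---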

Any suggestions on the high disk utilization by bstore_kv_sync?



Best regards

-- 
Benjamin Zapiec <benjamin.zapiec@xxxxxxxxxx> (System Engineer)
* GONICUS GmbH * Moehnestrasse 55 (Kaiserhaus) * D-59755 Arnsberg
* Tel.: +49 2932 916-0 * Fax: +49 2932 916-245
* http://www.GONICUS.de

* Sitz der Gesellschaft: Moehnestrasse 55 * D-59755 Arnsberg
* Geschaeftsfuehrer: Rainer Luelsdorf, Alfred Schroeder
* Vorsitzender des Beirats: Juergen Michels
* Amtsgericht Arnsberg * HRB 1968


