Hi,
we are in the process of migrating our hosts to bluestore. Each host has
12 HDDs (a mix of 6 TB and 4 TB drives) and two Intel P3700 NVMe SSDs with
375 GB capacity.
The new bluestore OSDs are created by ceph-volume:

ceph-volume lvm create --bluestore --block.db /dev/nvmeXn1pY --data /dev/sdX1
Six OSDs share one SSD, each with a 30 GB partition for rocksdb; the
remaining space on each SSD is used as an additional SSD-based OSD without
specifying a separate block.db partition (roughly as sketched below).
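For illustration, the per-host layout corresponds roughly to the following
(a sketch only; the sgdisk usage, device names and partition numbers are
assumptions, not the exact commands we ran):

# 30 GB rocksdb partition for each of six HDD OSDs, leftover space as partition 7
for i in 1 2 3 4 5 6; do
    sgdisk --new=${i}:0:+30G /dev/nvme0n1
done
sgdisk --largest-new=7 /dev/nvme0n1

# one HDD-based OSD per rocksdb partition (sdX is a placeholder)
ceph-volume lvm create --bluestore --block.db /dev/nvme0n1p1 --data /dev/sdX1
# ... repeated for the other five HDDs and /dev/nvme0n1p2 ... p6

# the leftover partition becomes a separate SSD-based OSD
ceph-volume lvm create --bluestore --data /dev/nvme0n1p7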
Backfilling from the other nodes works fine for the HDD-based OSDs, but
is _really_ slow for the SSD-based ones. With filestore, moving our
cephfs metadata pool around was a matter of 10 minutes (350 MB, 8 million
objects, 1024 PGs). With bluestore, the remapped part of the pool (about
400 PGs, those affected by adding a new pair of SSD-based OSDs) did not
finish overnight...
OSD config section from ceph.conf:
[osd]
osd_scrub_sleep = 0.05
osd_journal_size = 10240
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 1
max_pg_per_osd_hard_ratio = 4.0
osd_max_pg_per_osd_hard_ratio = 4.0
bluestore_cache_size_hdd = 5368709120
mon_max_pg_per_osd = 400
Backfilling runs with max-backfills set to 20 during the day and 50 at
night, switched roughly as sketched below.
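The switch is essentially just injecting the option into all OSDs (the
scheduling around it is paraphrased, not our exact setup):

# day, e.g. from cron in the morning
ceph tell osd.* injectargs '--osd-max-backfills 20'
# night, e.g. from cron in the evening
ceph tell osd.* injectargs '--osd-max-backfills 50'

Some numbers from ceph pg dump for the most advanced backfilling cephfs
metadata PG, ten seconds apart: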
ceph pg dump | grep backfilling | grep -v undersized | sort -k4 -n -r | tail -n 1 \
  && sleep 10 && echo \
  && ceph pg dump | grep backfilling | grep -v undersized | sort -k4 -n -r | tail -n 1
dumped all
8.101 7581 0 0 4549 0 4194304 2488 2488 active+remapped+backfilling 2017-12-21 09:03:30.429605 543240'1012998 543248:1923733 [78,34,49] 78 [78,34,19] 78 522371'1009118 2017-12-18 16:11:29.755231 522371'1009118 2017-12-18 16:11:29.755231

dumped all
8.101 7580 0 0 4542 0 0 2489 2489 active+remapped+backfilling 2017-12-21 09:03:30.429605 543248'1012999 543250:1923755 [78,34,49] 78 [78,34,19] 78 522371'1009118 2017-12-18 16:11:29.755231 522371'1009118 2017-12-18 16:11:29.755231
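The seven-object figure is the drop in the misplaced count, i.e. the fifth
field, from 4549 to 4542. A minimal sketch for sampling just that rate
(assuming the fifth field of ceph pg dump is MISPLACED, as in the output
above; the PG id is the one from our output):

# sample the misplaced-object count of one PG twice, 10 seconds apart
PG=8.101
A=$(ceph pg dump 2>/dev/null | awk -v pg="$PG" '$1 == pg {print $5}')
sleep 10
B=$(ceph pg dump 2>/dev/null | awk -v pg="$PG" '$1 == pg {print $5}')
echo "$((A - B)) objects backfilled in 10 s"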
Seven objects in ten seconds does not sound sane to me, given that for
cephfs metadata essentially only omap key/value data has to be transferred.
Any hints on how to tune this?
Regards,
Burkhard