Slow backfilling with bluestore, ssd and metadata pools


 



Hi,


we are in the process of migrating our hosts to bluestore. Each host has 12 HDDs (6 TB / 4 TB) and two Intel P3700 NVMe SSDs with 375 GB capacity each. The new bluestore OSDs are created with ceph-volume:


ceph-volume lvm create --bluestore --block.db /dev/nvmeXn1pY --data /dev/sdX1


Six OSDs share one SSD, each with a 30 GB partition for RocksDB; the remaining space on each SSD is used as an additional SSD-based OSD without specifying further partitions.
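For reference, the per-host layout described above would look roughly like this (device and partition names are examples, not the actual ones from our hosts):

```shell
# One of the two 375 GB NVMe SSDs, partitioned into six 30 GB
# RocksDB partitions (p1..p6) plus one large leftover partition (p7).

# Six HDD OSDs, each with its block.db on a 30 GB NVMe partition:
ceph-volume lvm create --bluestore --block.db /dev/nvme0n1p1 --data /dev/sda1
ceph-volume lvm create --bluestore --block.db /dev/nvme0n1p2 --data /dev/sdb1
# ... and so on for the remaining four HDDs.

# The leftover NVMe space becomes a standalone SSD-based OSD
# (no separate block.db partition specified):
ceph-volume lvm create --bluestore --data /dev/nvme0n1p7
```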


Backfilling from the other nodes works fine for the HDD-based OSDs, but is _really_ slow for the SSD-based ones. With filestore, moving our CephFS metadata pool around was a matter of 10 minutes (350 MB, 8 million objects, 1024 PGs). With bluestore, backfilling the remapped part of the pool (about 400 PGs, those affected by adding a new pair of SSD-based OSDs) did not finish overnight...


OSD config section from ceph.conf:

[osd]
osd_scrub_sleep = 0.05
osd_journal_size = 10240
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 1
max_pg_per_osd_hard_ratio = 4.0
osd_max_pg_per_osd_hard_ratio = 4.0
bluestore_cache_size_hdd = 5368709120
mon_max_pg_per_osd = 400


Backfilling runs with osd-max-backfills set to 20 during the day and 50 during the night. Some numbers (ceph pg dump for the most advanced backfilling CephFS metadata PG, ten seconds apart):
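In case it matters for diagnosis: the day/night switch is done with something like the following (a sketch; we inject the value at runtime rather than restarting OSDs):

```shell
# Daytime: limit concurrent backfills per OSD
ceph tell osd.\* injectargs '--osd-max-backfills 20'

# Night: allow more parallel backfill operations
ceph tell osd.\* injectargs '--osd-max-backfills 50'
```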


ceph pg dump | grep backfilling | grep -v undersized | sort -k4 -n -r | tail -n 1 && sleep 10 && echo && ceph pg dump | grep backfilling | grep -v undersized | sort -k4 -n -r | tail -n 1
dumped all
8.101      7581                  0        0      4549       0 4194304 2488     2488 active+remapped+backfilling 2017-12-21 09:03:30.429605 543240'1012998    543248:1923733 [78,34,49]         78                     [78,34,19] 78    522371'1009118 2017-12-18 16:11:29.755231    522371'1009118 2017-12-18 16:11:29.755231

dumped all
8.101      7580                  0        0      4542 0           0 2489     2489 active+remapped+backfilling 2017-12-21 09:03:30.429605 543248'1012999    543250:1923755 [78,34,49]         78                     [78,34,19] 78    522371'1009118 2017-12-18 16:11:29.755231    522371'1009118 2017-12-18 16:11:29.755231


Seven objects in 10 seconds does not sound sane to me, given that only key/value data has to be transferred.


Any hints on how to tune this?


Regards,

Burkhard


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





