“Too long” is 120 seconds.
The DB is on SSD devices, and the devices are fast. The OSD process
reads about 800 MB, but I cannot be sure from where.
On 13/06/18 11:36, Gregory Farnum wrote:
How long is “too long”? 800MB on an SSD should
only be a second or three.
I’m not sure if that’s a reasonable amount of
data; you could try compacting the rocksdb instance etc. But if
reading 800MB is noticeable I would start wondering about the
quality of your disks as a journal or rocksdb device.
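For scale, the figures quoted in this thread (about 800 MB read, 120 seconds for an OSD to come up) can be turned into an average rate. This is only a rough sketch: it assumes the whole 120 seconds were spent on that read, which the thread does not establish, and `throughput_mb_s` is a hypothetical helper name.

```shell
#!/bin/sh
# throughput_mb_s MEGABYTES SECONDS
# Print the average rate in MB/s with one decimal place.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
throughput_mb_s() {
    LC_ALL=C awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

# Figures from the thread: ~800 MB read, 120 s to come up.
throughput_mb_s 800 120
```

Under that assumption the rate comes out in single-digit MB/s, far below what an SSD normally sustains for reads, which suggests the time is going somewhere other than raw sequential reading.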
I migrated my OSDs from filestore to bluestore.
Each node now has 1 SSD with the OS and the BlockDBs, and 3 HDDs with
bluestore data.
# lsblk
sdd 8:48 0 2.7T 0 disk
|-sdd2 8:50 0 2.7T 0 part
`-sdd1 8:49 0 100M 0 part /var/lib/ceph/osd/ceph-2
sdb 8:16 0 3.7T 0 disk
|-sdb2 8:18 0 3.7T 0 part
`-sdb1 8:17 0 100M 0 part /var/lib/ceph/osd/ceph-0
sdc 8:32 0 3.7T 0 disk
|-sdc2 8:34 0 3.7T 0 part
`-sdc1 8:33 0 100M 0 part /var/lib/ceph/osd/ceph-1
sda 8:0 0 223.6G 0 disk
|-sda4 8:4 0 1G 0 part
|-sda2 8:2 0 37.3G 0 part /
|-sda5 8:5 0 1G 0 part
|-sda3 8:3 0 1G 0 part
`-sda1 8:1 0 953M 0 part /boot/efi
Now the I/O works better, and I have never again seen a slow response
(OSD, not MDS) warning.
But when I reboot a ceph node, the OSDs take too long to come up. With
filestore it was almost immediate.
Monitoring /proc/$(pidof ceph-osd)/io, I could see that each OSD reads
about 800 MB before coming up (my block.db partitions are 1G).
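A minimal sketch of that kind of monitoring, assuming a Linux /proc filesystem and root privileges to read another user's io file. On a node with several OSDs, `pidof ceph-osd` prints several PIDs, so the sketch loops over them. The function names are hypothetical. The `read_bytes` field counts bytes actually fetched from the storage layer, whereas `rchar` also includes reads served from the page cache.

```shell
#!/bin/sh
# read_bytes_of FILE
# Print the read_bytes counter from a /proc/<pid>/io style file.
read_bytes_of() {
    awk '$1 == "read_bytes:" { print $2 }' "$1"
}

# to_mib BYTES
# Convert a byte count to whole MiB (integer division).
to_mib() {
    echo $(( $1 / 1048576 ))
}

# report_osds
# Print one line per running ceph-osd process with its read total so far.
report_osds() {
    for pid in $(pidof ceph-osd); do
        bytes=$(read_bytes_of "/proc/$pid/io")
        echo "pid $pid: $(to_mib "$bytes") MiB read from storage"
    done
}

# Example (run as root while the OSDs are starting):
#   report_osds
```

Running `report_osds` repeatedly during startup shows how each OSD's read total grows, but it does not say which device the reads come from.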
Do the OSDs re-process the whole block.db when booting?
Is there any way to accelerate OSD availability after a reboot?
ceph-users mailing list