Re: OSDs too slow to start

Alfredo Deza <adeza@xxxxxxxxxx> · Mon, 18 Jun 2018 08:09:43 -0400

On Fri, Jun 15, 2018 at 11:59 AM, Alfredo Daniel Rezinovsky
<alfredo.rezinovsky@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Too long is 120 seconds
>
> The DB is on SSD devices, and the devices are fast. The OSD process reads
> about 800 MB, but I cannot tell where from.

You didn't mention what version of Ceph you are using and how you
deployed these OSDs (ceph-disk or ceph-volume?)

I'm assuming ceph-disk here because the OSDs take so many seconds to
boot. ceph-volume doesn't have this problem, as it uses a streamlined
process to bring up the OSDs as the devices become available to the OS
at boot.
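
One rough way to guess which tool deployed an OSD is to look at what is
mounted on its data directory. The sketch below rests on an assumption,
not on anything confirmed in this thread: ceph-disk bluestore OSDs keep a
small XFS partition there (the 100M partitions in the lsblk output quoted
below fit that pattern), while ceph-volume lvm bluestore OSDs mount a
tmpfs. The function name osd_mount_types is made up for illustration.

```python
def osd_mount_types(mounts_text, prefix="/var/lib/ceph/osd/"):
    """Map each OSD data directory to its filesystem type.

    mounts_text is the content of /proc/mounts: one mount per line,
    with fields device, mount point, fs type, options, dump, pass.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Keep only mounts whose mount point lives under the OSD prefix.
        if len(fields) >= 3 and fields[1].startswith(prefix):
            # Assumption (heuristic only): "xfs" hints at ceph-disk,
            # "tmpfs" hints at ceph-volume lvm.
            result[fields[1]] = fields[2]
    return result
```

On a node, it could be fed with open("/proc/mounts").read(). Treat the
result as a hint only: OSDs taken over with "ceph-volume simple" would
still show xfs.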

If you are using ceph-volume, please send us both logs
(/var/log/ceph/ceph-volume.log and
/var/log/ceph/ceph-volume-systemd.log).

Otherwise, this is well-known behavior.
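
To narrow down where the time goes, the /proc/<pid>/io check from the
original message can be repeated for each OSD process separately, since
pidof returns several PIDs on a node running three OSDs. A minimal
sketch, assuming the standard Linux "key: value" layout of
/proc/<pid>/io; parse_proc_io and effective_mb_per_s are names invented
for this illustration.

```python
def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        # Skip anything that is not a plain "name: number" line.
        if sep and value.isdigit():
            counters[key.strip()] = int(value)
    return counters


def effective_mb_per_s(bytes_read, seconds):
    """Average read rate in MB/s (1 MB = 10**6 bytes) over the interval."""
    return bytes_read / 1e6 / seconds
```

With the figures from this thread, about 800 MB read during a 120 second
start averages below 7 MB/s if the reads are spread over the whole
start-up. In that file, read_bytes counts bytes actually fetched from
storage, while rchar also includes reads served from the page cache.
Reading another user's /proc/<pid>/io usually requires root.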

>
>
> On 13/06/18 11:36, Gregory Farnum wrote:
>
> How long is “too long”? 800MB on an SSD should only be a second or three.
> I’m not sure if that’s a reasonable amount of data; you could try compacting
> the rocksdb instance etc. But if reading 800MB is noticeable I would start
> wondering about the quality of your disks as a journal or rocksdb device.
> -Greg
>
> On Tue, Jun 12, 2018 at 2:23 PM Alfredo Daniel Rezinovsky
> <alfredo.rezinovsky@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> I migrated my OSDs from filestore to bluestore.
>>
>> Each node now has 1 SSD with the OS and the BlockDBs and 3 HDDs with
>> bluestore data.
>>
>> # lsblk
>> NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>> sdd      8:48   0   2.7T  0 disk
>> |-sdd2   8:50   0   2.7T  0 part
>> `-sdd1   8:49   0   100M  0 part /var/lib/ceph/osd/ceph-2
>> sdb      8:16   0   3.7T  0 disk
>> |-sdb2   8:18   0   3.7T  0 part
>> `-sdb1   8:17   0   100M  0 part /var/lib/ceph/osd/ceph-0
>> sdc      8:32   0   3.7T  0 disk
>> |-sdc2   8:34   0   3.7T  0 part
>> `-sdc1   8:33   0   100M  0 part /var/lib/ceph/osd/ceph-1
>> sda      8:0    0 223.6G  0 disk
>> |-sda4   8:4    0     1G  0 part
>> |-sda2   8:2    0  37.3G  0 part /
>> |-sda5   8:5    0     1G  0 part
>> |-sda3   8:3    0     1G  0 part
>> `-sda1   8:1    0   953M  0 part /boot/efi
>>
>> Now the I/O works better, and I have not seen a slow response (OSD, not
>> MDS) warning since.
>>
>> But when I reboot a Ceph node, the OSDs take too long to come up. With
>> filestore it was almost immediate.
>>
>> Monitoring /proc/$(pidof ceph-osd)/io, I could see that each OSD reads
>> about 800 MBytes before coming up (my block.db partitions are 1G).
>>
>> Do the OSDs re-process the whole block.db when booting up?
>>
>> Is there any way to speed up OSD availability after a reboot?
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com