Re: osds udev rules not triggered on reboot (jewel, jessie)

Karsten Heymann <karsten.heymann@xxxxxxxxx> · Wed, 27 Apr 2016 13:51:20 +0200

Hi Loris,

thank you for your feedback. As I plan to take the cluster into
production later this year, I'm really hesitant to update udev and
systemd to a version newer than the one in jessie, especially as there
is no official backport for those packages yet. I would really expect
ceph to work out of the box with the versions in jessie, so I'd rather
try to find the cause of the problem and help fix it.
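
For anyone who wants to reproduce this without rebooting repeatedly: the
manual trigger from my original mail (quoted below) can be applied to
every partition at once. This is only a rough sketch for debugging, not a
fix. The function name retrigger_partitions and its optional directory
argument are made up for illustration, and it assumes that writing "add"
to a partition's uevent file makes udev re-run the ceph rules, as it did
when I tried it by hand. On a real system it needs root.

```shell
#!/bin/sh
# Hypothetical helper: re-send an "add" uevent for every partition found
# under a sysfs block directory. The first argument overrides the
# directory so the function can be tried against a scratch tree.
retrigger_partitions() {
    sysfs_block="${1:-/sys/class/block}"
    for dev in "$sysfs_block"/*; do
        # Partitions expose a "partition" attribute; whole disks do not.
        [ -e "$dev/partition" ] || continue
        # Skip entries whose uevent file cannot be written (e.g. not root).
        [ -w "$dev/uevent" ] || continue
        echo add > "$dev/uevent"
        echo "triggered ${dev##*/}"
    done
}
```

Calling retrigger_partitions without an argument uses /sys/class/block.
Running "udevadm monitor --udev" in a second terminal while doing so
should show whether udev actually processes the resulting events.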

best regards

2016-04-27 13:36 GMT+02:00 Loris Cuoghi <lc@xxxxxxxxxxxxxxxxx>:
> Hi Karsten,
>
> I've had the same experience updating our test cluster (Debian 8) from
> Infernalis to Jewel.
>
> I've updated udev/systemd to the version in testing (so, from 215 to 229),
> and it worked much better at reboot.
>
> So... are the udev rules written for the udev version in RedHat (219) or
> newer versions?
>
> Thanks in advance :)
>
>
> On 27/04/2016 09:33, Karsten Heymann wrote:
>>
>> Hi!
>>
>> Over the last few days I updated my jessie evaluation cluster to jewel,
>> and now the osds are not started automatically after reboot because
>> their data partitions are not mounted. This is the output of
>> ceph-disk list after boot:
>>
>> /dev/sdh :
>>   /dev/sdh1 ceph data, prepared, cluster ceph, osd.47, journal /dev/sde1
>> /dev/sdi :
>>   /dev/sdi1 ceph data, prepared, cluster ceph, osd.48, journal /dev/sde2
>> /dev/sdj :
>>   /dev/sdj1 ceph data, prepared, cluster ceph, osd.49, journal /dev/sde3
>>
>> and so on.
>>
>> systemd tried to start the units:
>>
>> # systemctl | grep osd
>> ● ceph-osd@47.service   loaded failed failed    Ceph object storage daemon
>> ● ceph-osd@48.service   loaded failed failed    Ceph object storage daemon
>> ● ceph-osd@49.service   loaded failed failed    Ceph object storage daemon
>>
>> # systemctl status ceph-osd@47.service
>> ● ceph-osd@47.service - Ceph object storage daemon
>>     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled)
>>     Active: failed (Result: start-limit) since Wed 2016-04-27 08:50:07 CEST; 21min ago
>>    Process: 3139 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
>>    Process: 2682 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
>>   Main PID: 3139 (code=exited, status=1/FAILURE)
>>
>> Apr 27 08:50:06 ceph-cap1-02 systemd[1]: Unit ceph-osd@47.service entered failed state.
>> Apr 27 08:50:07 ceph-cap1-02 systemd[1]: ceph-osd@47.service start request repeated too quickly, refusing to start.
>> Apr 27 08:50:07 ceph-cap1-02 systemd[1]: Failed to start Ceph object storage daemon.
>> Apr 27 08:50:07 ceph-cap1-02 systemd[1]: Unit ceph-osd@47.service entered failed state.
>>
>> Which is no surprise, as the osd data directory is not mounted:
>>
>> # ls -l /var/lib/ceph/osd/ceph-47
>> total 0
>>
>> The weird thing is that running the following starts the osd:
>>
>> # echo add > /sys/class/block/sdr1/uevent
>>
>> so the udev rules that mount the osds do seem to work when triggered
>> by hand.
>>
>> Any ideas on how to debug this?
>>
>> Best regards
>> Karsten
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>