Re: osds udev rules not triggered on reboot (jewel, jessie)

Loris Cuoghi <lc@xxxxxxxxxxxxxxxxx> · Wed, 27 Apr 2016 13:36:53 +0200

Hi Karsten,

I've had the same experience updating our test cluster (Debian 8) from 
Infernalis to Jewel.

I've updated udev/systemd to the version in testing (so, from 215 to 229), 
and it worked much better at reboot.

So... are the udev rules written for the udev version in RedHat (219) or 
later?

Thanks in advance :)

On 27/04/2016 09:33, Karsten Heymann wrote:
Hi!

Over the last few days I updated my jessie evaluation cluster to jewel, and
now the osds are not started automatically after reboot because their data
partitions are not mounted. This is the output of ceph-disk list after boot:

/dev/sdh :
  /dev/sdh1 ceph data, prepared, cluster ceph, osd.47, journal /dev/sde1
/dev/sdi :
  /dev/sdi1 ceph data, prepared, cluster ceph, osd.48, journal /dev/sde2
/dev/sdj :
  /dev/sdj1 ceph data, prepared, cluster ceph, osd.49, journal /dev/sde3

and so on.

systemd tried to start the units:

# systemctl | grep osd
● ceph-osd@47.service   loaded failed failed    Ceph object storage daemon
● ceph-osd@48.service   loaded failed failed    Ceph object storage daemon
● ceph-osd@49.service   loaded failed failed    Ceph object storage daemon

# systemctl status ceph-osd@47.service
● ceph-osd@47.service - Ceph object storage daemon
    Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled)
    Active: failed (Result: start-limit) since Wed 2016-04-27 08:50:07 CEST; 21min ago
   Process: 3139 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
   Process: 2682 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
  Main PID: 3139 (code=exited, status=1/FAILURE)

Apr 27 08:50:06 ceph-cap1-02 systemd[1]: Unit ceph-osd@47.service entered failed state.
Apr 27 08:50:07 ceph-cap1-02 systemd[1]: ceph-osd@47.service start request repeated too quickly, refusing to start.
Apr 27 08:50:07 ceph-cap1-02 systemd[1]: Failed to start Ceph object storage daemon.
Apr 27 08:50:07 ceph-cap1-02 systemd[1]: Unit ceph-osd@47.service entered failed state.

This is no surprise, as the osd data partition is not mounted:

# ls -l /var/lib/ceph/osd/ceph-47
total 0
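
To run the same check across every osd on a host, something like the
following might do. It is only a sketch: the function name
list_unmounted_osds and the OSD_ROOT override are made up for illustration,
and it assumes an empty directory means the data partition was not mounted.

```shell
# List osd directories that are empty, i.e. probably not mounted.
# OSD_ROOT is a hypothetical override (handy for a dry run);
# the default is the path shown in the listing above.
list_unmounted_osds() {
    osd_root="${OSD_ROOT:-/var/lib/ceph/osd}"
    for dir in "$osd_root"/ceph-*; do
        # Skip the unexpanded glob or anything that is not a directory.
        [ -d "$dir" ] || continue
        # ls -A prints nothing for an empty directory.
        if [ -z "$(ls -A "$dir")" ]; then
            echo "${dir##*/} is empty (data partition probably not mounted)"
        fi
    done
}
```

Calling list_unmounted_osds with no arguments checks the default path.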

The weird thing is that running the following starts the osd:

# echo add > /sys/class/block/sdr1/uevent

so the udev rules to mount the osds seem to work.
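
As a stopgap until the boot-time problem is understood, the same manual
trigger could be replayed for several partitions in one go. This is a rough
sketch, not a fix: the function name retrigger_add and the SYSFS_BLOCK
override are made up here, and the partition names have to be supplied by
hand (for example from the ceph-disk list output).

```shell
# Replay an "add" uevent for each partition name given as an argument.
# SYSFS_BLOCK is a hypothetical override for dry runs;
# the default is the real sysfs block class directory.
retrigger_add() {
    sysfs_block="${SYSFS_BLOCK:-/sys/class/block}"
    for part in "$@"; do
        uevent_file="$sysfs_block/$part/uevent"
        if [ -w "$uevent_file" ]; then
            # Writing "add" asks the kernel to re-emit the add event,
            # which lets the udev rules run again for this partition.
            echo add > "$uevent_file"
            echo "triggered $part"
        else
            echo "skipped $part (no writable $uevent_file)" >&2
        fi
    done
}
```

Usage, as root, would be along the lines of: retrigger_add sdh1 sdi1 sdj1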

Any ideas on how to debug this?

Best regards
Karsten
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
