Re: puzzling ceph with systemd boot sequence error

I was misreading the error message.

* ERROR: unable to open OSD superblock on /var/lib/ceph/osd/XXXX-0: (2) No such file or directory

does not mean that /var/lib/ceph/osd/XXXX-0 does not exist. It means that files expected inside /var/lib/ceph/osd/XXXX-0 do not exist. It would be clearer if the path of the missing file were displayed instead of the directory [1].
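
To illustrate the distinction, here is a minimal Python sketch. It is not what ceph-osd does: the function name diagnose_osd_dir and the list of expected file names are assumptions made for illustration only. It reports whether the directory itself is missing or whether the directory exists but files inside it are missing, which is what one would see when the OSD partition has not been mounted on it yet.

```python
import os

# Assumption: names of files an OSD data directory is expected to contain.
# This list is illustrative and is not taken from the message above.
EXPECTED_META_FILES = ("magic", "whoami", "ceph_fsid", "fsid")


def diagnose_osd_dir(osd_data, expected=EXPECTED_META_FILES):
    """Return (kind, paths) describing what is actually missing.

    kind is "directory_missing", "files_missing" or "ok";
    paths lists the full paths that could not be found.
    """
    if not os.path.isdir(osd_data):
        # The directory itself is absent.
        return ("directory_missing", [osd_data])

    # The directory exists; check each expected file inside it.
    missing = [
        os.path.join(osd_data, name)
        for name in expected
        if not os.path.exists(os.path.join(osd_data, name))
    ]
    if missing:
        # Typical of an empty mount point: the directory is there,
        # but the file system holding the OSD data is not mounted.
        return ("files_missing", missing)
    return ("ok", [])
```

Calling diagnose_osd_dir("/var/lib/ceph/osd/XXXX-0") on an existing but empty directory would return "files_missing" together with the full paths, which is the information the current message leaves out.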

Cheers

[1] https://github.com/ceph/ceph/blob/jewel/src/ceph_osd.cc#L411

On 29/11/2016 12:23, Loic Dachary wrote:
> Hi,
> 
> An error has been reported a few times [1] on RHEL 7.2 with jewel 10.2.3. It is not exactly jewel 10.2.3 but the modified Red Hat version of it. However, I don't think the differences are significant in that context, and it is likely that CentOS or Ubuntu 16.04 users run into the same problem, only not frequently enough to report it.
> 
> In a nutshell, the partition that is supposed to be mounted on the Ceph OSD data directory is not available when the OSD tries to start at boot time. So the OSD tries a few times, complaining that the directory is not there, and gives up. It could be a simple unit ordering problem if systemd did not have a way to order the units... but it does. And all units implicitly depend on local-fs.target, which ensures they run after local file systems are mounted, even if it means waiting for LVM volumes to be available.
> 
> What really puzzles me is that despite /var being mounted, the OSD can't find the desired directory. The output of journalctl --all --this-boot --no-pager -o verbose shows (redacted for brevity) that /var is mounted before the OSD tries to start:
> 
> Fri 2016-11-25 08:23:03.653729 CET
> 
>     UNIT=var.mount
>     MESSAGE=Mounting /var...
> 
> Fri 2016-11-25 08:23:03.943381 CET
> 
>     _CMDLINE=/usr/lib/systemd/systemd --switched-root --system --deserialize 21
>     UNIT=var.mount
>     MESSAGE=Mounted /var.
> 
> Fri 2016-11-25 08:23:13.452598 CET
> 
>     UNIT=ceph-osd@0.service
>     MESSAGE=Starting Ceph object storage daemon...
> 
> Fri 2016-11-25 08:23:18.633909 CET
> 
>     _SYSTEMD_UNIT=ceph-osd@0.service
>     MESSAGE=2016-11-25 08:23:18.633504 7f251db78800 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/XXXX-0: (2) No such file or directory
>     _CMDLINE=/usr/bin/ceph-osd -f --cluster XXXX --id 0 --setuser ceph --setgroup ceph
> 
> If systemctl start ceph-osd@0.service is run manually later on, it works as expected. The failure is not permanent; it looks like a race.
> 
> I'll keep investigating and update the issue [1] as I make progress. If someone has a clue about what's going on, it would be good news. Even reports of similar problems would help, especially if they happen on other distributions. I'm hoping this is something simple that I don't see because I have stared at the problem for too long :-)
> 
> Cheers
> 
> [1] http://tracker.ceph.com/issues/17889 OSD udev / systemd may race with lvm at boot time
> 

-- 
Loïc Dachary, Artisan Logiciel Libre