Hello,

>>> If these properties also mean device information (vendor, size,
>>> solid/rotational, etc...) it could help to better map/detect an OSD
>>> replacement, since clusters tend to have a certain level of homogeneous
>>> hardware: if $brand, and $size, and $rotational, etc...
>>>
>>>> - A daemon (e.g., ceph-osd-autoreplace) that runs on each machine or a
>>>> tool that is triggered by udev.  It would check for new, empty devices
>>>> appearing in the locations (as defined by the by-path string) previously
>>>> occupied by OSDs that are down.  If that happens, it can use 'ceph osd
>>>> safe-to-destroy' to verify whether it is safe to automatically rebuild
>>>> that OSD.  (If not, it might want to raise a health alert, since it's
>>>> possible the drive that was physically pulled should be preserved until
>>>> the cluster is sure it doesn't need it.)
>>>
>>> systemd has some support for devices, so we might not even need a daemon,
>>> but rather a unit that can depend on events already handled by systemd
>>> (which would save us from udev).
>>
>> FreeBSD does not have systemd. 8-)
>>
>> I'm inclined to say luckily, but then that may be my personal bias.
>> I don't like "automagic" tools like udev or systemd tinkering with my
>> disks.
>>
>> As Alan says, in ZFS one can designate a hot standby. But even there I
>> prefer to be alerted and then manually intervene.
>
> Actually, I was talking about autoreplace by physical path.  Hot spares are
> something else.  The physical path of a drive is distinct from its device
> path.  The physical path is determined by information from a SES[1]
> expander, which can actually tell you which physical slots contain which
> logical drives.
>
>> A hot-swap daemon that gets instructed to only use explicitly and fully
>> enumerated disks might be something to trust. So something matching the
>> disk serial number would be okay.
>
> Matching the disk serial number isn't always safe in a VM.  VMs can
> generate duplicate serial numbers.  Better to match against a GPT label or
> something that identifies a drive as belonging to Ceph.  That,
> unfortunately, requires some intervention from the administrator.  The
> nice thing about a user-space daemon is that its behavior can easily be
> controlled by the sysadmin.  So, for example, a sysadmin could opt into a
> rule that says "Ceph can take over all SCSI disks" or "Ceph can take over
> all disks without an existing partition table or known filesystem".

I like the idea of multiple levels of "take over". To begin with, we can
imagine something like this:

- A daemon checks whether a new disk device has been added to an OSD server.
  A quick analysis of the disk tells whether there is a partition table and
  which kind it is, whether there are partitions, whether the disk contains
  zeros or garbage (like encrypted data), etc., and then sends a summary of
  the new device to a mon, a mgr, or a new daemon.

- If an OSD is marked down/out, offer the possibility to replace the OSD
  marked down with a single command (same place in the crush map, same
  weight):

      ceph osd <id> replaceby <new disk ref>

- If no OSD is marked down, offer the ability to add a new OSD at the right
  place in the crush map if possible (computing the placement of the new
  disk may be difficult):

      ceph osd add <new disk ref> [crush map placement]

- Or just ignore the disk:

      ceph osd ignore <new disk ref>

Then, under some conditions, give the cluster the ability to choose what to
do without human interaction.

Does this make sense?
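To make the first step a bit more concrete, here is a rough, hypothetical
sketch (not existing Ceph code) of what the disk-inspection part of such a
daemon could look like on Linux. It assumes lsblk is available, reads the
first MiB of an otherwise unidentified device to tell a blank disk from
garbage, and simply prints the summary as JSON instead of actually reporting
it to a mon/mgr; the "replaceby"/"add"/"ignore" commands above are likewise
only proposals.

#!/usr/bin/env python3
# Hypothetical sketch only -- not an existing Ceph tool.  Assumes Linux with
# lsblk available; "reporting to a mon/mgr/new daemon" is reduced here to
# printing the summary as JSON.

import json
import subprocess
import sys


def inspect_device(dev):
    """Summarize a newly appeared block device, e.g. /dev/sdx."""
    # lsblk -J emits JSON with lowercase keys; -d limits output to the device
    # itself, -b reports the size in bytes.  All columns used here are
    # standard lsblk columns.
    out = subprocess.run(
        ["lsblk", "-J", "-d", "-b",
         "-o", "NAME,SIZE,ROTA,SERIAL,WWN,FSTYPE,PTTYPE", dev],
        capture_output=True, check=True, text=True)
    info = json.loads(out.stdout)["blockdevices"][0]

    summary = {
        "device": dev,
        "size_bytes": int(info["size"]),
        # older lsblk prints "0"/"1", newer prints JSON booleans
        "rotational": info["rota"] in (1, "1", True),
        "serial": info.get("serial"),
        "wwn": info.get("wwn"),
        "partition_table": info.get("pttype"),  # e.g. "gpt", "dos", or None
        "filesystem": info.get("fstype"),       # e.g. "xfs", or None
    }

    # No partition table and no filesystem: peek at the first MiB to tell an
    # all-zero (blank) disk from "garbage" (possibly encrypted/foreign data).
    if not summary["partition_table"] and not summary["filesystem"]:
        with open(dev, "rb") as f:
            head = f.read(1024 * 1024)
        summary["content"] = "zeros" if not any(head) else "garbage"

    return summary


if __name__ == "__main__":
    # Usage (as root): ./osd-disk-inspect.py /dev/sdx
    # In the proposal above, this summary would be pushed to a mon/mgr so
    # the operator (or a policy) can decide between replace / add / ignore.
    print(json.dumps(inspect_device(sys.argv[1]), indent=2))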
--
Yoann Moulin
EPFL IC-IT