On Wed, 29 Jul 2015, Alex Elsayed wrote:
> Sage Weil wrote:
>
> > On Wed, 29 Jul 2015, Alex Elsayed wrote:
> <snip for gmane>
> >> My thinking is more that the "osd data = " key makes a lot less sense
> >> in the systemd world overall - passing the OSD the full path on the
> >> commandline via some --datadir would mean you could trivially use
> >> systemd's instance templating, and just do
> >>
> >> ExecStart=/usr/bin/ceph-osd -f --datadir=/var/lib/ceph/osd/%i
> >>
> >> and be done with it. Could even do RequiresMountsFor=/var/lib/ceph/osd/%i
> >> too, which would order it after (and make it depend on) any systemd.mount
> >> units for that path.
> >
> > Note that there is a 1:1 equivalence between command line options and
> > config options, so osd data = /foo and --osd-data foo are the same thing.
> > Not that I think that matters here--although it's possible to manually
> > specify paths in ceph.conf, users can't do that if they want the udev
> > magic to work (that's already true today, without systemd).
>
> Sure, though my thought was that the udev magic would work more sanely
> _via_ this. The missing part is loading the cluster and ID from the OSD
> data dir.
>
> > In any case, though, if your %i above is supposed to be the uuid, that's
> > much less friendly than what we have now, where users can do
> >
> > systemctl stop ceph-osd@12
> >
> > to stop osd.12.
> >
> > I'm not sure it's worth giving up the bind mount complexity unless it
> > really becomes painful to support, given how much nicer the admin
> > experience is...
>
> Well, that does presuppose that they've either SSHed into the machine
> manually, or are using systemctl -H to do so via systemctl. That's already
> not an especially nice user experience, since they need to manually
> consider the cluster's structure.
>
> Something more like 'ceph tell osd.N die' or similar could work, and
> SuccessExitStatus= could be used to make it even nicer (so that even if it
> gives a different exit status for "die" as opposed to other successes,
> systemd can say "any of these exit codes are okay, don't autorestart").
>
> However, neither of those handles unmounting, and it still doesn't handle
> starting. All of the above are still partial solutions; hopefully
> iteration can result in something better in all ways.
>
> Also, note that if RequiresMountsFor= is used, unmounting the filesystem -
> by device or by mountpoint - will stop the unit due to proper dependency
> handling. (If RMF doesn't, BindsTo does - BindsTo will additionally do so
> if the device is unmounted or suddenly unplugged without systemd
> intervention.)
>
> systemctl stop dev-sdc.device   # all OSDs running off of sdc stop
> systemctl stop dev-sdd1.device  # Just one partition this time
>
> Nice and tidy.

So, it seems like plan B would be something like:

- mounts on /var/lib/ceph/osd/data/$uuid.  For new backends that have
  multiple mounts (newstore likely will), we may also have something like
  /var/lib/ceph/osd/data-fast/$uuid as an SSD partition or something.

- systemd ceph-osd@$uuid task runs

    ceph-osd --cluster ceph --id 123 --osd-uuid $uuid

- simpler udev rules

- simpler ceph-disk behavior

- The 'one cluster per host' restriction would go away.  This is currently
  there because we only have a single systemd parameter for the @ services
  and we're using the osd id (which is not unique across clusters).  The
  uuid would be, so that's a win.

But,

- admin can't tell from 'systemctl | grep ceph' or from 'df' or 'mount'
  which OSD is which, but they could from 'ps ax | grep ceph-osd'.
- stopping an individual osd would be done by $uuid instead of osd id:

    systemctl stop ceph-osd@66f354f2-752e-409f-8194-be05f6b071d9

  For an admin this is probably a cut&paste from ps ax output?

- we could perhaps add 'ceph-disk stop' and 'ceph-disk umount' commands to
  make this a bit simpler?

(A rough sketch of what the @ template and a uuid -> osd id lookup might
look like is appended at the end of this mail.)

What do people think?  I like simple, but I don't want to make life too
hard on the admin.

sage
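
P.S. To make that concrete, here is a rough, untested sketch of what the
@ template could look like under plan B. The directives themselves
(RequiresMountsFor=, ExecStart=, Restart=) are standard systemd; the
--osd-uuid invocation and the /var/lib/ceph/osd/data/$uuid layout are just
the assumptions from this mail, and the daemon would still have to learn
its id (and cluster) from the data dir - the "missing part" Alex mentioned:

    [Unit]
    Description=Ceph object storage daemon (%i)
    # Order the unit after, and make it require, the mount of this OSD's
    # data dir; stopping that mount (or its .device unit) stops the
    # daemon, per the discussion above.
    RequiresMountsFor=/var/lib/ceph/osd/data/%i

    [Service]
    # %i is the OSD uuid; the osd id is not in the instance name, so the
    # daemon would need to read it from the data dir (e.g. 'whoami').
    ExecStart=/usr/bin/ceph-osd -f --cluster ceph --osd-uuid %i --osd-data /var/lib/ceph/osd/data/%i
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Start/stop is then just

    systemctl start ceph-osd@66f354f2-752e-409f-8194-be05f6b071d9

And for the "which OSD is which" problem, assuming the data dir keeps the
'whoami' file it has today, the lookup is a one-liner (roughly what a
'ceph-disk stop'/'ceph-disk umount' wrapper would do internally anyway):

    # print "<uuid>  osd.<id>" for every mounted OSD data dir
    for d in /var/lib/ceph/osd/data/*; do
        [ -e "$d/whoami" ] || continue
        echo "$(basename "$d")  osd.$(cat "$d/whoami")"
    done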