On 20.01.20 21:43, Yaarit Hatuka wrote:
> On Mon, Jan 20, 2020 at 9:47 AM Sage Weil <sweil@xxxxxxxxxx> wrote:
>> [Adding Sebastian, dev@xxxxxxx]
>>
>> Some things to improve with the OSD create path!
>>
>> On Mon, 20 Jan 2020, Yaarit Hatuka wrote:
>>> Here are a few insights from this debugging process - I hope I got
>>> it right:
>>>
>>> 1. Adding the device with "/dev/disk/by-id/...." did not work for me;
>>> it failed in pybind/mgr/cephadm/module.py at
>>> https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/module.py#L1241
>>> "if len(list(set(devices) & set(osd['devices']))) == 0"
>>> because osd['devices'] lists the devices as "/dev/sdX", but
>>> set(devices) has them by their dev-id (which is the syntax used in
>>> the example in the docs, which I followed).
>>> It took me a couple of days to debug this :-)

I'm really looking forward to Joshua's PR getting merged:
https://github.com/ceph/ceph/pull/32545/files#diff-b2eb7e15b64fdea42d7c8afdab01bcbcL1011-L1014

>>> 2. I think that cephadm should be more verbose by default. When
>>> creating an OSD it only writes "Created osd(s) on host
>>> 'mira027.front.sepia.ceph.com'" (even in case creation failed...).
>>> It would help if it printed the different stages, so that the user
>>> can see where it stopped in case of an error.

It might also be an idea to print out the ceph-volume command line that
was executed.

>>> 3. ceph status shows that the OSD was added even if the orchestrator
>>> failed to add it (but it's marked down and out).
>>
>> IIUC this is ceph-volume's failure path not cleaning up?  Is this the
>> failure you saw when you passed the /dev/disk/by-id device path?
>
> It seems like ceph-volume completed successfully all this time, but
> since I always passed /dev/disk/by-id and not /dev/sdX to 'ceph
> orchestrator osd create', this intersection was always empty:
> set(devices) & set(osd['devices']) [1]
> The other part of the condition was also true, so the 'continue'
> happened all the time.
> Therefore the orchestrator does not even try to call
> self._create_daemon('osd', ...) [2]
>
> Not sure why the OSD count is incremented, though.
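FWIW, the two spellings name the same block device: the entries under
/dev/disk/by-id/ are just symlinks to the kernel names that ceph-volume
reports. Resolving the symlinks on both sides before intersecting (e.g.
via os.path.realpath() in module.py) should make that check independent
of which spelling the user passed. A quick sketch; the wwn below is
made up:

```
# readlink -f canonicalizes the by-id symlink to its /dev/sdX target,
# i.e. the spelling that shows up in osd['devices']:
readlink -f /dev/disk/by-id/wwn-0x5000c500a1b2c3d4
# -> /dev/sdc (for example)
```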
>>> 4. I couldn't find the logs that cephadm produces.
>>> I searched for them on both the source (mira010) and the target
>>> (mira027) machines in /var/log/ceph/<fsid>/* and couldn't find any
>>> print from either the cephadm mgr module or the cephadm script. I
>>> also looked at /var/log/*. Where are they hiding?
>>
>> The ceph-volume.log is the one to look at.
>
> I looked at ceph-volume.log, but couldn't find any orchestrator /
> cephadm module log messages there... like this one:
> self.log.info("create_mgr({}, mgr.{}): starting".format(host, name)) [3]

I found

```
fsid="$(ceph status --format json | jq -r .fsid)"
for name in $(cephadm ls | jq -r '.[].name'); do
    journalctl -u "ceph-$fsid@$name.service" > "$name"
done
```

to be sometimes helpful for gathering the journald logs.

>>> 5. After ceph-volume creates its LVs, the host's
>>> lvdisplay/vgdisplay/pvdisplay showed nothing. I had to run
>>> "pvscan --cache" on the host in order for those commands to output
>>> the current state. This may confuse the user.
>>>
>>> 6. I think it's also a good idea to have another cephadm feature,
>>> "cephadm shell --host=<host>", to open a cephadm shell on a remote
>>> host. I wanted to run "ceph-volume lvm zap" on one of the remote
>>> hosts, and to do that I sshed over, copied the cephadm script and
>>> ran "cephadm shell". It would be cool if we could do that from the
>>> original machine.
>>
>> The cephadm script doesn't know how to ssh.  We could probably teach
>> it, though, for something like this... but it might be simpler for
>> the user to just 'scp cephadm $host:', as that's basically what
>> cephadm would do to "install" itself remotely?
>>
>> sage
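Until we have that, the manual round trip is at least short - a sketch
of what Yaarit described, assuming a root login on $host:

```
# copy the standalone cephadm script to the remote host and open an
# interactive cephadm shell there; then run "ceph-volume lvm zap
# <device>" inside that shell ($host is a placeholder):
scp cephadm "$host":
ssh -t "$host" "sudo ./cephadm shell"
```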
> [1] https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/module.py#L1241
> [2] https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/module.py#L1250
> [3] https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/module.py#L1445

--
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg,
Germany (HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer