Okay, I adjusted rgw to register under gid like the others, and changed the cephadm logic around to cope. I also cleaned simplified the 'ceph -s' output: services: mon: 1 daemons, quorum a (age 15s) mgr: x(active, since 92s) osd: 1 osds: 1 up (since 71s), 1 in (since 88s) cephfs-mirror: 1 daemon active (1 hosts) rbd-mirror: 2 daemons active (1 hosts) rgw: 2 daemons active (1 hosts, 1 zones) - don't list individual daemon ids (won't scale for large clusters) - present any groupings we can identify (currently just distinct hosts and rgw zones; if there are reasonable groupings for cephfs,rbd-mirror or iscsi let's add those too) s On Thu, Mar 18, 2021 at 8:26 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote: > > On Thu, Mar 18, 2021 at 9:00 PM Sage Weil <sage@xxxxxxxxxxxx> wrote: > > > > Hi everyone, > > > > The non-core daemon registrations in servicemap vs cephadm came up > > twice in the last couple of weeks: > > > > First, https://github.com/ceph/ceph/pull/40035 changed rgw to register > > as rgw.$id.$gid and made cephadm complain about stray unmanaged > > daemons. The motivation was that the PR allows multiple radosgw > > daemons to share the same auth name + key and still show up in the > > servicemap. > > > > Then, today, I noticed that cephfs-mirror caused the same cephadm > > error because was registering as cephfs-mirror.$gid instead of the > > cephfs-mirror.$id that cephadm expected. I went to fix that in > > cephfs-mirror, but noticed that the behavior was copied from > > rbd-mirror.. which wasn't causing any cephadm error. It turns out > > that cephadm has some special code from rbd-mirror to identify daemons > > in the servicemap: > > > > https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/serve.py#L412-L420 > > > > So to fix cephfs-mirror, I opted to keep the existing behavior and > > adjust cephadm: > > > > https://github.com/ceph/ceph/pull/40220/commits/30d87f3746ff9daf219366354f24c0d8e306844a > > > > For now, at least, that solves the problem. But, as things stand rgw > > and {cephfs,rbd}-mirror are behaving a bit differently with > > servicemap. The registrations look like so: > > > > { > > "epoch": 538, > > "modified": "2021-03-18T17:28:12.500356-0400", > > "services": { > > "cephfs-mirror": { > > "daemons": { > > "summary": "", > > "4220": { > > "start_epoch": 501, > > "start_stamp": "2021-03-18T12:49:32.929888-0400", > > "gid": 4220, > > "addr": "10.3.64.25:0/3521332238", > > "metadata": { > > ... > > "id": "dael.csfspq", > > "instance_id": "4220", > > ... > > }, > > "task_status": {} > > } > > } > > }, > > "rbd-mirror": { > > "daemons": { > > "summary": "", > > "4272": { > > "start_epoch": 531, > > "start_stamp": "2021-03-18T16:31:26.540108-0400", > > "gid": 4272, > > "addr": "10.3.64.25:0/2576541551", > > "metadata": { > > ... > > "id": "dael.kfenmm", > > "instance_id": "4272", > > ... > > }, > > "task_status": {} > > }, > > "4299": { > > "start_epoch": 534, > > "start_stamp": "2021-03-18T16:52:59.027580-0400", > > "gid": 4299, > > "addr": "10.3.64.25:0/600966616", > > "metadata": { > > ... > > "id": "dael.yfhmmq", > > "instance_id": "4299", > > ... > > }, > > "task_status": {} > > } > > } > > }, > > "rgw": { > > "daemons": { > > "summary": "", > > "foo.dael.hwyogi": { > > "start_epoch": 537, > > "start_stamp": "2021-03-18T17:27:58.998535-0400", > > "gid": 4319, > > "addr": "10.3.64.25:0/3084463187", > > "metadata": { > > ... > > "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a", > > "zone_name": "default", > > "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f", > > "zonegroup_name": "default" > > }, > > "task_status": {} > > }, > > "foo.dael.pyvurh": { > > "start_epoch": 537, > > "start_stamp": "2021-03-18T17:27:58.999620-0400", > > "gid": 4318, > > "addr": "10.3.64.25:0/2303221705", > > "metadata": { > > ... > > "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a", > > "zone_name": "default", > > "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f", > > "zonegroup_name": "default" > > }, > > "task_status": {} > > }, > > "foo.dael.rqipjp": { > > "start_epoch": 538, > > "start_stamp": "2021-03-18T17:28:10.866327-0400", > > "gid": 4330, > > "addr": "10.3.64.25:0/4039152887", > > "metadata": { > > ... > > "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a", > > "zone_name": "default", > > "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f", > > "zonegroup_name": "default" > > }, > > "task_status": {} > > } > > } > > } > > } > > } > > > > With the *-mirror approach, the servicemap "key" is always the gid, > > and you have to look at the "id" to see how the daemon is > > named/authenticated. With rgw, the name is the key and there is no > > "id" key. > > > > I'm inclined to just go with the gid-as-key for rgw too and add the > > "id" key so that we are behaving consistently. This would have the > > side-effect of also solving the original goal of allowing many rgw > > daemons to share the same auth identity and still show up in the > > servicemap. > > Just wanted to throw another variation in this model while we are > talking about it. tcmu-runner for the Ceph iSCSI gateway registers as > "<node-name>:<pool-name>/<image-name>" [1]. It's implementation > predates all of these other ones. > > > The downside is that interpreting the service for the running daemons > > is a bit more work. For example, currently ceph -s shows > > > > services: > > mon: 1 daemons, quorum a (age 2d) > > mgr: x(active, since 58m) > > osd: 1 osds: 1 up (since 2d), 1 in (since 2d) > > cephfs-mirror: 1 daemon active (4220) > > rbd-mirror: 2 daemons active (4272, 4299) > > rgw: 2 daemons active (foo.dael.rqipjp, foo.dael.sajkvh) > > > > Showing the gids there is clearly now what we want. But similarly > > showing the daemon names is probably also a bad idea since it won't > > scale beyond ~3 or so; we probably just want a simple count. > > tcmu-runner really hit this scaling issue and Xiubo just added the > ability to programatically fold these together via optional > "daemon_type" and "daemon_prefix" metadata values [2][3] so that "ceph > -s" will show something like: > > ... snip ... > tcmu-runner: 3 portals active (gateway0, gateway1, gateway2) > ... snip ... > > > Reasonable? > > sage > > _______________________________________________ > > Dev mailing list -- dev@xxxxxxx > > To unsubscribe send an email to dev-leave@xxxxxxx > > > > [1] https://github.com/open-iscsi/tcmu-runner/blob/master/rbd.c#L190 > [2] https://github.com/open-iscsi/tcmu-runner/blob/master/rbd.c#L202 > [3] https://github.com/ceph/ceph/blob/master/src/mgr/ServiceMap.cc#L83 > > -- > Jason > _______________________________________________ Dev mailing list -- dev@xxxxxxx To unsubscribe send an email to dev-leave@xxxxxxx