On Fri, Mar 19, 2021 at 9:50 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> Okay, I adjusted rgw to register under gid like the others, and
> changed the cephadm logic around to cope.

thanks! i was hoping for a solution on the ceph-mgr side in the
discussions on https://github.com/ceph/ceph/pull/40035

> I also cleaned up and simplified the 'ceph -s' output:
>
>   services:
>     mon: 1 daemons, quorum a (age 15s)
>     mgr: x(active, since 92s)
>     osd: 1 osds: 1 up (since 71s), 1 in (since 88s)
>     cephfs-mirror: 1 daemon active (1 hosts)
>     rbd-mirror: 2 daemons active (1 hosts)
>     rgw: 2 daemons active (1 hosts, 1 zones)
>
> - don't list individual daemon ids (won't scale for large clusters)
> - present any groupings we can identify (currently just distinct
>   hosts and rgw zones; if there are reasonable groupings for
>   cephfs-mirror, rbd-mirror, or iscsi let's add those too)
>
> s
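For concreteness, the grouping Sage describes could be derived from the
servicemap roughly as in the sketch below. This is a hypothetical
illustration in Python, not the actual mgr code: it assumes each
daemon's metadata includes a "hostname" field (elided with "..." in the
dump quoted later in this thread) alongside the "zone_name" field that
the rgw entries do show.

# Hypothetical sketch, not the actual mgr code: derive the one-line
# 'ceph -s' summary for a service from its servicemap entry, grouping
# by distinct hosts and, for rgw, zones.  Note the "daemons" dict also
# carries a "summary" string that has to be skipped.
def summarize_service(name, service):
    daemons = [d for k, d in service["daemons"].items() if k != "summary"]
    hosts = {d["metadata"].get("hostname") for d in daemons}
    zones = {d["metadata"]["zone_name"]
             for d in daemons if "zone_name" in d["metadata"]}
    groups = ["%d hosts" % len(hosts)]
    if zones:
        groups.append("%d zones" % len(zones))
    return "%s: %d daemons active (%s)" % (
        name, len(daemons), ", ".join(groups))

Fed the rgw entry from the dump quoted below (assuming each elided
metadata block does carry a hostname), this would yield something like
"rgw: 3 daemons active (1 hosts, 1 zones)".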
> On Thu, Mar 18, 2021 at 8:26 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> >
> > On Thu, Mar 18, 2021 at 9:00 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > >
> > > Hi everyone,
> > >
> > > The non-core daemon registrations in the servicemap vs cephadm
> > > came up twice in the last couple of weeks:
> > >
> > > First, https://github.com/ceph/ceph/pull/40035 changed rgw to
> > > register as rgw.$id.$gid and made cephadm complain about stray
> > > unmanaged daemons.  The motivation was that the PR allows
> > > multiple radosgw daemons to share the same auth name + key and
> > > still show up in the servicemap.
> > >
> > > Then, today, I noticed that cephfs-mirror caused the same cephadm
> > > error because it was registering as cephfs-mirror.$gid instead of
> > > the cephfs-mirror.$id that cephadm expected.  I went to fix that
> > > in cephfs-mirror, but noticed that the behavior was copied from
> > > rbd-mirror, which wasn't causing any cephadm error.  It turns out
> > > that cephadm has some special code for rbd-mirror to identify
> > > daemons in the servicemap:
> > >
> > > https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/serve.py#L412-L420
> > >
> > > So to fix cephfs-mirror, I opted to keep the existing behavior
> > > and adjust cephadm:
> > >
> > > https://github.com/ceph/ceph/pull/40220/commits/30d87f3746ff9daf219366354f24c0d8e306844a
> > >
> > > For now, at least, that solves the problem.  But, as things
> > > stand, rgw and {cephfs,rbd}-mirror behave a bit differently with
> > > the servicemap.  The registrations look like so:
> > >
> > > {
> > >   "epoch": 538,
> > >   "modified": "2021-03-18T17:28:12.500356-0400",
> > >   "services": {
> > >     "cephfs-mirror": {
> > >       "daemons": {
> > >         "summary": "",
> > >         "4220": {
> > >           "start_epoch": 501,
> > >           "start_stamp": "2021-03-18T12:49:32.929888-0400",
> > >           "gid": 4220,
> > >           "addr": "10.3.64.25:0/3521332238",
> > >           "metadata": {
> > >             ...
> > >             "id": "dael.csfspq",
> > >             "instance_id": "4220",
> > >             ...
> > >           },
> > >           "task_status": {}
> > >         }
> > >       }
> > >     },
> > >     "rbd-mirror": {
> > >       "daemons": {
> > >         "summary": "",
> > >         "4272": {
> > >           "start_epoch": 531,
> > >           "start_stamp": "2021-03-18T16:31:26.540108-0400",
> > >           "gid": 4272,
> > >           "addr": "10.3.64.25:0/2576541551",
> > >           "metadata": {
> > >             ...
> > >             "id": "dael.kfenmm",
> > >             "instance_id": "4272",
> > >             ...
> > >           },
> > >           "task_status": {}
> > >         },
> > >         "4299": {
> > >           "start_epoch": 534,
> > >           "start_stamp": "2021-03-18T16:52:59.027580-0400",
> > >           "gid": 4299,
> > >           "addr": "10.3.64.25:0/600966616",
> > >           "metadata": {
> > >             ...
> > >             "id": "dael.yfhmmq",
> > >             "instance_id": "4299",
> > >             ...
> > >           },
> > >           "task_status": {}
> > >         }
> > >       }
> > >     },
> > >     "rgw": {
> > >       "daemons": {
> > >         "summary": "",
> > >         "foo.dael.hwyogi": {
> > >           "start_epoch": 537,
> > >           "start_stamp": "2021-03-18T17:27:58.998535-0400",
> > >           "gid": 4319,
> > >           "addr": "10.3.64.25:0/3084463187",
> > >           "metadata": {
> > >             ...
> > >             "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
> > >             "zone_name": "default",
> > >             "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
> > >             "zonegroup_name": "default"
> > >           },
> > >           "task_status": {}
> > >         },
> > >         "foo.dael.pyvurh": {
> > >           "start_epoch": 537,
> > >           "start_stamp": "2021-03-18T17:27:58.999620-0400",
> > >           "gid": 4318,
> > >           "addr": "10.3.64.25:0/2303221705",
> > >           "metadata": {
> > >             ...
> > >             "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
> > >             "zone_name": "default",
> > >             "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
> > >             "zonegroup_name": "default"
> > >           },
> > >           "task_status": {}
> > >         },
> > >         "foo.dael.rqipjp": {
> > >           "start_epoch": 538,
> > >           "start_stamp": "2021-03-18T17:28:10.866327-0400",
> > >           "gid": 4330,
> > >           "addr": "10.3.64.25:0/4039152887",
> > >           "metadata": {
> > >             ...
> > >             "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
> > >             "zone_name": "default",
> > >             "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
> > >             "zonegroup_name": "default"
> > >           },
> > >           "task_status": {}
> > >         }
> > >       }
> > >     }
> > >   }
> > > }
> > >
> > > With the *-mirror approach, the servicemap "key" is always the
> > > gid, and you have to look at the "id" to see how the daemon is
> > > named/authenticated.  With rgw, the name is the key and there is
> > > no "id" key.
> > >
> > > I'm inclined to just go with the gid-as-key for rgw too and add
> > > the "id" key so that we are behaving consistently.  This would
> > > have the side effect of also solving the original goal of
> > > allowing many rgw daemons to share the same auth identity and
> > > still show up in the servicemap.
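To make concrete what the consistent gid-as-key scheme would buy a
consumer, here is a hedged Python sketch (a hypothetical helper, not
actual cephadm code): with every service keyed by gid and the daemon
name carried in metadata["id"], one code path covers rgw, rbd-mirror,
and cephfs-mirror alike, instead of the rbd-mirror special case in
serve.py linked above.

# Hypothetical helper under the proposed convention: map each gid to
# the daemon's name/auth identity, uniformly across service types.
def daemon_names(service):
    names = {}
    for key, d in service["daemons"].items():
        if key == "summary":  # summary string sits beside the gid keys
            continue
        names[int(key)] = d["metadata"]["id"]
    return names

Several rgw daemons sharing one auth name would then simply appear as
distinct gids mapping to the same "id", which is exactly the sharing
the original PR was after.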
> > Just wanted to throw another variation into this model while we
> > are talking about it.  tcmu-runner for the Ceph iSCSI gateway
> > registers as "<node-name>:<pool-name>/<image-name>" [1].  Its
> > implementation predates all of these other ones.
> >
> > > The downside is that interpreting the service for the running
> > > daemons is a bit more work.  For example, currently ceph -s shows
> > >
> > >   services:
> > >     mon: 1 daemons, quorum a (age 2d)
> > >     mgr: x(active, since 58m)
> > >     osd: 1 osds: 1 up (since 2d), 1 in (since 2d)
> > >     cephfs-mirror: 1 daemon active (4220)
> > >     rbd-mirror: 2 daemons active (4272, 4299)
> > >     rgw: 2 daemons active (foo.dael.rqipjp, foo.dael.sajkvh)
> > >
> > > Showing the gids there is clearly not what we want.  But
> > > similarly, showing the daemon names is probably also a bad idea
> > > since it won't scale beyond ~3 or so; we probably just want a
> > > simple count.
> >
> > tcmu-runner really hit this scaling issue, and Xiubo just added
> > the ability to programmatically fold these together via optional
> > "daemon_type" and "daemon_prefix" metadata values [2][3] so that
> > "ceph -s" will show something like:
> >
> >   ... snip ...
> >   tcmu-runner: 3 portals active (gateway0, gateway1, gateway2)
> >   ... snip ...
> >
> > > Reasonable?
> > > sage
> >
> > [1] https://github.com/open-iscsi/tcmu-runner/blob/master/rbd.c#L190
> > [2] https://github.com/open-iscsi/tcmu-runner/blob/master/rbd.c#L202
> > [3] https://github.com/ceph/ceph/blob/master/src/mgr/ServiceMap.cc#L83
> >
> > --
> > Jason
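One reading of the folding Jason describes, as a hedged Python sketch;
the real logic lives in C++ at [3], and the exact semantics of
"daemon_type" and "daemon_prefix" may differ from what is assumed here.

# Hedged sketch of the folding described above (see [2][3] for the
# real implementation): registrations that advertise "daemon_type" and
# "daemon_prefix" in their metadata collapse into one summary entry
# per prefix, so many "<node>:<pool>/<image>" daemons render as a
# short list of portals.
def fold_summary(service):
    dtype = "daemon"
    prefixes = set()
    for key, d in service["daemons"].items():
        if key == "summary":
            continue
        md = d["metadata"]
        dtype = md.get("daemon_type", dtype)
        prefixes.add(md.get("daemon_prefix", key))
    return "%d %ss active (%s)" % (
        len(prefixes), dtype, ", ".join(sorted(prefixes)))

With daemon_type "portal" and prefixes gateway0 through gateway2, this
reproduces the "3 portals active (gateway0, gateway1, gateway2)" line
quoted above.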