Re: servicemap vs cephadm

Okay, I adjusted rgw to register under gid like the others, and
changed the cephadm logic around to cope.

I also cleaned up and simplified the 'ceph -s' output:

  services:
    mon:           1 daemons, quorum a (age 15s)
    mgr:           x(active, since 92s)
    osd:           1 osds: 1 up (since 71s), 1 in (since 88s)
    cephfs-mirror: 1 daemon active (1 hosts)
    rbd-mirror:    2 daemons active (1 hosts)
    rgw:           2 daemons active (1 hosts, 1 zones)

- don't list individual daemon ids (won't scale for large clusters)
- present any groupings we can identify (currently just distinct hosts
and rgw zones; if there are reasonable groupings for cephfs-mirror,
rbd-mirror, or iscsi, let's add those too)
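The grouping above can be derived from the servicemap dump quoted
below; this is just an illustrative sketch, not the actual mgr code,
and it assumes daemon metadata carries a "hostname" key (most metadata
keys are elided by the "..." in the dump):

```python
# Sketch: build per-service summary lines ("N daemons active (M hosts,
# K zones)") from a servicemap dump.  Not the actual mgr code; the
# "hostname" metadata key is an assumption.

def summarize(servicemap):
    lines = {}
    for svc, info in servicemap.get("services", {}).items():
        # "summary" sits alongside the per-daemon entries in the dump
        daemons = {k: v for k, v in info.get("daemons", {}).items()
                   if k != "summary"}
        hosts = {d.get("metadata", {}).get("hostname")
                 for d in daemons.values()} - {None}
        zones = {d.get("metadata", {}).get("zone_name")
                 for d in daemons.values()} - {None}
        groups = []
        if hosts:
            groups.append(f"{len(hosts)} hosts")
        if zones:
            groups.append(f"{len(zones)} zones")
        suffix = f" ({', '.join(groups)})" if groups else ""
        plural = "s" if len(daemons) != 1 else ""
        lines[svc] = f"{len(daemons)} daemon{plural} active{suffix}"
    return lines
```

e.g. two rgw daemons on one host in one zone render as
"2 daemons active (1 hosts, 1 zones)", matching the output above.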

s

On Thu, Mar 18, 2021 at 8:26 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>
> On Thu, Mar 18, 2021 at 9:00 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >
> > Hi everyone,
> >
> > The non-core daemon registrations in servicemap vs cephadm came up
> > twice in the last couple of weeks:
> >
> > First, https://github.com/ceph/ceph/pull/40035 changed rgw to register
> > as rgw.$id.$gid and made cephadm complain about stray unmanaged
> > daemons.  The motivation was that the PR allows multiple radosgw
> > daemons to share the same auth name + key and still show up in the
> > servicemap.
> >
> > Then, today, I noticed that cephfs-mirror caused the same cephadm
> > error because it was registering as cephfs-mirror.$gid instead of the
> > cephfs-mirror.$id that cephadm expected.  I went to fix that in
> > cephfs-mirror, but noticed that the behavior was copied from
> > rbd-mirror.. which wasn't causing any cephadm error.  It turns out
> > that cephadm has some special code for rbd-mirror to identify daemons
> > in the servicemap:
> >
> >   https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/serve.py#L412-L420
> >
> > So to fix cephfs-mirror, I opted to keep the existing behavior and
> > adjust cephadm:
> >
> >   https://github.com/ceph/ceph/pull/40220/commits/30d87f3746ff9daf219366354f24c0d8e306844a
> >
> > For now, at least, that solves the problem.  But, as things stand rgw
> > and {cephfs,rbd}-mirror are behaving a bit differently with
> > servicemap.  The registrations look like so:
> >
> > {
> >     "epoch": 538,
> >     "modified": "2021-03-18T17:28:12.500356-0400",
> >     "services": {
> >         "cephfs-mirror": {
> >             "daemons": {
> >                 "summary": "",
> >                 "4220": {
> >                     "start_epoch": 501,
> >                     "start_stamp": "2021-03-18T12:49:32.929888-0400",
> >                     "gid": 4220,
> >                     "addr": "10.3.64.25:0/3521332238",
> >                     "metadata": {
> > ...
> >                         "id": "dael.csfspq",
> >                         "instance_id": "4220",
> > ...
> >                     },
> >                     "task_status": {}
> >                 }
> >             }
> >         },
> >         "rbd-mirror": {
> >             "daemons": {
> >                 "summary": "",
> >                 "4272": {
> >                     "start_epoch": 531,
> >                     "start_stamp": "2021-03-18T16:31:26.540108-0400",
> >                     "gid": 4272,
> >                     "addr": "10.3.64.25:0/2576541551",
> >                     "metadata": {
> > ...
> >                         "id": "dael.kfenmm",
> >                         "instance_id": "4272",
> > ...
> >                     },
> >                     "task_status": {}
> >                 },
> >                 "4299": {
> >                     "start_epoch": 534,
> >                     "start_stamp": "2021-03-18T16:52:59.027580-0400",
> >                     "gid": 4299,
> >                     "addr": "10.3.64.25:0/600966616",
> >                     "metadata": {
> > ...
> >                         "id": "dael.yfhmmq",
> >                         "instance_id": "4299",
> > ...
> >                     },
> >                     "task_status": {}
> >                 }
> >             }
> >         },
> >         "rgw": {
> >             "daemons": {
> >                 "summary": "",
> >                 "foo.dael.hwyogi": {
> >                     "start_epoch": 537,
> >                     "start_stamp": "2021-03-18T17:27:58.998535-0400",
> >                     "gid": 4319,
> >                     "addr": "10.3.64.25:0/3084463187",
> >                     "metadata": {
> > ...
> >                         "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
> >                         "zone_name": "default",
> >                         "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
> >                         "zonegroup_name": "default"
> >                     },
> >                     "task_status": {}
> >                 },
> >                 "foo.dael.pyvurh": {
> >                     "start_epoch": 537,
> >                     "start_stamp": "2021-03-18T17:27:58.999620-0400",
> >                     "gid": 4318,
> >                     "addr": "10.3.64.25:0/2303221705",
> >                     "metadata": {
> > ...
> >                         "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
> >                         "zone_name": "default",
> >                         "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
> >                         "zonegroup_name": "default"
> >                     },
> >                     "task_status": {}
> >                 },
> >                 "foo.dael.rqipjp": {
> >                     "start_epoch": 538,
> >                     "start_stamp": "2021-03-18T17:28:10.866327-0400",
> >                     "gid": 4330,
> >                     "addr": "10.3.64.25:0/4039152887",
> >                     "metadata": {
> > ...
> >                         "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
> >                         "zone_name": "default",
> >                         "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
> >                         "zonegroup_name": "default"
> >                     },
> >                     "task_status": {}
> >                 }
> >             }
> >         }
> >     }
> > }
> >
> > With the *-mirror approach, the servicemap "key" is always the gid,
> > and you have to look at the "id" to see how the daemon is
> > named/authenticated.  With rgw, the name is the key and there is no
> > "id" key.
> >
> > I'm inclined to just go with the gid-as-key for rgw too and add the
> > "id" key so that we are behaving consistently.  This would have the
> > side-effect of also solving the original goal of allowing many rgw
> > daemons to share the same auth identity and still show up in the
> > servicemap.
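[For concreteness, gid-as-key plus an "id" metadata key would make the
rgw entries look like the mirror ones — a sketch reusing the gid and
name from the dump above, with most fields elided:

```json
"rgw": {
    "daemons": {
        "summary": "",
        "4319": {
            "gid": 4319,
            "metadata": {
                "id": "foo.dael.hwyogi",
                "zone_name": "default"
            }
        }
    }
}
```
]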
>
> Just wanted to throw another variation into the mix while we are
> talking about it. tcmu-runner for the Ceph iSCSI gateway registers as
> "<node-name>:<pool-name>/<image-name>" [1]. Its implementation
> predates all of the others.
>
> > The downside is that interpreting the service for the running daemons
> > is a bit more work.  For example, currently ceph -s shows
> >
> >   services:
> >     mon:           1 daemons, quorum a (age 2d)
> >     mgr:           x(active, since 58m)
> >     osd:           1 osds: 1 up (since 2d), 1 in (since 2d)
> >     cephfs-mirror: 1 daemon active (4220)
> >     rbd-mirror:    2 daemons active (4272, 4299)
> >     rgw:           2 daemons active (foo.dael.rqipjp, foo.dael.sajkvh)
> >
> > Showing the gids there is clearly not what we want.  But similarly
> > showing the daemon names is probably also a bad idea since it won't
> > scale beyond ~3 or so; we probably just want a simple count.
>
> tcmu-runner really hit this scaling issue and Xiubo just added the
> ability to programmatically fold these together via optional
> "daemon_type" and "daemon_prefix" metadata values [2][3] so that "ceph
> -s" will show something like:
>
> ... snip ...
>   tcmu-runner: 3 portals active (gateway0, gateway1, gateway2)
> ... snip ...
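[A rough model of that folding — not the ServiceMap.cc implementation,
just the idea: daemons sharing a "daemon_prefix" collapse into one
named group, and "daemon_type" overrides the generic "daemons" label:

```python
# Sketch of the daemon_type/daemon_prefix folding described above.
# Daemons that share a "daemon_prefix" collapse into one named group;
# "daemon_type" replaces the generic "daemons" label.  A model of the
# idea, not the actual ServiceMap.cc logic.

def fold(daemons, default_type="daemons"):
    dtype = default_type
    groups = []  # ordered, de-duplicated prefixes
    for name, d in daemons.items():
        md = d.get("metadata", {})
        dtype = md.get("daemon_type", dtype)
        prefix = md.get("daemon_prefix", name)
        if prefix not in groups:
            groups.append(prefix)
    return f"{len(groups)} {dtype} active ({', '.join(groups)})"
```

With tcmu-runner daemons carrying daemon_type "portals" and prefixes
gateway0..gateway2, this yields the line quoted above.]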
>
> > Reasonable?
> > sage
> > _______________________________________________
> > Dev mailing list -- dev@xxxxxxx
> > To unsubscribe send an email to dev-leave@xxxxxxx
> >
>
> [1] https://github.com/open-iscsi/tcmu-runner/blob/master/rbd.c#L190
> [2] https://github.com/open-iscsi/tcmu-runner/blob/master/rbd.c#L202
> [3] https://github.com/ceph/ceph/blob/master/src/mgr/ServiceMap.cc#L83
>
> --
> Jason
>