Re: servicemap vs cephadm

On Fri, Mar 19, 2021 at 9:50 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> Okay, I adjusted rgw to register under gid like the others, and
> changed the cephadm logic around to cope.

Thanks! I was hoping for a solution on the ceph-mgr side in the
discussions on https://github.com/ceph/ceph/pull/40035.

>
> I also cleaned up and simplified the 'ceph -s' output:
>
>   services:
>     mon:           1 daemons, quorum a (age 15s)
>     mgr:           x(active, since 92s)
>     osd:           1 osds: 1 up (since 71s), 1 in (since 88s)
>     cephfs-mirror: 1 daemon active (1 hosts)
>     rbd-mirror:    2 daemons active (1 hosts)
>     rgw:           2 daemons active (1 hosts, 1 zones)
>
> - don't list individual daemon ids (won't scale for large clusters)
> - present any groupings we can identify (currently just distinct hosts
> and rgw zones; if there are reasonable groupings for cephfs-mirror,
> rbd-mirror, or iscsi, let's add those too)
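
For what it's worth, those groupings look derivable from the servicemap
alone.  A minimal sketch of the idea in Python (illustrative only, not the
actual mgr code): it assumes each daemon's metadata carries a "hostname"
key, which is elided from the JSON dump further down in this thread, and
uses the "zone_name" key that the dump does show for rgw.

  def summarize_service(service_name, service):
      # One servicemap service entry -> a 'ceph -s'-style one-liner.
      # The "summary" key sits alongside the per-daemon keys, so skip it.
      daemons = {k: v for k, v in service.get("daemons", {}).items()
                 if k != "summary"}
      hosts = {d.get("metadata", {}).get("hostname") for d in daemons.values()}
      hosts.discard(None)
      parts = [f"{len(hosts)} hosts"] if hosts else []
      if service_name == "rgw":
          zones = {d.get("metadata", {}).get("zone_name")
                   for d in daemons.values()}
          zones.discard(None)
          if zones:
              parts.append(f"{len(zones)} zones")
      grouping = f" ({', '.join(parts)})" if parts else ""
      return f"{len(daemons)} daemons active{grouping}"
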
>
> s
>
> On Thu, Mar 18, 2021 at 8:26 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> >
> > On Thu, Mar 18, 2021 at 9:00 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > >
> > > Hi everyone,
> > >
> > > The non-core daemon registrations in servicemap vs cephadm came up
> > > twice in the last couple of weeks:
> > >
> > > First, https://github.com/ceph/ceph/pull/40035 changed rgw to register
> > > as rgw.$id.$gid and made cephadm complain about stray unmanaged
> > > daemons.  The motivation was to allow multiple radosgw daemons to
> > > share the same auth name + key and still show up in the servicemap.
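
Right -- with a name-keyed map, daemons that share an auth name simply
clobber each other's entries, so keying on something unique per process is
what keeps them all visible.  A toy illustration (the gids and the shared
name "foo" are just example values, not taken from any real cluster):

  # Name-keyed: two radosgw processes sharing the auth name "foo"
  # collapse into one servicemap entry; the later registration clobbers
  # the earlier one.
  by_name = {}
  for gid, name in [(4318, "foo"), (4319, "foo")]:
      by_name[name] = {"gid": gid}
  assert len(by_name) == 1

  # Keyed on something unique per process (here the gid), both daemons
  # stay visible and the shared name just becomes part of the entry.
  by_gid = {}
  for gid, name in [(4318, "foo"), (4319, "foo")]:
      by_gid[str(gid)] = {"id": name, "gid": gid}
  assert len(by_gid) == 2
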
> > >
> > > Then, today, I noticed that cephfs-mirror caused the same cephadm
> > > error because it was registering as cephfs-mirror.$gid instead of the
> > > cephfs-mirror.$id that cephadm expected.  I went to fix that in
> > > cephfs-mirror, but noticed that the behavior was copied from
> > > rbd-mirror, which wasn't causing any cephadm error.  It turns out
> > > that cephadm has some special code for rbd-mirror to identify its daemons
> > > in the servicemap:
> > >
> > >   https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/serve.py#L412-L420
> > >
> > > So to fix cephfs-mirror, I opted to keep the existing behavior and
> > > adjust cephadm:
> > >
> > >   https://github.com/ceph/ceph/pull/40220/commits/30d87f3746ff9daf219366354f24c0d8e306844a
> > >
> > > For now, at least, that solves the problem.  But as things stand, rgw
> > > and {cephfs,rbd}-mirror behave a bit differently with respect to the
> > > servicemap.  The registrations look like so:
> > >
> > > {
> > >     "epoch": 538,
> > >     "modified": "2021-03-18T17:28:12.500356-0400",
> > >     "services": {
> > >         "cephfs-mirror": {
> > >             "daemons": {
> > >                 "summary": "",
> > >                 "4220": {
> > >                     "start_epoch": 501,
> > >                     "start_stamp": "2021-03-18T12:49:32.929888-0400",
> > >                     "gid": 4220,
> > >                     "addr": "10.3.64.25:0/3521332238",
> > >                     "metadata": {
> > > ...
> > >                         "id": "dael.csfspq",
> > >                         "instance_id": "4220",
> > > ...
> > >                     },
> > >                     "task_status": {}
> > >                 }
> > >             }
> > >         },
> > >         "rbd-mirror": {
> > >             "daemons": {
> > >                 "summary": "",
> > >                 "4272": {
> > >                     "start_epoch": 531,
> > >                     "start_stamp": "2021-03-18T16:31:26.540108-0400",
> > >                     "gid": 4272,
> > >                     "addr": "10.3.64.25:0/2576541551",
> > >                     "metadata": {
> > > ...
> > >                         "id": "dael.kfenmm",
> > >                         "instance_id": "4272",
> > > ...
> > >                     },
> > >                     "task_status": {}
> > >                 },
> > >                 "4299": {
> > >                     "start_epoch": 534,
> > >                     "start_stamp": "2021-03-18T16:52:59.027580-0400",
> > >                     "gid": 4299,
> > >                     "addr": "10.3.64.25:0/600966616",
> > >                     "metadata": {
> > > ...
> > >                         "id": "dael.yfhmmq",
> > >                         "instance_id": "4299",
> > > ...
> > >                     },
> > >                     "task_status": {}
> > >                 }
> > >             }
> > >         },
> > >         "rgw": {
> > >             "daemons": {
> > >                 "summary": "",
> > >                 "foo.dael.hwyogi": {
> > >                     "start_epoch": 537,
> > >                     "start_stamp": "2021-03-18T17:27:58.998535-0400",
> > >                     "gid": 4319,
> > >                     "addr": "10.3.64.25:0/3084463187",
> > >                     "metadata": {
> > > ...
> > >                         "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
> > >                         "zone_name": "default",
> > >                         "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
> > >                         "zonegroup_name": "default"
> > >                     },
> > >                     "task_status": {}
> > >                 },
> > >                 "foo.dael.pyvurh": {
> > >                     "start_epoch": 537,
> > >                     "start_stamp": "2021-03-18T17:27:58.999620-0400",
> > >                     "gid": 4318,
> > >                     "addr": "10.3.64.25:0/2303221705",
> > >                     "metadata": {
> > > ...
> > >                         "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
> > >                         "zone_name": "default",
> > >                         "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
> > >                         "zonegroup_name": "default"
> > >                     },
> > >                     "task_status": {}
> > >                 },
> > >                 "foo.dael.rqipjp": {
> > >                     "start_epoch": 538,
> > >                     "start_stamp": "2021-03-18T17:28:10.866327-0400",
> > >                     "gid": 4330,
> > >                     "addr": "10.3.64.25:0/4039152887",
> > >                     "metadata": {
> > > ...
> > >                         "zone_id": "6321d54d-d780-43f3-af53-ce52aed2ef8a",
> > >                         "zone_name": "default",
> > >                         "zonegroup_id": "e8453745-84a7-4d58-9aa9-9bfaf1ce9a7f",
> > >                         "zonegroup_name": "default"
> > >                     },
> > >                     "task_status": {}
> > >                 }
> > >             }
> > >         }
> > >     }
> > > }
> > >
> > > With the *-mirror approach, the servicemap "key" is always the gid,
> > > and you have to look at the "id" to see how the daemon is
> > > named/authenticated.  With rgw, the name is the key and there is no
> > > "id" key.
> > >
> > > I'm inclined to just go with the gid-as-key for rgw too and add the
> > > "id" key so that we are behaving consistently.  This would have the
> > > side-effect of also solving the original goal of allowing many rgw
> > > daemons to share the same auth identity and still show up in the
> > > servicemap.
> >
> > Just wanted to throw another variation into this model while we are
> > talking about it: tcmu-runner for the Ceph iSCSI gateway registers as
> > "<node-name>:<pool-name>/<image-name>" [1]. Its implementation
> > predates all of these other ones.
> >
> > > The downside is that interpreting the servicemap entries for the
> > > running daemons is a bit more work.  For example, currently ceph -s shows
> > >
> > >   services:
> > >     mon:           1 daemons, quorum a (age 2d)
> > >     mgr:           x(active, since 58m)
> > >     osd:           1 osds: 1 up (since 2d), 1 in (since 2d)
> > >     cephfs-mirror: 1 daemon active (4220)
> > >     rbd-mirror:    2 daemons active (4272, 4299)
> > >     rgw:           2 daemons active (foo.dael.rqipjp, foo.dael.sajkvh)
> > >
> > > Showing the gids there is clearly not what we want.  But similarly,
> > > showing the daemon names is probably also a bad idea, since it won't
> > > scale beyond ~3 or so; we probably just want a simple count.
> >
> > tcmu-runner really hit this scaling issue, and Xiubo just added the
> > ability to programmatically fold these together via optional
> > "daemon_type" and "daemon_prefix" metadata values [2][3] so that "ceph
> > -s" will show something like:
> >
> > ... snip ...
> >   tcmu-runner: 3 portals active (gateway0, gateway1, gateway2)
> > ... snip ...
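
That folding is simple to picture: group daemons by the optional prefix and
count the groups.  A rough Python re-imagining of what the ServiceMap
summary code in [3] does -- not the actual C++ -- assuming daemon_type and
daemon_prefix are read from each daemon's metadata, and guessing from the
output above that tcmu-runner sets daemon_type to "portal" and
daemon_prefix to the gateway host:

  def fold_summary(service):
      # Collapse daemons that share a daemon_prefix into one logical entry
      # and label the group using daemon_type (default "daemon").
      daemon_type = "daemon"
      prefixes = set()
      for key, d in service.get("daemons", {}).items():
          if key == "summary":
              continue
          md = d.get("metadata", {})
          daemon_type = md.get("daemon_type", daemon_type)
          prefixes.add(md.get("daemon_prefix", key))
      names = sorted(prefixes)
      return f"{len(names)} {daemon_type}s active ({', '.join(names)})"
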
> >
> > > Reasonable?
> > > sage
> > >
> >
> > [1] https://github.com/open-iscsi/tcmu-runner/blob/master/rbd.c#L190
> > [2] https://github.com/open-iscsi/tcmu-runner/blob/master/rbd.c#L202
> > [3] https://github.com/ceph/ceph/blob/master/src/mgr/ServiceMap.cc#L83
> >
> > --
> > Jason
> >
>


