Re: Running different rgw daemon with same cephxuser

Thanks, Kyle, for the confirmation.

----- Original Message -----
From: "Kyle Bader" <kyle.bader@xxxxxxxxx>
To: "Jiffin Thottan" <jthottan@xxxxxxxxxx>
Cc: "Kaleb Keithley" <kkeithle@xxxxxxxxxx>, "Matt Benjamin" <mbenjamin@xxxxxxxxxx>, "Matt Benjamin" <mbenjami@xxxxxxxxxx>, "Orit Wasserman" <owasserm@xxxxxxxxxx>, "Sebastien Han" <shan@xxxxxxxxxx>, "Travis Nielsen" <tnielsen@xxxxxxxxxx>, "ceph-rgw-eng" <ceph-rgw-eng@xxxxxxxxxx>, "ceph-tech-list" <ceph-tech-list@xxxxxxxxxx>, "dev" <dev@xxxxxxx>
Sent: Sunday, February 14, 2021 6:19:29 AM
Subject: Re: Running different rgw daemon with same cephxuser

You would need new TCP connections for kube-proxy to send requests to the new hosts.
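
For illustration, a minimal sketch of that point (the service endpoint name is an assumption, not something from this thread): kube-proxy picks a backend pod when a TCP connection is established, so an S3 client that keeps one connection alive stays pinned to the same rgw pod, while opening a fresh connection per request gives newly scaled pods a chance to receive traffic.

import requests

# Hypothetical in-cluster Service endpoint for the RGW pods (assumption).
endpoint = "http://rook-ceph-rgw-my-store:8080/"

# One long-lived session: kube-proxy chose a backend when the connection was
# opened, so every request here keeps hitting the same rgw pod.
with requests.Session() as s:
    for _ in range(5):
        print("keep-alive:", s.get(endpoint).status_code)

# A fresh connection per request can be balanced across all current pods,
# including ones added later by the HPA.
for _ in range(5):
    print("new connection:", requests.get(endpoint, headers={"Connection": "close"}).status_code)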

On Thu, Feb 11, 2021 at 03:47 Jiffin Thottan <jthottan@xxxxxxxxxx> wrote:

> I was able to test the PR against HPA in minikube and it is working as
> expected.
>
> # ceph status
>   cluster:
>     id:     c7a87662-dccb-4143-bf68-58ff676a0362
>     health: HEALTH_WARN
>             mon a is low on available space
>             8 pool(s) have no replicas configured
>
>   services:
>     mon: 1 daemons, quorum a (age 20m)
>     mgr: a(active, since 19m)
>     osd: 1 osds: 1 up (since 19m), 1 in (since 19m)
>     rgw: 3 daemons active (my.store.a.my-store.my-store.4383,
> my.store.a.my-store.my-store.4715, my.store.a.my-store.my-store.4717)
>
>   data:
>     pools:   8 pools, 96 pgs
>     objects: 2.57k objects, 8.5 MiB
>     usage:   85 MiB used, 20 GiB / 20 GiB avail
>     pgs:     96 active+clean
>
>   io:
>     client:   611 KiB/s rd, 386 KiB/s wr, 696 op/s rd, 1.27k op/s wr
>
> Even the metrics are shown separately for each daemon by ceph mgr.
>
> @Matt @Casey:
>
> I saw the following with the s3 client:
>
> I created an HPA for the rgw pods, which scales the pods based on the number of requests.
>
> I triggered a recursive directory copy (4480 directories, 67705 files) from the
> s3 client using the following command:
>
> aws s3 cp <directory> s3://$BUCKET_NAME --no-verify-ssl --endpoint-url http://$BUCKET_HOST:$BUCKET_PORT
>
> Even though the HPA scaled the rgw pods, requests were not being sent to the newly
> created rgw pods (daemons),
>
> but when I triggered another recursive copy, requests were sent to all the pods.
>
> Is this behaviour expected??
>
>
> --
>
> Jiffin
>
> ----- Original Message -----
> From: "Sebastien Han" <shan@xxxxxxxxxx>
> To: "Jiffin Thottan" <jthottan@xxxxxxxxxx>
> Cc: "Matt Benjamin" <mbenjami@xxxxxxxxxx>, "ceph-rgw-eng" <
> ceph-rgw-eng@xxxxxxxxxx>, "ceph-tech-list" <ceph-tech-list@xxxxxxxxxx>,
> "dev" <dev@xxxxxxx>, "Matt Benjamin" <mbenjamin@xxxxxxxxxx>, "Kaleb
> Keithley" <kkeithle@xxxxxxxxxx>, "Orit Wasserman" <owasserm@xxxxxxxxxx>,
> "Travis Nielsen" <tnielsen@xxxxxxxxxx>
> Sent: Wednesday, February 10, 2021 1:20:14 PM
> Subject: Re: Running different rgw daemon with same cephxuser
>
> Sounds good, thanks guys! It does compile so go for it :)
> –––––––––
> Sébastien Han
> Senior Principal Software Engineer, Storage Architect
>
> "Always give 100%. Unless you're giving blood."
> On Wed, Feb 10, 2021 at 6:29 AM Jiffin Thottan <jthottan@xxxxxxxxxx> wrote:
> >
> > Hey Seb,
> >
> > I will test the PR against HPA and let you know the results (within one or
> > two days).
> > --
> > Jiffin
> >
> > ----- Original Message -----
> > From: "Sebastien Han" <shan@xxxxxxxxxx>
> > To: "Matt Benjamin" <mbenjami@xxxxxxxxxx>
> > Cc: "Jiffin Thottan" <jthottan@xxxxxxxxxx>, "ceph-rgw-eng" <
> ceph-rgw-eng@xxxxxxxxxx>, "ceph-tech-list" <ceph-tech-list@xxxxxxxxxx>,
> "dev" <dev@xxxxxxx>, "Matt Benjamin" <mbenjamin@xxxxxxxxxx>, "Kaleb
> Keithley" <kkeithle@xxxxxxxxxx>, "Orit Wasserman" <owasserm@xxxxxxxxxx>,
> "Travis Nielsen" <tnielsen@xxxxxxxxxx>
> > Sent: Tuesday, February 9, 2021 10:11:47 PM
> > Subject: Re: Running different rgw daemon with same cephxuser
> >
> > Thanks Matt, I just sent this to kick off the discussion:
> > https://github.com/ceph/ceph/pull/39380
> > If someone wants to take over, that's preferable I guess; this is mainly
> > due to my limited C++ knowledge.
> >
> > So feel free to assign someone from your team to take over so we can
> > move faster with this one.
> > Thanks!
> > –––––––––
> > Sébastien Han
> > Senior Principal Software Engineer, Storage Architect
> >
> > "Always give 100%. Unless you're giving blood."
> >
> > On Mon, Feb 8, 2021 at 3:53 PM Matt Benjamin <mbenjami@xxxxxxxxxx> wrote:
> > >
> > > Hi Sebastien,
> > >
> > > That seems like a concise and reasonable solution to me. It seems
> > > like the metrics from a single instance should in fact be transient
> > > (leaving the problem of maintaining aggregate values to Prometheus, or
> > > even downstream of that?).
> > >
> > > Matt
> > >
> > > On Mon, Feb 8, 2021 at 9:47 AM Sebastien Han <shan@xxxxxxxxxx> wrote:
> > > >
> > > > Hi Jiffin,
> > > >
> > > > From my perspective, one simple way to fix this (although we must be
> > > > careful with backward compatibility) would be for rgw to register to the
> > > > service map differently.
> > > > Today it uses the daemon name: rgw.foo registers as foo. Essentially,
> > > > if you try to run that pod twice you will still see a single instance
> > > > in the service map as well as in the Prometheus metrics.
> > > >
> > > > It would be nice to register with the RADOS client session ID instead,
> > > > just like rbd-mirror does by using instance_id. Something like:
> > > >
> > > > std::string instance_id = stringify(rados.get_instance_id());
> > > > int ret = rados.service_daemon_register(daemon_type, instance_id, metadata);
> > > >
> > > > Here: https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L1139
> > > > With that, we can re-use the same cephx user and scale to any number:
> > > > all instances will use the same cephx user to authenticate to the
> > > > cluster, but they will show up as N entries in the service map.
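> > > >
> > > > For illustration, a minimal sketch of that idea from the client side, using
> > > > the rados Python binding (assuming it exposes get_instance_id(), mirroring
> > > > the C++ call above; the cephx user name and conf path are just placeholders):
> > > > two connections opened with the same cephx user still get distinct instance
> > > > IDs, which is what would let them show up as separate service map entries.
> > > >
> > > > import rados
> > > >
> > > > # Both handles authenticate with the same (placeholder) cephx user.
> > > > conf = dict(conffile='/etc/ceph/ceph.conf', name='client.rgw.my.store.a')
> > > >
> > > > a = rados.Rados(**conf)
> > > > b = rados.Rados(**conf)
> > > > a.connect()
> > > > b.connect()
> > > >
> > > > # Each RADOS client session has its own instance id, even though both use
> > > > # one cephx user, so registering by instance id keeps the entries distinct.
> > > > print(a.get_instance_id(), b.get_instance_id())
> > > >
> > > > a.shutdown()
> > > > b.shutdown()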
> > > >
> > > > I guess one downside is that as soon as the daemon restarts, we get a
> > > > new RADOS client session ID, and thus our name changes, which means we
> > > > lose all the metrics...
> > > > Thoughts?
> > > >
> > > > Thanks!
> > > > –––––––––
> > > > Sébastien Han
> > > > Senior Principal Software Engineer, Storage Architect
> > > >
> > > > "Always give 100%. Unless you're giving blood."
> > > >
> > > > On Thu, Feb 4, 2021 at 3:39 PM Jiffin Thottan <jthottan@xxxxxxxxxx> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > In an OCS (Rook) environment, the workflow for the RGW daemons is as follows.
> > > > >
> > > > > Normally, to create a ceph object store, Rook first creates the
> > > > > pools for the rgw daemon with the specified configuration.
> > > > >
> > > > > Then, depending on the number of instances, Rook creates a cephx user and
> > > > > spawns the rgw daemon in a container (pod) using that id,
> > > > > with the following arguments for the radosgw binary:
> > > > >     Args:
> > > > >       --fsid=91501490-4b55-47db-b226-f9d9968774c1
> > > > >       --keyring=/etc/ceph/keyring-store/keyring
> > > > >       --log-to-stderr=true
> > > > >       --err-to-stderr=true
> > > > >       --mon-cluster-log-to-stderr=true
> > > > >       --log-stderr-prefix=debug
> > > > >       --default-log-to-file=false
> > > > >       --default-mon-cluster-log-to-file=false
> > > > >       --mon-host=$(ROOK_CEPH_MON_HOST)
> > > > >       --mon-initial-members=$(ROOK_CEPH_MON_INITIAL_MEMBERS)
> > > > >       --id=rgw.my.store.a
> > > > >       --setuser=ceph
> > > > >       --setgroup=ceph
> > > > >       --foreground
> > > > >       --rgw-frontends=beast port=8080
> > > > >       --host=$(POD_NAME)
> > > > >       --rgw-mime-types-file=/etc/ceph/rgw/mime.types
> > > > >       --rgw-realm=my-store
> > > > >       --rgw-zonegroup=my-store
> > > > >       --rgw-zone=my-store
> > > > >
> > > > > And here the cephx user will be "client.rgw.my.store.a", and all the
> > > > > pools for rgw will be created as my-store*. Normally, if another
> > > > > instance is requested in the ceph-object-store config file [1] for
> > > > > Rook, another user "client.rgw.my.store.b" will be created by Rook
> > > > > and will consume the same pools.
> > > > >
> > > > > There is a feature in Kubernetes known as autoscaling, in which pods
> > > > > can be scaled automatically based on specified metrics. If we apply that
> > > > > feature to the rgw pods, Kubernetes will automatically scale the rgw
> > > > > pods (like clones of the existing pod) with the same "--id" argument,
> > > > > based on the metrics, but ceph cannot distinguish those as different
> > > > > rgw daemons even though multiple rgw pods are running simultaneously.
> > > > > "ceph status" shows only one rgw daemon as well.
> > > > >
> > > > > In vstart or ceph-ansible (Ali helped me figure it out), I can
> > > > > see that for each rgw daemon a cephx user is getting created as well.
> > > > >
> > > > > Is this behaviour intended? Or am I hitting a corner case which
> > > > > was never tested before?
> > > > >
> > > > > There is no point in autoscaling the rgw pod if it is considered to be
> > > > > the same daemon: the s3 client will talk to only one of the pods, and the
> > > > > metrics provided by ceph mgr can give incorrect data as well, which can
> > > > > affect the autoscale feature.
> > > > >
> > > > > I have also opened an issue in Rook for the time being [2].
> > > > >
> > > > > [1] https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/object-test.yaml
> > > > > [2] https://github.com/rook/rook/issues/6943
> > > > >
> > > > > Regards,
> > > > > Jiffin
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Matt Benjamin
> > > Red Hat, Inc.
> > > 315 West Huron Street, Suite 140A
> > > Ann Arbor, Michigan 48103
> > >
> > > http://www.redhat.com/en/technologies/storage
> > >
> > > tel.  734-821-5101
> > > fax.  734-769-8938
> > > cel.  734-216-5309
> > >
> >
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



