Re: Running different rgw daemon with same cephxuser

Kyle Bader <kyle.bader@xxxxxxxxx> · Sat, 13 Feb 2021 16:49:29 -0800

You would need new TCP connections for kube-proxy to send traffic to the new hosts.
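To illustrate the point, here is a minimal sketch in Python of connection-level balancing. It is a toy model, not kube-proxy code: the class names, the pod names (rgw-0, rgw-1, rgw-2), the round-robin choice and the connection counts are all assumptions made for the example. It also assumes the S3 client keeps reusing the connections it opened at the start of a copy.

```python
from collections import Counter


class Connection:
    """A persistent connection, pinned to the backend chosen when it was opened."""

    def __init__(self, backend):
        self.backend = backend

    def request(self, hits):
        # Every request on this connection lands on the same backend.
        hits[self.backend] += 1


class ConnectionBalancer:
    """Toy connection-level balancer (hypothetical, round-robin for simplicity)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._cursor = 0

    def add_backend(self, name):
        self.backends.append(name)

    def connect(self):
        # The backend is picked once per connection, never per request.
        backend = self.backends[self._cursor % len(self.backends)]
        self._cursor += 1
        return Connection(backend)


def simulate():
    balancer = ConnectionBalancer(["rgw-0"])

    # First copy: the client opens a few keep-alive connections up front.
    first_copy = [balancer.connect() for _ in range(4)]

    # The autoscaler adds two pods while the first copy is still running.
    balancer.add_backend("rgw-1")
    balancer.add_backend("rgw-2")

    first_hits = Counter()
    for conn in first_copy:
        for _ in range(10):
            conn.request(first_hits)

    # Second copy: fresh connections are spread over all three pods.
    second_hits = Counter()
    for conn in [balancer.connect() for _ in range(6)]:
        conn.request(second_hits)

    return first_hits, second_hits
```

Calling simulate() returns two counters: the first contains only rgw-0, because the existing connections never move, and the second contains all three pods, because only new connections are balanced across the scaled-up set.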

On Thu, Feb 11, 2021 at 03:47 Jiffin Thottan <jthottan@xxxxxxxxxx> wrote:
I was able to test the PR against HPA in minikube and it is working as expected.

# ceph status
  cluster:
    id:     c7a87662-dccb-4143-bf68-58ff676a0362
    health: HEALTH_WARN
            mon a is low on available space
            8 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum a (age 20m)
    mgr: a(active, since 19m)
    osd: 1 osds: 1 up (since 19m), 1 in (since 19m)
    rgw: 3 daemons active (my.store.a.my-store.my-store.4383, my.store.a.my-store.my-store.4715, my.store.a.my-store.my-store.4717)

  data:
    pools:   8 pools, 96 pgs
    objects: 2.57k objects, 8.5 MiB
    usage:   85 MiB used, 20 GiB / 20 GiB avail
    pgs:     96 active+clean

  io:
    client:   611 KiB/s rd, 386 KiB/s wr, 696 op/s rd, 1.27k op/s wr

The ceph mgr even shows separate metrics for each daemon.
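The three rgw entries in the status output are consistent with each daemon registering in the service map under its own unique name, as proposed in the quoted discussion further down. Below is a minimal sketch in Python of why the registration key matters. ServiceMap and both helper functions are hypothetical stand-ins, not Ceph code, and the sketch assumes that registering the same name twice collapses into a single entry, which matches the single rgw daemon reported earlier in the thread.

```python
class ServiceMap:
    """Toy stand-in for the service map: daemons keyed by (type, name)."""

    def __init__(self):
        self.services = {}

    def register(self, daemon_type, name, metadata):
        # Assumption: a second registration under the same name replaces the first.
        self.services.setdefault(daemon_type, {})[name] = dict(metadata)

    def count(self, daemon_type):
        return len(self.services.get(daemon_type, {}))


def register_by_daemon_name(service_map, pods, daemon_name="my.store.a"):
    """Every pod shares the same --id, so every pod registers the same name."""
    for pod in pods:
        service_map.register("rgw", daemon_name, {"hostname": pod})


def register_by_instance_id(service_map, pods, first_id=4383):
    """Each pod registers under a per-session identifier (made-up values here)."""
    for offset, pod in enumerate(pods):
        instance_id = str(first_id + offset)
        service_map.register("rgw", instance_id, {"hostname": pod})


pods = ["rgw-pod-1", "rgw-pod-2", "rgw-pod-3"]

shared = ServiceMap()
register_by_daemon_name(shared, pods)

unique = ServiceMap()
register_by_instance_id(unique, pods)
```

With the shared name, shared.count("rgw") is 1 even though three pods are running. With per-instance identifiers, unique.count("rgw") is 3, while all pods can still authenticate with the same cephx user.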

@Matt @Casey :

I saw the following with the S3 client.

I created an HPA for the rgw pod that scales the pods based on the number of requests.

I triggered a recursive directory copy (4480 directories, 67705 files) from the S3 client using the following command:

aws s3 cp <directory> --no-verify-ssl --endpoint-url http://$BUCKET_HOST:$BUCKET_PORT s3://$BUCKET_NAME

Even though the HPA scaled the rgw pods, requests were not sent to the newly created rgw pods (daemons).

But when I triggered another recursive copy, its requests were sent to all the pods.

Is this behaviour expected?

--

Jiffin

----- Original Message -----

From: "Sebastien Han" <shan@xxxxxxxxxx>

To: "Jiffin Thottan" <jthottan@xxxxxxxxxx>

Cc: "Matt Benjamin" <mbenjami@xxxxxxxxxx>, "ceph-rgw-eng" <ceph-rgw-eng@xxxxxxxxxx>, "ceph-tech-list" <ceph-tech-list@xxxxxxxxxx>, "dev" <dev@xxxxxxx>, "Matt Benjamin" <mbenjamin@xxxxxxxxxx>, "Kaleb Keithley" <kkeithle@xxxxxxxxxx>, "Orit Wasserman" <owasserm@xxxxxxxxxx>, "Travis Nielsen" <tnielsen@xxxxxxxxxx>

Sent: Wednesday, February 10, 2021 1:20:14 PM

Subject: Re: Running different rgw daemon with same cephxuser

Sounds good, thanks guys! It does compile so go for it :)

–––––––––

Sébastien Han

Senior Principal Software Engineer, Storage Architect

"Always give 100%. Unless you're giving blood."

On Wed, Feb 10, 2021 at 6:29 AM Jiffin Thottan <jthottan@xxxxxxxxxx> wrote:

>

> Hey Seb,

>

> I will test the PR against HPA and let you know the results (within one or two days).

> --

> Jiffin

>

> ----- Original Message -----

> From: "Sebastien Han" <shan@xxxxxxxxxx>

> To: "Matt Benjamin" <mbenjami@xxxxxxxxxx>

> Cc: "Jiffin Thottan" <jthottan@xxxxxxxxxx>, "ceph-rgw-eng" <ceph-rgw-eng@xxxxxxxxxx>, "ceph-tech-list" <ceph-tech-list@xxxxxxxxxx>, "dev" <dev@xxxxxxx>, "Matt Benjamin" <mbenjamin@xxxxxxxxxx>, "Kaleb Keithley" <kkeithle@xxxxxxxxxx>, "Orit Wasserman" <owasserm@xxxxxxxxxx>, "Travis Nielsen" <tnielsen@xxxxxxxxxx>

> Sent: Tuesday, February 9, 2021 10:11:47 PM

> Subject: Re: Running different rgw daemon with same cephxuser

>

> Thanks Matt, I just sent this to kick off the discussion:

> https://github.com/ceph/ceph/pull/39380

> If someone wants to take over, that would be preferable I guess, mainly

> due to my limited C++ knowledge.

>

> So feel free to assign someone from your team to take over so we can

> move faster with this one.

> Thanks!

> –––––––––

> Sébastien Han

> Senior Principal Software Engineer, Storage Architect

>

> "Always give 100%. Unless you're giving blood."

>

> On Mon, Feb 8, 2021 at 3:53 PM Matt Benjamin <mbenjami@xxxxxxxxxx> wrote:

> >

> > Hi Sebastien,

> >

> > That seems like a concise and reasonable solution to me.  It seems

> > like the metrics from a single instance should in fact be transient

> > (leaving the problem of maintaining aggregate values to prometheus or

> > even downstream of that)?

> >

> > Matt

> >

> > On Mon, Feb 8, 2021 at 9:47 AM Sebastien Han <shan@xxxxxxxxxx> wrote:

> > >

> > > Hi Jiffin,

> > >

> > > From my perspective, one simple way to fix this (although we must be

> > > careful with backward compatibility) would be for rgw to register in

> > > the service map differently.

> > > Today it uses the daemon name, so a daemon named rgw.foo registers

> > > as foo. Essentially, if you try to run that pod twice you would still

> > > see a single instance in the service map as well as the prometheus

> > > metrics.

> > >

> > > It would be nice to register with the RADOS client session ID instead,

> > > just like rbd-mirror does by using instance_id. Something like:

> > >

> > > std::string instance_id = stringify(rados.get_instance_id());

> > > int ret = rados.service_daemon_register(daemon_type, instance_id, metadata);

> > >

> > > Here https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L1139

> > > With that we can re-use the same cephx user and scale to any number,

> > > all instances will use the same cephx to authenticate to the cluster

> > > but they will show up as N in the service map.

> > >

> > > I guess one downside is that as soon as the daemon restarts, we get a

> > > new RADOS client session ID, and thus our name changes, which means we

> > > are losing all the metrics...

> > > Thoughts?

> > >

> > > Thanks!

> > > –––––––––

> > > Sébastien Han

> > > Senior Principal Software Engineer, Storage Architect

> > >

> > > "Always give 100%. Unless you're giving blood."

> > >

> > > On Thu, Feb 4, 2021 at 3:39 PM Jiffin Thottan <jthottan@xxxxxxxxxx> wrote:

> > > >

> > > > Hi all,

> > > >

> > > > In an OCS (Rook) environment, the workflow for RGW daemons is as follows.

> > > >

> > > > Normally, to create a Ceph object store, Rook first creates the pools for the rgw daemon with the specified configuration.

> > > >

> > > > Then, depending on the number of instances, Rook creates a cephx user and spawns the rgw daemon in a container (pod) using its id,

> > > > with the following arguments for the radosgw binary:

> > > >     Args:
> > > >       --fsid=91501490-4b55-47db-b226-f9d9968774c1
> > > >       --keyring=/etc/ceph/keyring-store/keyring
> > > >       --log-to-stderr=true
> > > >       --err-to-stderr=true
> > > >       --mon-cluster-log-to-stderr=true
> > > >       --log-stderr-prefix=debug
> > > >       --default-log-to-file=false
> > > >       --default-mon-cluster-log-to-file=false
> > > >       --mon-host=$(ROOK_CEPH_MON_HOST)
> > > >       --mon-initial-members=$(ROOK_CEPH_MON_INITIAL_MEMBERS)
> > > >       --id=rgw.my.store.a
> > > >       --setuser=ceph
> > > >       --setgroup=ceph
> > > >       --foreground
> > > >       --rgw-frontends=beast port=8080
> > > >       --host=$(POD_NAME)
> > > >       --rgw-mime-types-file=/etc/ceph/rgw/mime.types
> > > >       --rgw-realm=my-store
> > > >       --rgw-zonegroup=my-store
> > > >       --rgw-zone=my-store

> > > >

> > > > Here the cephx user will be "client.rgw.my.store.a" and all the pools for rgw will be created as my-store*. Normally, if another

> > > > instance is requested in the ceph-object-store config file [1] for Rook, Rook creates another user, "client.rgw.my.store.b",

> > > > which consumes the same pools.

> > > >

> > > > Kubernetes has an autoscaling feature in which pods are scaled automatically based on specified metrics. If we apply that

> > > > feature to rgw pods, Kubernetes will scale them (each new pod is like a clone of the existing one) with the same "--id" argument,

> > > > but Ceph cannot distinguish those as different rgw daemons even though multiple rgw pods are running simultaneously.

> > > > "ceph status" shows only one rgw daemon as well.

> > > >

> > > > In vstart or ceph-ansible (Ali helped me figure this out), I can see that a cephx user is created for each rgw daemon as well.

> > > >

> > > > Is this behaviour intended, or am I hitting a corner case that was never tested before?

> > > >

> > > > There is no point in autoscaling the rgw pod if all its replicas are considered the same daemon: the S3 client will talk to only one of the pods,

> > > > and the metrics provided by ceph mgr can be incorrect as well, which can affect the autoscale feature.

> > > >

> > > > I also opened an issue in Rook for the time being [2].

> > > >

> > > > [1] https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/object-test.yaml

> > > > [2] https://github.com/rook/rook/issues/6943

> > > >

> > > > Regards,

> > > > Jiffin

> > > >

> > >

> >

> >

> > --

> >

> > Matt Benjamin

> > Red Hat, Inc.

> > 315 West Huron Street, Suite 140A

> > Ann Arbor, Michigan 48103

> >

> > http://www.redhat.com/en/technologies/storage

> >

> > tel.  734-821-5101

> > fax.  734-769-8938

> > cel.  734-216-5309

> >

>

_______________________________________________

Dev mailing list -- dev@xxxxxxx

To unsubscribe send an email to dev-leave@xxxxxxx
