Thanks Kyle for the confirmation.

----- Original Message -----
From: "Kyle Bader" <kyle.bader@xxxxxxxxx>
To: "Jiffin Thottan" <jthottan@xxxxxxxxxx>
Cc: "Kaleb Keithley" <kkeithle@xxxxxxxxxx>, "Matt Benjamin" <mbenjamin@xxxxxxxxxx>, "Matt Benjamin" <mbenjami@xxxxxxxxxx>, "Orit Wasserman" <owasserm@xxxxxxxxxx>, "Sebastien Han" <shan@xxxxxxxxxx>, "Travis Nielsen" <tnielsen@xxxxxxxxxx>, "ceph-rgw-eng" <ceph-rgw-eng@xxxxxxxxxx>, "ceph-tech-list" <ceph-tech-list@xxxxxxxxxx>, "dev" <dev@xxxxxxx>
Sent: Sunday, February 14, 2021 6:19:29 AM
Subject: Re: Running different rgw daemon with same cephxuser

You would need new tcp connections for kube proxy to send to new hosts

On Thu, Feb 11, 2021 at 03:47 Jiffin Thottan <jthottan@xxxxxxxxxx> wrote:

> I was able to test the PR against HPA in minikube and it is working as
> expected.
>
> # ceph status
>   cluster:
>     id:     c7a87662-dccb-4143-bf68-58ff676a0362
>     health: HEALTH_WARN
>             mon a is low on available space
>             8 pool(s) have no replicas configured
>
>   services:
>     mon: 1 daemons, quorum a (age 20m)
>     mgr: a(active, since 19m)
>     osd: 1 osds: 1 up (since 19m), 1 in (since 19m)
>     rgw: 3 daemons active (my.store.a.my-store.my-store.4383, my.store.a.my-store.my-store.4715, my.store.a.my-store.my-store.4717)
>
>   data:
>     pools:   8 pools, 96 pgs
>     objects: 2.57k objects, 8.5 MiB
>     usage:   85 MiB used, 20 GiB / 20 GiB avail
>     pgs:     96 active+clean
>
>   io:
>     client:   611 KiB/s rd, 386 KiB/s wr, 696 op/s rd, 1.27k op/s wr
>
> Even the metrics are shown separately by ceph mgr.
>
> @Matt @Casey:
>
> I saw the following with the s3 client.
>
> I created an HPA for the rgw pod which scales pods based on the number of requests.
>
> I triggered a recursive directory copy (4480 directories, 67705 files) from the s3
> client using the following command:
>
> aws s3 cp <directory> --no-verify-ssl --endpoint-url http://$BUCKET_HOST:$BUCKET_PORT s3://$BUCKET_NAME
>
> Even though the HPA scaled the rgw pods, requests were not sent to the newly created rgw
> pods (daemons),
>
> but when I triggered another recursive copy it was sent to all the pods.
>
> Is this behaviour expected?
>
> --
> Jiffin
>
> ----- Original Message -----
> From: "Sebastien Han" <shan@xxxxxxxxxx>
> To: "Jiffin Thottan" <jthottan@xxxxxxxxxx>
> Cc: "Matt Benjamin" <mbenjami@xxxxxxxxxx>, "ceph-rgw-eng" <ceph-rgw-eng@xxxxxxxxxx>, "ceph-tech-list" <ceph-tech-list@xxxxxxxxxx>,
> "dev" <dev@xxxxxxx>, "Matt Benjamin" <mbenjamin@xxxxxxxxxx>, "Kaleb Keithley" <kkeithle@xxxxxxxxxx>, "Orit Wasserman" <owasserm@xxxxxxxxxx>,
> "Travis Nielsen" <tnielsen@xxxxxxxxxx>
> Sent: Wednesday, February 10, 2021 1:20:14 PM
> Subject: Re: Running different rgw daemon with same cephxuser
>
> Sounds good, thanks guys! It does compile so go for it :)
> –––––––––
> Sébastien Han
> Senior Principal Software Engineer, Storage Architect
>
> "Always give 100%. Unless you're giving blood."
>
> On Wed, Feb 10, 2021 at 6:29 AM Jiffin Thottan <jthottan@xxxxxxxxxx> wrote:
> >
> > Hey Seb,
> >
> > I will test the PR against HPA and let you know the results (within one or two days).
> > --
> > Jiffin
> >
> > ----- Original Message -----
> > From: "Sebastien Han" <shan@xxxxxxxxxx>
> > To: "Matt Benjamin" <mbenjami@xxxxxxxxxx>
> > Cc: "Jiffin Thottan" <jthottan@xxxxxxxxxx>, "ceph-rgw-eng" <ceph-rgw-eng@xxxxxxxxxx>, "ceph-tech-list" <ceph-tech-list@xxxxxxxxxx>,
> > "dev" <dev@xxxxxxx>, "Matt Benjamin" <mbenjamin@xxxxxxxxxx>, "Kaleb Keithley" <kkeithle@xxxxxxxxxx>, "Orit Wasserman" <owasserm@xxxxxxxxxx>,
> > "Travis Nielsen" <tnielsen@xxxxxxxxxx>
> > Sent: Tuesday, February 9, 2021 10:11:47 PM
> > Subject: Re: Running different rgw daemon with same cephxuser
> >
> > Thanks Matt, I just sent this to kick off the discussion:
> > https://github.com/ceph/ceph/pull/39380
> > If someone wants to take over it's preferable I guess, this is mainly
> > due to my limited C++ knowledge.
> >
> > So feel free to assign someone from your team to take over so we can
> > move faster with this one.
> > Thanks!
> > –––––––––
> > Sébastien Han
> > Senior Principal Software Engineer, Storage Architect
> >
> > "Always give 100%. Unless you're giving blood."
> >
> > On Mon, Feb 8, 2021 at 3:53 PM Matt Benjamin <mbenjami@xxxxxxxxxx> wrote:
> > >
> > > Hi Sebastien,
> > >
> > > That seems like a concise and reasonable solution to me. It seems
> > > like the metrics from a single instance should in fact be transient
> > > (leaving the problem of maintaining aggregate values to prometheus or
> > > even downstream of that)?
> > >
> > > Matt
> > >
> > > On Mon, Feb 8, 2021 at 9:47 AM Sebastien Han <shan@xxxxxxxxxx> wrote:
> > > >
> > > > Hi Jiffin,
> > > >
> > > > From my perspective, one simple way to fix this (although we must be
> > > > careful with backward compatibility) would be for rgw to register to
> > > > the service map differently.
> > > > Today it is using the daemon name: with a name like rgw.foo, it will register
> > > > as foo.
> > > > Essentially, if you try to run that pod twice you would still
> > > > see a single instance in the service map as well as in the prometheus
> > > > metrics.
> > > >
> > > > It would be nice to register with the RADOS client session ID instead,
> > > > just like rbd-mirror does by using instance_id. Something like:
> > > >
> > > > std::string instance_id = stringify(rados.get_instance_id());
> > > > int ret = rados.service_daemon_register(daemon_type, instance_id, metadata);
> > > >
> > > > Here: https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L1139
> > > > With that we can re-use the same cephx user and scale to any number;
> > > > all instances will use the same cephx user to authenticate to the cluster
> > > > but they will show up as N daemons in the service map.
> > > >
> > > > I guess one downside is that as soon as the daemon restarts, we get a
> > > > new RADOS client session ID, and thus our name changes, which means we
> > > > are losing all the metrics...
> > > > Thoughts?
> > > >
> > > > Thanks!
> > > > –––––––––
> > > > Sébastien Han
> > > > Senior Principal Software Engineer, Storage Architect
> > > >
> > > > "Always give 100%. Unless you're giving blood."
> > > >
> > > > On Thu, Feb 4, 2021 at 3:39 PM Jiffin Thottan <jthottan@xxxxxxxxxx> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > In an OCS (Rook) environment, the workflow for RGW daemons is as follows.
> > > > >
> > > > > Normally, to create a ceph object store, Rook first creates the pools for the rgw daemon with the specified configuration.
> > > > >
> > > > > Then, depending on the number of instances, Rook creates a cephx user and
> > > > > then spawns the rgw daemon in the container (pod) using its id,
> > > > > with the following arguments for the radosgw binary:
> > > > > Args:
> > > > >       --fsid=91501490-4b55-47db-b226-f9d9968774c1
> > > > >       --keyring=/etc/ceph/keyring-store/keyring
> > > > >       --log-to-stderr=true
> > > > >       --err-to-stderr=true
> > > > >       --mon-cluster-log-to-stderr=true
> > > > >       --log-stderr-prefix=debug
> > > > >       --default-log-to-file=false
> > > > >       --default-mon-cluster-log-to-file=false
> > > > >       --mon-host=$(ROOK_CEPH_MON_HOST)
> > > > >       --mon-initial-members=$(ROOK_CEPH_MON_INITIAL_MEMBERS)
> > > > >       --id=rgw.my.store.a
> > > > >       --setuser=ceph
> > > > >       --setgroup=ceph
> > > > >       --foreground
> > > > >       --rgw-frontends=beast port=8080
> > > > >       --host=$(POD_NAME)
> > > > >       --rgw-mime-types-file=/etc/ceph/rgw/mime.types
> > > > >       --rgw-realm=my-store
> > > > >       --rgw-zonegroup=my-store
> > > > >       --rgw-zone=my-store
> > > > >
> > > > > Here the cephx user will be "client.rgw.my.store.a" and all the pools for rgw will be created as my-store*. Normally, if another
> > > > > instance is requested in the ceph-object-store config file [1] for Rook, another user "client.rgw.my.store.b"
> > > > > will be created by Rook and will consume the same pools.
> > > > >
> > > > > There is a feature in Kubernetes known as autoscaling, in which pods can be automatically scaled based on specified metrics. If we apply that
> > > > > feature to rgw pods, Kubernetes will automatically scale the rgw pods (like a clone of the existing pod) with the same argument for "--id"
> > > > > based on the metrics, but ceph cannot distinguish those as different rgw daemons even though multiple rgw pods are running simultaneously.
> > > > > "ceph status" shows only one rgw daemon as well.
> > > > >
> > > > > In vstart or ceph-ansible (Ali helped me figure it out), I can see that a cephx user is created for each rgw daemon as well.
> > > > >
> > > > > Is this behaviour intended? Or am I hitting a corner case which was never tested before?
> > > > >
> > > > > There is no point in autoscaling the rgw pod if it is considered to be the same daemon: the s3 client will talk to only one of the pods, and the
> > > > > metrics provided by ceph mgr can give incorrect data as well, which can affect the autoscale feature.
> > > > >
> > > > > Also opened an issue in rook for the time being [2]
> > > > >
> > > > > [1] https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/object-test.yaml
> > > > > [2] https://github.com/rook/rook/issues/6943
> > > > >
> > > > > Regards,
> > > > > Jiffin
> > > > >
> > >
> > >
> > > --
> > >
> > > Matt Benjamin
> > > Red Hat, Inc.
> > > 315 West Huron Street, Suite 140A
> > > Ann Arbor, Michigan 48103
> > >
> > > http://www.redhat.com/en/technologies/storage
> > >
> > > tel.  734-821-5101
> > > fax.  734-769-8938
> > > cel.  734-216-5309
> > >
> _______________________________________________
> Dev mailing list -- dev@xxxxxxx
> To unsubscribe send an email to dev-leave@xxxxxxx
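
For readers following the service map discussion in this thread, here is a minimal sketch of why registering by daemon name collapses cloned pods into a single entry, while registering by a per-session instance ID does not. It is a toy model and not Ceph code: ToyServiceMap, visible_daemons and the numeric IDs are hypothetical names and values made up for illustration. The real registration goes through the librados service_daemon_register call quoted in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for the cluster's service map: service type -> daemon key -> metadata.
// This is NOT the real Ceph ServiceMap, only an illustration of how entries are keyed.
class ToyServiceMap {
 public:
  // Registering an already-present (type, key) pair overwrites the earlier entry,
  // which is why daemons sharing one name appear as a single daemon.
  void register_daemon(const std::string& type, const std::string& key,
                       const std::map<std::string, std::string>& metadata) {
    assert(!key.empty());
    daemons_[type][key] = metadata;
  }

  // Number of distinct daemons visible for a service type.
  std::size_t count(const std::string& type) const {
    auto it = daemons_.find(type);
    return it == daemons_.end() ? 0 : it->second.size();
  }

 private:
  std::map<std::string, std::map<std::string, std::map<std::string, std::string>>> daemons_;
};

// Simulates three pods cloned by the autoscaler, all started with the same --id.
// The instance IDs below are made-up values standing in for RADOS client session IDs.
std::size_t visible_daemons(bool key_by_instance_id) {
  ToyServiceMap service_map;
  const std::string daemon_name = "my.store.a";
  const std::uint64_t instance_ids[] = {4383, 4715, 4717};

  for (std::uint64_t id : instance_ids) {
    std::map<std::string, std::string> metadata;
    metadata["id"] = daemon_name;
    const std::string key = key_by_instance_id ? std::to_string(id) : daemon_name;
    service_map.register_daemon("rgw", key, metadata);
  }
  return service_map.count("rgw");
}
```

In this model, visible_daemons(false) reports one daemon for the three pods and visible_daemons(true) reports three. The trade-off raised in the thread still applies: a key derived from the session ID changes on every restart, so per-daemon metrics do not carry over.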