Hi Jiffin,

From my perspective, one simple way to fix this (although we must be careful with backward compatibility) would be for rgw to register with the service map differently.
Today it registers under the daemon name: if the daemon is named rgw.foo, it registers as foo.
Essentially, if you try to run that pod twice, you will still see a single instance in the service map as well as in the prometheus metrics.

It would be nice to register with the RADOS client session ID instead, just like rbd-mirror does by using instance_id.
Something like:

    // Use the RADOS client session ID as the registered name
    std::string instance_id = stringify(rados.get_instance_id());
    // Register under instance_id instead of the daemon name
    int ret = rados.service_daemon_register(daemon_type, instance_id, metadata);

Here: https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L1139

With that we can re-use the same cephx user and scale to any number of instances: all instances will use the same cephx user to authenticate to the cluster, but they will show up as N instances in the service map.

I guess one downside is that as soon as the daemon restarts, we get a new RADOS client session ID, and thus our name changes, which means we lose all the metrics...

Thoughts?

Thanks!
–––––––––
Sébastien Han
Senior Principal Software Engineer, Storage Architect

"Always give 100%. Unless you're giving blood."

On Thu, Feb 4, 2021 at 3:39 PM Jiffin Thottan <jthottan@xxxxxxxxxx> wrote:
>
> Hi all,
>
> In an OCS (Rook) environment, the workflow for RGW daemons is as follows.
>
> Normally, to create a ceph object-store, Rook first creates the pools for the rgw daemon with the specified configuration.
>
> Then, depending on the number of instances, Rook creates a cephx user and then spawns the rgw daemon in the container (pod) using its id,
> with the following arguments for the radosgw binary:
> Args:
>   --fsid=91501490-4b55-47db-b226-f9d9968774c1
>   --keyring=/etc/ceph/keyring-store/keyring
>   --log-to-stderr=true
>   --err-to-stderr=true
>   --mon-cluster-log-to-stderr=true
>   --log-stderr-prefix=debug
>   --default-log-to-file=false
>   --default-mon-cluster-log-to-file=false
>   --mon-host=$(ROOK_CEPH_MON_HOST)
>   --mon-initial-members=$(ROOK_CEPH_MON_INITIAL_MEMBERS)
>   --id=rgw.my.store.a
>   --setuser=ceph
>   --setgroup=ceph
>   --foreground
>   --rgw-frontends=beast port=8080
>   --host=$(POD_NAME)
>   --rgw-mime-types-file=/etc/ceph/rgw/mime.types
>   --rgw-realm=my-store
>   --rgw-zonegroup=my-store
>   --rgw-zone=my-store
>
> Here the cephx user will be "client.rgw.my.store.a", and all the pools for rgw will be created as my-store*. Normally, if another instance
> is requested in the ceph-object-store config file [1] for Rook, another user, "client.rgw.my.store.b",
> will be created by Rook and will consume the same pools.
>
> There is a feature in Kubernetes known as autoscaling, in which pods can be scaled automatically based on specified metrics. If we apply that
> feature to the rgw pods, Kubernetes will automatically scale them (like a clone of the existing pod) with the same argument for "--id"
> based on the metrics, but ceph cannot distinguish those as different rgw daemons even though multiple rgw pods are running simultaneously.
> "ceph status" shows only one rgw daemon as well.
>
> In vstart or ceph-ansible (Ali helped me figure this out), I can see that a cephx user is created for each rgw daemon as well.
>
> Is this behaviour intended? Or am I hitting a corner case which was never tested before?
>
> There is no point in autoscaling the rgw pod if all the copies are considered to be the same daemon: the s3 client will talk to only one of the pods,
> and the metrics provided by ceph mgr can be incorrect as well, which can affect the autoscale feature.
>
> I have also opened an issue in Rook for the time being [2].
>
> [1] https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/object-test.yaml
> [2] https://github.com/rook/rook/issues/6943
>
> Regards,
> Jiffin
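
To make the registration idea concrete, here is a minimal toy model in C++. It is only a sketch and not Ceph code: ToyServiceMap, visible_rgw_daemons and the session ID values are hypothetical names and numbers made up for illustration. The one assumption it encodes is the behaviour described in this thread, namely that the service map keeps a single entry per (daemon type, registered name), so pods sharing the same --id collapse into one entry, while pods registered under distinct client session IDs each get their own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Toy stand-in for the mgr service map (NOT the real Ceph class):
// one entry per (daemon type, registered name).
class ToyServiceMap {
public:
  void register_daemon(const std::string& type, const std::string& name,
                       const std::map<std::string, std::string>& metadata) {
    assert(!type.empty() && !name.empty());
    // Registering the same (type, name) again overwrites the existing entry.
    daemons_[std::make_pair(type, name)] = metadata;
  }

  // Number of distinct daemons of the given type that the map can see.
  std::size_t count(const std::string& type) const {
    std::size_t n = 0;
    for (const auto& entry : daemons_) {
      if (entry.first.first == type) ++n;
    }
    return n;
  }

private:
  std::map<std::pair<std::string, std::string>,
           std::map<std::string, std::string>> daemons_;
};

// Hypothetical helper: start `pods` rgw pods that all share the same --id
// and return how many rgw daemons the toy service map ends up showing.
std::size_t visible_rgw_daemons(int pods, bool use_instance_id) {
  ToyServiceMap service_map;
  std::map<std::string, std::string> metadata;
  metadata["id"] = "my.store.a";

  std::uint64_t next_session_id = 4100;  // made-up RADOS client session IDs
  for (int i = 0; i < pods; ++i) {
    const std::uint64_t instance_id = next_session_id++;
    const std::string name =
        use_instance_id ? std::to_string(instance_id) : "my.store.a";
    service_map.register_daemon("rgw", name, metadata);
  }
  return service_map.count("rgw");
}
```

Calling visible_rgw_daemons(3, false) models three autoscaled pods registering under the shared daemon name, and visible_rgw_daemons(3, true) models the same pods registering under their session IDs. The model also shows the downside raised in the reply: a restarted pod would get a new session ID and therefore a new entry, so anything keyed on the old name would not carry over.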