Casey,

Thanks. This all worked. Some observations and comments for others who may be in my situation:

1. When deleting the roles on the secondary with radosgw-admin role delete ..., I had to delete all the policies of each role before I could delete the role itself.
2. radosgw-admin complained when I tried, warning that this would result in inconsistent metadata between clusters (which is what I wanted in this case), and told me to use --yes-i-really-mean-it.

Note: This was on my test cluster. I'll do the production cluster once some downtime is scheduled, as my users' stuff will complain/die when, at least for a short time, there are no roles on the secondary.

Another note: I have no idea how this happened, but my test cluster was still using leveldb (ldb) for the mon data! I updated a single mon on the master side and the mon would not start! After realizing I had ldb files in /var/lib/ceph/mon/ceph-servername/store.db, I extracted the monmap from a still-up mon, removed the broken mon from the map, re-inserted the monmap into the working mon, and then reinitialized the broken mon. Again, the mon would not start! What I determined was that ALL my mons on the test cluster were still running ldb (on pacific). Yuk.

How I finally fixed it:

1. Extract the monmap from a working mon (it was still using ldb), remove the broken mon from the map, and reinsert the map into the working mon.
2. Before reinitializing the broken mon, create a file /var/lib/ceph/mon/ceph-<server>/kv_backend with the one line 'rocksdb'.
3. Then, and only then, restore the mon.
4. Repeat for all mons, using the first restored mon (which was now rocksdb) as the starting point so I didn't have to create the kv_backend file.

Again, I have no idea how I got into this situation, but this cluster started at nautilus. It might make sense for ceph to provide an automated conversion script to recover a mon in this situation.
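The mon recovery steps above can be sketched as a dry run in Python. This is only a sketch under stated assumptions, not the exact commands that were run: the function names are hypothetical, the ceph-mon and monmaptool invocations follow the add-or-rm-mons documentation page, and reading "reinitialize" as the --mkfs step from that page is an assumption. Stopping and starting the mon services, fetching the keyring, and file ownership are left out.

```python
from pathlib import Path
import shlex


def monmap_surgery_commands(working_mon, broken_mon, monmap_path):
    """Commands that drop broken_mon from the monmap held by working_mon.

    Nothing is executed here. The working mon has to be stopped while
    its monmap is extracted and injected.
    """
    return [
        ["ceph-mon", "-i", working_mon, "--extract-monmap", monmap_path],
        ["monmaptool", monmap_path, "--rm", broken_mon],
        ["ceph-mon", "-i", working_mon, "--inject-monmap", monmap_path],
    ]


def write_kv_backend(mon_data_dir, backend="rocksdb"):
    """Create <mon_data_dir>/kv_backend containing the single line `backend`."""
    path = Path(mon_data_dir) / "kv_backend"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(backend + "\n")
    return path


def reinit_command(broken_mon, monmap_path, keyring_path):
    """ASSUMPTION: 'reinitialize' means the --mkfs step from the linked docs."""
    return ["ceph-mon", "-i", broken_mon, "--mkfs",
            "--monmap", monmap_path, "--keyring", keyring_path]


def show(commands):
    """Print commands in copy-pasteable shell form for review."""
    for argv in commands:
        print(shlex.join(argv))
```

Calling show(monmap_surgery_commands("mon-a", "mon-b", "/tmp/monmap")) prints the three monmap commands for review (the mon names and path are placeholders). The kv_backend file is written before the mkfs command is run, matching the order described above.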
For those of you wondering how to do what I am vaguely discussing (manually removing and restoring a broken mon), see here:
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/

-Chris

-----Original Message-----
From: Casey Bodley <cbodley@xxxxxxxxxx>
To: Christopher Durham <caduceus42@xxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Sent: Tue, Apr 11, 2023 1:59 pm
Subject: Re: ceph 17.2.6 and iam roles (pr#48030)

On Tue, Apr 11, 2023 at 3:53 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>
> On Tue, Apr 11, 2023 at 3:19 PM Christopher Durham <caduceus42@xxxxxxx> wrote:
> >
> >
> > Hi,
> > I see that this PR: https://github.com/ceph/ceph/pull/48030
> > made it into ceph 17.2.6, as per the change log at: https://docs.ceph.com/en/latest/releases/quincy/ ; That's great.
> > But my scenario is as follows:
> > I have two clusters set up as multisite. Because of the lack of replication for IAM roles, we have set things up so that roles on the primary 'manually' get replicated to the secondary site via a python script. Thus, if I create a role on the primary, add/delete users or buckets from said role, the role, including the AssumeRolePolicyDocument and policies, gets pushed to the replicated site. This has served us well for three years.
> > With the advent of this fix, what should I do before I upgrade to 17.2.6 (currently on 17.2.5, rocky 8)
> >
> > I know that in my situation, roles of the same name have different RoleIDs on the two sites. What should I do before I upgrade? Possibilities that *could* happen if I don't rectify things as we upgrade:
> > 1. The different RoleIDs lead to two roles of the same name on the replicated site, perhaps with the system unable to address/look at/modify either
> > 2. Roles just don't get replicated to the second site
>
> no replication would happen until the metadata changes again on the
> primary zone. once that gets triggered, the role metadata would
> probably fail to sync due to the name conflicts
>
> >
> > or other similar situations, all of which I want to avoid.
> > Perhaps the safest thing to do is to remove all roles on the secondary site, upgrade, and then force a replication of roles (How would I *force* that for IAM roles if it is the correct answer?)
>
> this removal will probably be necessary to avoid those conflicts. once
> that's done, you can force a metadata full sync on the secondary zone
> by running 'radosgw-admin metadata sync init' there, then restarting
> its gateways. this will have to resync all of the bucket and user
> metadata as well

p.s. don't use the DeleteRole rest api on the secondary zone after
upgrading, as the request would get forwarded to the primary zone and
delete it there too. you can use 'radosgw-admin role delete' on the
secondary instead

>
> > Here is the original bug report:
> >
> > https://tracker.ceph.com/issues/57364
> > Thanks!
> > -Chris
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
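The role cleanup and forced resync discussed in this thread can be sketched as a dry run in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the caller supplies the role and policy names, and the 'role-policy delete' subcommand spelling follows the Ceph RGW role documentation and should be checked against the installed release. The thread does not say which command asked for --yes-i-really-mean-it, so appending it to every cleanup command is an assumption. Restarting the gateways after 'metadata sync init' depends on the deployment and is not shown.

```python
import shlex

ADMIN = "radosgw-admin"
FORCE = "--yes-i-really-mean-it"


def role_cleanup_commands(role_policies, force=True):
    """Build (not run) the commands that remove roles on the secondary zone.

    role_policies maps each role name to its list of policy names.
    Policies are deleted first, then the role itself, as reported in
    the thread.
    """
    cmds = []
    for role, policies in role_policies.items():
        for policy in policies:
            cmds.append([ADMIN, "role-policy", "delete",
                         f"--role-name={role}", f"--policy-name={policy}"])
        cmds.append([ADMIN, "role", "delete", f"--role-name={role}"])
    if force:
        # ASSUMPTION: the flag is added to every command, not just one.
        cmds = [c + [FORCE] for c in cmds]
    return cmds


def metadata_resync_command():
    """Command to run on the secondary zone to force a metadata full sync."""
    return [ADMIN, "metadata", "sync", "init"]


def show(commands):
    """Print commands in shell form so they can be reviewed before use."""
    for argv in commands:
        print(shlex.join(argv))
```

For example, show(role_cleanup_commands({"example-role": ["example-policy"]})) prints one policy deletion followed by the role deletion; both names are placeholders. Using radosgw-admin here, and not the DeleteRole REST API, follows the p.s. above.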