Hi Ceph-Users,

I have a multisite Ceph cluster deployed in containers across 3 VMs per site (6 VMs total over 2 sites). Each VM has a mon, osd, mgr, mds, and two rgw containers (regular and pubsub). It was installed with ceph-ansible. One of the sites has been up for a few years; the other site was recently re-installed and paired with the initial site. The initial site is running Nautilus (14.2.9), the new site is on Octopus (15.2.13). (Side point - is this mixed-version setup valid?)

I've noticed that on the new site, pubsub is building a gigantic queue of events (it is growing faster than our product can acknowledge them). I'm having a rough time trying to debug this and understand why the queue is building.

I currently have 450k objects stored in an S3 bucket, synced between the two sites. The bucket is mostly inactive (our test system backed by this cluster is off while we attempt to resolve this). The pubsub queue on the second site currently holds 1.7M events, and I've disabled the pubsub containers to prevent it from growing further. As soon as I enable the pubsub containers again, the queue starts growing at an alarming rate.

What I've tried:

* Interacting with the pubsub REST API. I pulled all the events in the pubsub queue and did some analysis on them.
  * Of the 1.7M events, there were 106k unique S3 objects referenced.
  * The average S3 object had 13 pubsub events referring to it. This seems very odd given the inactivity of the data; I was expecting to find no duplicate entries here.
  * The most mentioned S3 object was referred to 362 times (i.e. a single S3 object had 362 pubsub OBJECT_CREATE events).
  * All the mtimes are from 2020 (other than 35 in 2021), yet the second site was only deployed this month.

Does anyone have any suggestions as to why this is occurring?

Thanks,
Alex
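
P.S. In case anyone wants to run the same kind of tally on their own queue, here is a minimal Python sketch of the analysis described above. It assumes the events have already been pulled from the pubsub REST API and flattened into dicts with the hypothetical keys `bucket`, `key` and `mtime` (an ISO 8601 string). Those names are illustrative only and are not the actual pubsub payload layout, so the field access would need adapting to the real event JSON.

```python
from collections import Counter


def summarize_events(events):
    """Tally pubsub events per S3 object.

    Each event is assumed to be a dict with the hypothetical keys
    'bucket', 'key' and 'mtime' (ISO 8601 string). These names are
    placeholders, not the real pubsub event schema.
    """
    # Count how many events refer to each (bucket, key) pair.
    per_object = Counter((e["bucket"], e["key"]) for e in events)
    # Group events by the year prefix of the object's mtime.
    per_year = Counter(e["mtime"][:4] for e in events)

    total = sum(per_object.values())
    unique = len(per_object)
    most_referenced = per_object.most_common(1)[0] if per_object else (None, 0)

    return {
        "total_events": total,
        "unique_objects": unique,
        "mean_events_per_object": total / unique if unique else 0.0,
        "most_referenced": most_referenced,
        "events_by_mtime_year": dict(per_year),
    }


# Tiny made-up sample, only to show the shape of the input and output.
sample = [
    {"bucket": "b", "key": "a.txt", "mtime": "2020-05-01T00:00:00Z"},
    {"bucket": "b", "key": "a.txt", "mtime": "2020-05-01T00:00:00Z"},
    {"bucket": "b", "key": "a.txt", "mtime": "2020-05-01T00:00:00Z"},
    {"bucket": "b", "key": "c.txt", "mtime": "2021-01-02T00:00:00Z"},
]
summary = summarize_events(sample)
print(summary)
```

Any object with a count above 1 in `per_object` is a duplicate in the sense above: more than one event referring to the same bucket and key.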