Thanks again Yehuda for the quick response. At the moment we only have a single bucket in which all objects are being kept. That single bucket holds only approximately 22 objects, so reconciling between the master and slave was not so hard. I have a definitive list of the "shadow" objects that need to be removed.

Thanks again for your help, much appreciated.
MLM

-----Original Message-----
From: yehuda@xxxxxxxxxxx [mailto:yehuda at inktank.com] On Behalf Of Yehuda Sadeh
Sent: Friday, September 26, 2014 10:55 AM
To: lyn_mitchell at bellsouth.net
Cc: ceph-users
Subject: Re: Any way to remove possible orphaned files in a federated gateway configuration

On Fri, Sep 26, 2014 at 8:28 AM, Lyn Mitchell <mitch95 at bellsouth.net> wrote:
> Hi all,
> Is there a particular sequence or standard procedure when removing objects from a .rgw.buckets pool? I have some orphaned objects that are taking up a lot of disk space and need to remove them as soon as possible. Is there a procedure that is recursive in nature, in that it removes all or most references to the object to be deleted? Should "radosgw-admin object rm" or "rados -p <pool_name> rm <obj-name>" be used?

For now you can use:

$ rados -p <pool_name> rm <obj>

One more thing to point out is that you want to be absolutely sure that these objects aren't in use. Another thing that could have happened is that these were copied to a different bucket, and they still exist in the other bucket. You'll need to verify that this is not the case.

We do need to create some better tooling around this. We could have some kind of automatic process that first marks all rados objects pointed at by rgw object manifests, and then goes through all objects in the system and removes the unmarked ones. We should add a flag to every newly created object in the system that specifies when it was created (an orphan-cleanup generation number). Also, perhaps a pointer to the containing object would be useful.
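[Editor's note: the mark-and-sweep process described above does not exist as a tool; a minimal sketch of the idea, assuming a file `referenced.txt` has already been built from the manifests reported by "radosgw-admin object stat", and using a placeholder pool name:]

```shell
# Hypothetical mark-and-sweep sketch -- NOT a supported Ceph tool.
# Assumes referenced.txt lists every rados object name that some rgw
# object manifest still points at (collected out of band, e.g. by
# running "radosgw-admin object stat" over each object in each bucket).
POOL=.region-1.zone-2.rgw.buckets   # placeholder pool name

# "Mark" set: rados objects still referenced by rgw manifests.
sort -u referenced.txt > referenced.sorted

# Everything actually present in the pool.
rados -p "$POOL" ls | sort -u > present.sorted

# "Sweep" candidates: present in the pool, referenced by nothing.
comm -23 present.sorted referenced.sorted > orphan-candidates.txt

# Review orphan-candidates.txt by hand before removing anything, then:
#   while read -r obj; do rados -p "$POOL" rm "$obj"; done < orphan-candidates.txt
```

The `comm -23` step is the whole trick: it prints lines unique to the first (sorted) file, i.e. objects with no manifest reference, and never touches objects that are still in use.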
Originally the 'shadow' objects (what a terrible name; it made sense originally, but the architecture has changed quite a bit since) were actually named after their containing object, but for some reason this was lost. I'll open the required ceph tracker issues.

Yehuda

>
> Thanks in advance,
> MLM
>
> -----Original Message-----
> From: Lyn Mitchell [mailto:mitch95 at bellsouth.net]
> Sent: Thursday, September 25, 2014 4:49 PM
> To: 'ceph-users'; 'ceph-community at lists.ceph.com'
> Subject: RE: Any way to remove possible orphaned files in a federated gateway configuration
>
> Thanks Yehuda for your response, much appreciated.
>
> Using the "radosgw-admin object stat" option I was able to reconcile the objects on master and slave. There are 10 objects on the master that have replicated to the slave; for these 10 objects I was able to confirm the match by pulling the tag prefix from "object stat" and verifying size, name, etc. There are still a large number of "shadow" files in the .region-1.zone-2.rgw.buckets pool which have no corresponding object to cross-reference with the "object stat" command. These files are taking up several hundred GB of OSD space on the region-2 cluster. What would be the correct way to remove these "shadow" files that no longer have objects associated with them? Is there a process that will clean up these orphaned objects? Any steps anyone can provide to remove these files would be greatly appreciated.
>
> BTW - Since my original post, several objects have been copied via an s3 client to the master and everything appears to be replicating without issue. Objects have been deleted as well; the sync looks fine, and objects are being removed from master and slave. I'm pretty sure the large number of orphaned "shadow" files currently in the .region-1.zone-2.rgw.buckets pool are from the original sync performed back on Sept. 15.
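[Editor's note: the per-tag breakdown described in the quoted message can be reproduced with standard shell tools; a sketch, assuming the chunk-naming pattern seen in this thread (`<bucket-marker>__shadow_.<tag>_<n>`) and a placeholder pool name:]

```shell
# Count "shadow" chunks per tag by stripping the trailing _<chunk-number>
# from each rados object name. A sketch, not an official tool; assumes
# every shadow object name ends in an underscore plus a chunk number.
POOL=.region-1.zone-2.rgw.buckets   # placeholder pool name

rados -p "$POOL" ls \
  | grep '__shadow_' \
  | sed -E 's/_[0-9]+$//' \
  | sort | uniq -c | sort -rn
```

Each output line is a chunk count followed by the tag prefix, so distinct tag sets and their sizes (516, 515, ...) stand out immediately, and tags with no matching "object stat" manifest are the orphan candidates.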
>
> Thanks in advance,
> MLM
>
> -----Original Message-----
> From: yehudasa at gmail.com [mailto:yehudasa at gmail.com] On Behalf Of Yehuda Sadeh
> Sent: Tuesday, September 23, 2014 5:30 PM
> To: lyn_mitchell at bellsouth.net
> Cc: ceph-users; ceph-community at lists.ceph.com
> Subject: Re: Any way to remove possible orphaned files in a federated gateway configuration
>
> On Tue, Sep 23, 2014 at 3:05 PM, Lyn Mitchell <mitch95 at bellsouth.net> wrote:
>> Is anyone aware of a way to either reconcile or remove possible
>> orphaned "shadow" files in a federated gateway configuration? The
>> issue we're seeing is that the slave has many more chunk/"shadow"
>> files than the master; the breakdown is as follows:
>>
>> master zone:
>>
>> .region-1.zone-1.rgw.buckets = 1737 "shadow" files, of which there are
>> 10 distinct sets of tags; an example of 1 distinct set is:
>>
>> alph-1.80907.1__shadow_.VTZYW5ubV53wCHAKcnGwrD_yGkyGDuG_1 through
>> alph-1.80907.1__shadow_.VTZYW5ubV53wCHAKcnGwrD_yGkyGDuG_516
>>
>> slave zone:
>>
>> .region-1.zone-2.rgw.buckets = 331961 "shadow" files, of which there
>> are 652 distinct sets of tags, examples:
>>
>> 1 set having 516 "shadow" files:
>>
>> alph-1.80907.1__shadow_.yPT037fjWhTi_UtHWSYPcRWBanaN9Oy_1 through
>> alph-1.80907.1__shadow_.yPT037fjWhTi_UtHWSYPcRWBanaN9Oy_516
>>
>> 236 sets having 515 "shadow" files apiece:
>>
>> alph-1.80907.1__shadow_.RA9KCc_U5T9kBN_ggCUx8VLJk36RSiw_1 through
>> alph-1.80907.1__shadow_.RA9KCc_U5T9kBN_ggCUx8VLJk36RSiw_515
>>
>> alph-1.80907.1__shadow_.aUWuanLbJD5vbBSD90NWwjkuCxQmvbQ_1 through
>> alph-1.80907.1__shadow_.aUWuanLbJD5vbBSD90NWwjkuCxQmvbQ_515

These are all part of the same bucket (prefixed by alph-1.80907.1).

>>
>> ...
>>
>> The number of shadow files in zone-2 is taking quite a bit of space from the
>> OSDs in the cluster.
>> Without being able to trace back to the original
>> file name from an s3 or rados tag, I have no way of knowing which
>> files these are. Is it possible that the same file may have been
>> replicated multiple times, due to network or connectivity issues?
>>
>> I can provide any logs or other information that may be of some
>> help; however, at this point we're not seeing any real errors.
>>
>> Thanks in advance for any help that can be provided,
>
> You can also run the following command on the existing objects within that specific bucket:
>
> $ radosgw-admin object stat --bucket=<bucket> --object=<object>
>
> This will show the mapping from the rgw object to the rados objects that construct it.
>
> Yehuda
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
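[Editor's note: a sketch of putting the "object stat" mapping to work; the exact JSON field names in the stat output vary by Ceph version, so the grep pattern and the bucket/object names here are assumptions, not a documented interface:]

```shell
# Sketch: capture the stat output for one rgw object, then pull out the
# manifest tag lines that identify its backing rados objects. Field
# names ("tag"/"prefix") are an assumption -- inspect stat.json for your
# Ceph version. Bucket and object names are placeholders.
radosgw-admin object stat --bucket=mybucket --object=myfile.bin > stat.json

# Lines naming the tag let you match against the __shadow_.<tag>_<n>
# objects seen in "rados -p <pool> ls" output.
grep -E '"(tag|prefix)"' stat.json
```

Any shadow tag in the pool listing that never shows up in a stat manifest across the bucket's objects is a candidate orphan, which is exactly the reconciliation described earlier in the thread.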