Re: Shadow files

----- Original Message -----
> From: "Ben" <b@benjackson.email>
> To: "Yehuda Sadeh-Weinraub" <yehuda@xxxxxxxxxx>
> Cc: "Craig Lewis" <clewis@xxxxxxxxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxx>
> Sent: Tuesday, March 17, 2015 7:28:28 PM
> Subject: Re:  Shadow files
> 
> None of this helps with removing defunct shadow files, which
> number in the tens of millions.

Did it at least confirm that the garbage collection system works?

> 
> Is there a quick way to see which shadow files are safe to delete
> easily?

There's no easy process. If you know that a lot of the removed data was in buckets that no longer exist, you could start by identifying those objects. One way to do that:

$ radosgw-admin metadata list bucket

then, for each bucket:

$ radosgw-admin metadata get bucket:<bucket name>

This will give you the bucket markers of all existing buckets. Each data object (head and shadow objects) is prefixed by its bucket marker. Objects that don't have a valid bucket marker can be removed. Note that I would first list all objects, then get the list of valid bucket markers, as the operation is racy and new buckets can be created in the meantime.
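The marker check described above can be sketched in Python. This is a minimal sketch, not a supported tool, and it rests on several assumptions: that the data pool is named `.rgw.buckets` (the default of this era; check your zone), that `radosgw-admin metadata get bucket:<name>` reports the marker under `data.bucket.marker`, and that head and shadow object names start with `<marker>_`. Following the advice above, it lists the pool's objects before it collects the bucket markers, so objects belonging to a bucket created in the meantime are never falsely flagged. The helper names are hypothetical; treat the output as candidates to review, not a delete list.

```python
import json
import subprocess
from typing import Iterable, List, Set

# ASSUMPTION: Firefly/Giant-era default data pool; check your zone config.
DATA_POOL = ".rgw.buckets"


def run(args: List[str]) -> str:
    """Run a CLI command and return its stdout (raises on failure)."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def list_objects(pool: str) -> List[str]:
    """List every RADOS object name in the data pool."""
    return [line for line in run(["rados", "-p", pool, "ls"]).splitlines() if line]


def marker_from_metadata(meta: dict) -> str:
    """ASSUMPTION: the bucket marker is reported at data.bucket.marker."""
    return meta["data"]["bucket"]["marker"]


def valid_markers() -> Set[str]:
    """Collect the markers of every bucket that currently exists."""
    names = json.loads(run(["radosgw-admin", "metadata", "list", "bucket"]))
    markers = set()
    for name in names:
        meta = json.loads(run(["radosgw-admin", "metadata", "get", f"bucket:{name}"]))
        markers.add(marker_from_metadata(meta))
    return markers


def has_valid_marker(oid: str, markers: Set[str]) -> bool:
    """True if oid starts with '<marker>_' for some known marker.

    Every '_' position is tried, so markers that contain '_' still match.
    """
    pos = oid.find("_")
    while pos != -1:
        if oid[:pos] in markers:
            return True
        pos = oid.find("_", pos + 1)
    return False


def orphan_candidates(oids: Iterable[str], markers: Set[str]) -> List[str]:
    """Object names whose prefix matches no existing bucket marker."""
    return [oid for oid in oids if not has_valid_marker(oid, markers)]


def main(pool: str = DATA_POOL) -> None:
    objects = list_objects(pool)   # list objects FIRST: the operation is racy
    markers = valid_markers()      # then collect markers of existing buckets
    for oid in orphan_candidates(objects, markers):
        print(oid)                 # candidates for review only; do not auto-delete
```

Calling `main()` on a host with an admin keyring prints the candidate names. With tens of millions of objects it may be easier to save the `rados ls` and `metadata` output to files first and feed those to `orphan_candidates`, which only needs the names and the marker set.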

We did discuss a new garbage cleanup tool that will address your specific issue, and we have a design for it, but it's not there yet.

Yehuda



> Bear in mind that there are MILLIONS of objects.
> 
> We have a 320TB cluster which is 272TB full. Of this, we should only
> actually be seeing 190TB. There is 80TB of shadow files that should no
> longer exist.
> 
> On 2015-03-18 02:00, Yehuda Sadeh-Weinraub wrote:
> > ----- Original Message -----
> >> From: "Ben" <b@benjackson.email>
> >> To: "Craig Lewis" <clewis@xxxxxxxxxxxxxxxxxx>
> >> Cc: "Yehuda Sadeh-Weinraub" <yehuda@xxxxxxxxxx>, "ceph-users"
> >> <ceph-users@xxxxxxxx>
> >> Sent: Monday, March 16, 2015 3:38:42 PM
> >> Subject: Re:  Shadow files
> >> 
> >> That's the thing. The peaks and troughs are in USER BUCKETS only.
> >> The actual cluster usage does not go up and down; it just goes up, up,
> >> up.
> >> 
> >> I would expect the overall cluster disk usage to show much the same
> >> peaks and troughs as the user buckets.
> >> But this is not the case.
> >> 
> >> We upgraded the cluster and radosgws to GIANT (0.87.1) yesterday, and
> >> now we are seeing a large number of misplaced (??) objects being moved
> >> around.
> >> Does this mean it has found all the shadow files that shouldn't exist
> >> anymore, and is deleting them? If so I would expect to start seeing
> >> overall cluster usage drop, but this hasn't happened yet.
> > 
> > No, I don't think so. It sounds like your cluster is recovering, which
> > happens in a completely different layer (RADOS, not radosgw).
> >> 
> >> Any ideas?
> > 
> > Try running:
> > $ radosgw-admin gc list --include-all
> > 
> > This should show all the shadow objects that are pending deletion.
> > Note that if you have a non-default radosgw configuration, make sure
> > you run radosgw-admin with the same user and config that radosgw
> > runs with (e.g., add -n client.<user> appropriately); otherwise it
> > might not look at the correct zone data.
> > You could create an object, identify the shadow objects for that
> > object, remove it, and check that the gc list command shows these
> > shadow objects. Then wait the configured time (2 hours?) and see
> > whether they were removed.
> > 
> > Yehuda
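The GC verification Yehuda suggests in the quoted reply above can be partly scripted. This Python sketch assumes that `radosgw-admin gc list --include-all` prints a JSON array whose entries carry an `objs` list of `{"pool": ..., "oid": ...}` records; that layout is an assumption, so compare it with your own output first. You would still upload and delete the test object yourself and note its shadow object names (for example, from `rados -p <pool> ls`); the hypothetical helpers below then report which of those names GC has not queued.

```python
import json
import subprocess
from typing import Iterable, List, Optional, Set


def gc_list_json(client: Optional[str] = None) -> list:
    """Run `radosgw-admin gc list --include-all` and parse its JSON output."""
    args = ["radosgw-admin", "gc", "list", "--include-all"]
    if client:
        # Use the same identity/config as the radosgw daemon, e.g. "client.<user>".
        args += ["-n", client]
    out = subprocess.run(args, check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def pending_oids(gc_entries: list) -> Set[str]:
    """Flatten GC entries into the set of object names awaiting deletion.

    ASSUMPTION: each entry looks like {"tag": ..., "time": ..., "objs": [{"pool": ..., "oid": ...}]}.
    """
    return {obj["oid"] for entry in gc_entries for obj in entry.get("objs", [])}


def missing_from_gc(shadow_oids: Iterable[str], gc_entries: list) -> List[str]:
    """Shadow objects of the deleted test object that GC has NOT queued."""
    pending = pending_oids(gc_entries)
    return [oid for oid in shadow_oids if oid not in pending]
```

An empty list from `missing_from_gc(shadow_names, gc_list_json())` would mean every shadow object was queued; listing the pool again after the configured wait shows whether GC actually removed them.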
> > 
> > 
> >> 
> >> On 2015-03-17 06:12, Craig Lewis wrote:
> >> > Out of curiosity, what's the frequency of the peaks and troughs?
> >> >
> >> > RadosGW has configs for how long it should wait after deleting before
> >> > garbage collecting, how long to wait between GC runs, and how many
> >> > objects it can GC per run.
> >> >
> >> > The defaults are 2 hours, 1 hour, and 32 respectively.  Search
> >> > http://docs.ceph.com/docs/master/radosgw/config-ref/ [2] for "rgw gc".
> >> >
> >> > If your peaks and troughs occur more often than once an hour, then GC
> >> > is going to delay and alias the disk usage w.r.t. the object count.
> >> >
> >> > If you have millions of objects, you probably need to tweak those
> >> > values.  If RGW is only GCing 32 objects an hour, it's never going to
> >> > catch up.
> >> >
> >> > Now that I think about it, I bet I'm having issues here too.  I delete
> >> > more than (32*24) objects per day...
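Craig's back-of-the-envelope numbers above can be checked with a little arithmetic. This sketch takes the quoted defaults at face value and, like Craig, reads `rgw gc max objs` as the number of objects handled per GC run. The `GcConfig` fields are illustrative stand-ins for the `rgw gc obj min wait`, `rgw gc processor period`, and `rgw gc max objs` settings, not a Ceph API. It computes the daily GC capacity, the backlog that builds when deletes outpace it, and the smallest per-run limit that would keep up with a given delete rate.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86400


@dataclass
class GcConfig:
    # Illustrative stand-ins for the "rgw gc" options, using the quoted defaults.
    obj_min_wait_s: int = 7200      # rgw gc obj min wait: delays deletion, does not cap throughput (unused below)
    processor_period_s: int = 3600  # rgw gc processor period: time between GC runs
    max_objs: int = 32              # rgw gc max objs, read here as "objects per run"


def runs_per_day(cfg: GcConfig) -> int:
    """Number of GC runs in one day."""
    return SECONDS_PER_DAY // cfg.processor_period_s


def daily_capacity(cfg: GcConfig) -> int:
    """Objects GC can remove per day if every run handles max_objs objects."""
    return runs_per_day(cfg) * cfg.max_objs


def backlog_after(days: int, deletes_per_day: int, cfg: GcConfig) -> int:
    """Objects still pending after `days`, starting from an empty queue."""
    return max(0, deletes_per_day - daily_capacity(cfg)) * days


def min_max_objs(deletes_per_day: int, cfg: GcConfig) -> int:
    """Smallest per-run limit that keeps up with deletes_per_day."""
    return -(-deletes_per_day // runs_per_day(cfg))  # ceiling division
```

With the defaults, `daily_capacity(GcConfig())` returns 768, which is Craig's 32*24; for a hypothetical one million deletes per day, `min_max_objs(1_000_000, GcConfig())` returns 41667 objects per run.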
> >> >
> >> > On Sun, Mar 15, 2015 at 4:41 PM, Ben <b@benjackson.email> wrote:
> >> >
> >> >> It is either a problem with Ceph, Civetweb, or something else in our
> >> >> configuration.
> >> >> But deletes in user buckets are still leaving a high number of old
> >> >> shadow files. Since we have millions and millions of objects, it is
> >> >> hard to reconcile what should and shouldn't exist.
> >> >>
> >> >> Looking at our cluster usage, there are no troughs; it just keeps
> >> >> rising.
> >> >> But when looking at users' data usage, we can see peaks and troughs
> >> >> as you would expect as data is deleted and added.
> >> >>
> >> >> Our Ceph version is 0.80.9.
> >> >>
> >> >> Any ideas, please?
> >> >>
> >> >> On 2015-03-13 02:25, Yehuda Sadeh-Weinraub wrote:
> >> >>
> >> >> ----- Original Message -----
> >> >> From: "Ben" <b@benjackson.email>
> >> >> To: ceph-users@xxxxxxxx
> >> >> Sent: Wednesday, March 11, 2015 8:46:25 PM
> >> >> Subject: Re:  Shadow files
> >> >>
> >> >> Anyone got any info on this?
> >> >>
> >> >> Is it safe to delete shadow files?
> >> >>
> >> >> It depends. Shadow files are badly named objects that represent
> >> >> part of the object's data. They are only safe to remove if you
> >> >> know that the corresponding objects no longer exist.
> >> >>
> >> >> Yehuda
> >> >>
> >> >> On 2015-03-11 10:03, Ben wrote:
> >> >>> We have a large number of shadow files in our cluster that aren't
> >> >>> being deleted automatically as data is deleted.
> >> >>>
> >> >>> Is it safe to delete these files?
> >> >>> Is there something we need to be aware of when deleting them?
> >> >>> Is there a script that we can run that will delete these safely?
> >> >>>
> >> >>> Is there something wrong with our cluster that it isn't deleting
> >> >>> these files when it should be?
> >> >>>
> >> >>> We are using civetweb with radosgw, with a tengine SSL proxy in
> >> >>> front of it.
> >> >>>
> >> >>> Any advice, please?
> >> >>> Thanks
> >> >> _______________________________________________
> >> >> ceph-users mailing list
> >> >> ceph-users@xxxxxxxxxxxxxx
> >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [1]
> >> >
> >> >
> >> >
> >> > Links:
> >> > ------
> >> > [1] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> > [2] http://docs.ceph.com/docs/master/radosgw/config-ref/
> >> 
> 