I saw this go by in the commit log:

commit cc2200c5e60caecf7931e546f6522b2ba364227f
Merge: f8d5807 12c083e
Author: Sage Weil <sage@xxxxxxxxxx>
Date:   Thu Feb 11 08:44:35 2016 -0500

    Merge pull request #7537 from ifed01/wip-no-promote-for-delete-fix

    osd: fix unnecessary object promotion when deleting from cache pool

    Reviewed-by: Sage Weil <sage@xxxxxxxxxx>

Is there any chance that I was basically seeing the same thing from the
filesystem standpoint?

Thanks

Steve

> On Feb 5, 2016, at 8:42 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Fri, Feb 5, 2016 at 6:39 AM, Stephen Lord <Steve.Lord@xxxxxxxxxxx> wrote:
>>
>> I looked at this system this morning, and it actually finished what it was
>> doing. The erasure coded pool still contains all the data and the cache
>> pool has about a million zero sized objects:
>>
>>
>> GLOBAL:
>>     SIZE       AVAIL     RAW USED     %RAW USED     OBJECTS
>>     15090G     9001G     6080G        40.29         2127k
>> POOLS:
>>     NAME            ID     CATEGORY     USED      %USED     MAX AVAIL     OBJECTS     DIRTY     READ      WRITE
>>     cache-data      21     -            0         0         7962G         1162258     1057k     22969     3220k
>>     cephfs-data     22     -            3964G     26.27     5308G         1014840     991k      891k      1143k
>>
>> Definitely seems like a bug since I removed all references to these from the filesystem
>> which created them.
>>
>> I originally wrote 4.5 Tbytes of data into the file system, the erasure coded
>> pool is set up as 4+2, and the cache has a size limit of 1 Tbyte. Looks like not
>> all the data made it out of the cache tier before I removed content, it removed the
>> content which was only present in the cache tier and created a zero sized object
>> in the cache for all the content. The used capacity is somewhat consistent with
>> this.
>>
>> I tried to look at the extended attributes on one of the zero size objects with ceph-dencoder,
>> but it failed:
>>
>> error: buffer::malformed_input: void object_info_t::decode(ceph::buffer::list::iterator&) unknown encoding version > 15
>>
>> Same error on one of the objects in the erasure coded pool.
>>
>> Looks like I am a little too bleeding edge for this, or the contents of the .ceph_ attribute are not an object_info_t
>
> ghobject_info_t
>
> You can get the EC stuff actually deleted by getting the cache pool to
> flush everything. That's discussed in the docs and in various mailing
> list archives.
> -Greg
>
>>
>>
>>
>> Steve
>>
>>> On Feb 4, 2016, at 7:10 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>>
>>> On Thu, Feb 4, 2016 at 5:07 PM, Stephen Lord <Steve.Lord@xxxxxxxxxxx> wrote:
>>>>
>>>>> On Feb 4, 2016, at 6:51 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>>>>
>>>>> I presume we're doing reads in order to gather some object metadata
>>>>> from the cephfs-data pool; and the (small) newly-created objects in
>>>>> cache-data are definitely whiteout objects indicating the object no
>>>>> longer exists logically.
>>>>>
>>>>> What kinds of reads are you actually seeing? Does it appear to be
>>>>> transferring data, or merely doing a bunch of seeks? I thought we were
>>>>> trying to avoid doing reads-to-delete, but perhaps the way we're
>>>>> handling snapshots or something is invoking behavior that isn't
>>>>> amenable to a full-FS delete.
>>>>>
>>>>> I presume you're trying to characterize the system's behavior, but of
>>>>> course if you just want to empty it out entirely you're better off
>>>>> deleting the pools and the CephFS instance entirely and then starting
>>>>> it over again from scratch.
>>>>> -Greg
>>>>
>>>> I believe it is reading all the data, just from the volume of traffic and
>>>> the cpu load on the OSDs maybe suggests it is doing more than
>>>> just that.
>>>>
>>>> iostat is showing a lot of data moving, I am seeing about the same volume
>>>> of read and write activity here. Because the OSDs underneath both pools
>>>> are the same ones, I know that’s not exactly optimal, it is hard to tell
>>>> which pool is responsible for which I/O.
>>>> Large reads and small writes suggest
>>>> it is reading up all the data from the objects, the write traffic is I presume all
>>>> journal activity relating to deleting objects and creating the empty ones.
>>>>
>>>> The 9:1 ratio between things being deleted and created seems odd though.
>>>>
>>>> A previous version of this exercise with just a regular replicated data pool
>>>> did not read anything, just a lot of write activity and eventually the content
>>>> disappeared. So definitely related to the pool configuration here and probably
>>>> not to the filesystem layer.
>>>
>>> Sam, does this make any sense to you in terms of how RADOS handles deletes?
>>> -Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
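
A rough way to check the remark in the thread that "the used capacity is somewhat consistent" with a 4+2 erasure coded pool is sketched below in Python. It rests on two assumptions that the thread does not confirm: that the per-pool USED column reports logical data before erasure coding overhead, and that GLOBAL RAW USED counts raw bytes across all pools. The helper name ec_raw_usage is made up for illustration.

```python
def ec_raw_usage(logical_gb: float, k: int, m: int) -> float:
    """Raw space taken by logical_gb of data in a k+m erasure coded pool.

    Each object is split into k data chunks plus m coding chunks,
    so raw usage is logical usage scaled by (k + m) / k.
    """
    return logical_gb * (k + m) / k


# Figures copied from the df output quoted in the thread.
pool_used_gb = 3964        # cephfs-data USED (assumed to be logical bytes)
global_raw_used_gb = 6080  # GLOBAL RAW USED (assumed to be raw bytes)

expected_raw_gb = ec_raw_usage(pool_used_gb, k=4, m=2)
unaccounted_gb = global_raw_used_gb - expected_raw_gb

print(f"expected raw usage for the EC pool: {expected_raw_gb:.0f}G")
print(f"raw usage not explained by the EC pool: {unaccounted_gb:.0f}G")
```

Under those assumptions the erasure coded pool alone would account for about 5946G of the 6080G raw used, leaving roughly 134G for everything else on the same OSDs.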