Re: why is there heavy read traffic during object delete?

Stephen Lord <Steve.Lord@xxxxxxxxxxx> · Fri, 5 Feb 2016 14:39:01 +0000

I looked at this system this morning, and it had actually finished what it was
doing. The erasure-coded pool still contains all the data, and the cache
pool has about a million zero-sized objects:

GLOBAL:
    SIZE       AVAIL     RAW USED     %RAW USED     OBJECTS 
    15090G     9001G        6080G         40.29       2127k 
POOLS:
    NAME                ID     CATEGORY     USED       %USED     MAX AVAIL     OBJECTS     DIRTY     READ       WRITE 
    cache-data          21     -                 0         0         7962G     1162258     1057k      22969     3220k 
    cephfs-data         22     -             3964G     26.27         5308G     1014840      991k       891k     1143k 

This definitely seems like a bug, since I removed all references to these objects
from the filesystem which created them.

I originally wrote 4.5 Tbytes of data into the file system. The erasure-coded
pool is set up as 4+2, and the cache has a size limit of 1 Tbyte. It looks like not
all the data made it out of the cache tier before I removed content: the delete
removed the content which was only present in the cache tier and created a
zero-sized object in the cache for all the content. The used capacity is somewhat
consistent with this.
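
As a rough check on that last point, here is a minimal Python sketch using only
the figures from the ceph df output above. It assumes that a k+m erasure-coded
pool occupies USED * (k + m) / k of raw space and that "4.5 Tbytes" means
4.5 * 1024 G; the function names are made up for illustration.

```python
def ec_raw_footprint(used_g, k, m):
    """Raw space taken by an erasure-coded pool, assuming overhead (k + m) / k."""
    return used_g * (k + m) / k


def never_flushed(written_g, ec_used_g):
    """Data that was written but is not accounted for in the EC pool."""
    return written_g - ec_used_g


# Figures copied from the ceph df output (in G, as reported).
EC_USED_G = 3964        # cephfs-data USED
RAW_USED_G = 6080       # GLOBAL RAW USED
K, M = 4, 2             # erasure-code profile 4+2
WRITTEN_G = 4.5 * 1024  # assumption: 4.5 Tbytes written = 4608 G

ec_raw_g = ec_raw_footprint(EC_USED_G, K, M)  # 5946.0
other_raw_g = RAW_USED_G - ec_raw_g           # 134.0
missing_g = never_flushed(WRITTEN_G, EC_USED_G)  # 644.0

print(f"EC pool raw footprint: {ec_raw_g:.0f}G of {RAW_USED_G}G raw used")
print(f"Raw space not explained by the EC pool: {other_raw_g:.0f}G")
print(f"Written but not in the EC pool: {missing_g:.0f}G")
```

Under those assumptions the EC pool explains nearly all of the raw usage, and
the roughly 644G that never reached it fits within the 1 Tbyte cache limit.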

I tried to look at the extended attributes on one of the zero-size objects with
ceph-dencoder, but it failed:

error: buffer::malformed_input: void object_info_t::decode(ceph::buffer::list::iterator&) unknown encoding version > 15

Same error on one of the objects in the erasure coded pool.

Looks like I am a little too bleeding edge for this, or the contents of the .ceph_
attribute are not an object_info_t.
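
One way to tell those two cases apart would be to peek at the first few bytes
of the attribute. The sketch below is not taken from the Ceph tools. It assumes
the blob starts with Ceph's usual versioned-encoding header (one byte
struct_v, one byte struct_compat, then a little-endian 32-bit payload
length); peek_encoding_header is a hypothetical helper name.

```python
import struct


def peek_encoding_header(blob: bytes):
    """Return (struct_v, struct_compat, struct_len) from the first 6 bytes.

    Assumed layout: u8 version, u8 compat version, little-endian u32 length.
    """
    if len(blob) < 6:
        raise ValueError("blob too short for a versioned-encoding header")
    struct_v, struct_compat, struct_len = struct.unpack_from("<BBI", blob, 0)
    return struct_v, struct_compat, struct_len


def looks_plausible(blob: bytes) -> bool:
    """Heuristic: the declared length must fit in what follows the header."""
    _, _, struct_len = peek_encoding_header(blob)
    return struct_len <= len(blob) - 6


# Synthetic example blob: version 16, compat 8, 10 payload bytes.
example = bytes([16, 8]) + (10).to_bytes(4, "little") + b"\x00" * 10
print(peek_encoding_header(example), looks_plausible(example))
```

On a real OSD the blob could be read with os.getxattr(path, name) on Linux.
The attribute name and object path depend on the backend, so none are given
here. A version above 15 with a plausible length would point to an old
ceph-dencoder rather than a wrong attribute.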

Steve

> On Feb 4, 2016, at 7:10 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> 
> On Thu, Feb 4, 2016 at 5:07 PM, Stephen Lord <Steve.Lord@xxxxxxxxxxx> wrote:
>> 
>>> On Feb 4, 2016, at 6:51 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>> 
>>> I presume we're doing reads in order to gather some object metadata
>>> from the cephfs-data pool; and the (small) newly-created objects in
>>> cache-data are definitely whiteout objects indicating the object no
>>> longer exists logically.
>>> 
>>> What kinds of reads are you actually seeing? Does it appear to be
>>> transferring data, or merely doing a bunch of seeks? I thought we were
>>> trying to avoid doing reads-to-delete, but perhaps the way we're
>>> handling snapshots or something is invoking behavior that isn't
>>> amenable to a full-FS delete.
>>> 
>>> I presume you're trying to characterize the system's behavior, but of
>>> course if you just want to empty it out entirely you're better off
>>> deleting the pools and the CephFS instance entirely and then starting
>>> it over again from scratch.
>>> -Greg
>> 
>> I believe it is reading all the data, judging from the volume of traffic,
>> and the CPU load on the OSDs suggests it may be doing more than
>> just that.
>> 
>> iostat is showing a lot of data moving; I am seeing about the same volume
>> of read and write activity here. Because the OSDs underneath both pools
>> are the same ones (I know that’s not exactly optimal), it is hard to tell
>> which pool is responsible for which I/O. Large reads and small writes suggest
>> it is reading up all the data from the objects; the write traffic is, I presume,
>> all journal activity relating to deleting objects and creating the empty ones.
>> 
>> The 9:1 ratio between things being deleted and created seems odd though.
>> 
>> A previous version of this exercise with just a regular replicated data pool
>> did not read anything, just a lot of write activity and eventually the content
>> disappeared. So definitely related to the pool configuration here and probably
>> not to the filesystem layer.
> 
> Sam, does this make any sense to you in terms of how RADOS handles deletes?
> -Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com