Re: rados bench leaves objects in tiered pool

Hi,

Thanks Gregory and Robert, now it is a bit clearer.

After cache-flush-evict-all almost all objects were deleted, but 101 remained in the cache pool. One PG also went inconsistent, putting the cluster into HEALTH_ERR.
"ceph pg repair" brought the object count down to 100, but at least ceph became healthy again.

Now it looks like:
POOLS:
    NAME                      ID     USED      %USED     MAX AVAIL     OBJECTS 
    rbd-cache                 36     23185         0          157G         100 
    rbd                       37         0         0          279G           0 
# rados -p rbd-cache ls -all
# rados -p rbd ls -all
# 

Is there any way to find out what the objects are?

"ceph pg ls-by-pool rbd-cache" gives me the PGs holding the objects (the first few PGs each report exactly one object, which with 100 PGs matches the count of 100), but looking into those PGs tells me nothing I can understand :)

# ceph pg ls-by-pool rbd-cache | head -4
pg_stat objects mip     degr    misp    unf     bytes   log     disklog state   state_stamp     v       reported        up   up_primary       acting  acting_primary  last_scrub      scrub_stamp     last_deep_scrub deep_scrub_stamp
36.0    1       0       0       0       0       83      926     926     active+clean    2015-11-03 22:06:39.193371      798'926       798:640 [4,0,3] 4       [4,0,3] 4       798'926 2015-11-03 22:06:39.193321      798'926 2015-11-03 22:06:39.193321
36.1    1       0       0       0       0       193     854     854     active+clean    2015-11-03 18:28:51.190819      798'854       798:515 [1,4,3] 1       [1,4,3] 1       796'628 2015-11-03 18:28:51.190749      0'0     2015-11-02 18:28:42.546224
36.2    1       0       0       0       0       198     869     869     active+clean    2015-11-03 18:28:44.556048      798'869       798:554 [2,0,1] 2       [2,0,1] 2       796'650 2015-11-03 18:28:44.555980      0'0     2015-11-02 18:28:42.546226
#
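Presumably one could dig further into a single PG with something like the following ("36.0" is just the first PG from the listing above), though I do not expect it to name the object:

# ceph pg 36.0 query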

# find /var/lib/ceph/osd/ceph-0/current/36.0_head/
/var/lib/ceph/osd/ceph-0/current/36.0_head/
/var/lib/ceph/osd/ceph-0/current/36.0_head/__head_00000000__24
/var/lib/ceph/osd/ceph-0/current/36.0_head/hit\uset\u36.0\uarchive\u2015-11-03 11:12:37.962360\u2015-11-03 21:28:58.149662__head_00000000_.ceph-internal_24
# find /var/lib/ceph/osd/ceph-0/current/36.2_head/
/var/lib/ceph/osd/ceph-0/current/36.2_head/
/var/lib/ceph/osd/ceph-0/current/36.2_head/__head_00000002__24
/var/lib/ceph/osd/ceph-0/current/36.2_head/hit\uset\u36.2\uarchive\u2015-11-02 19:50:00.788736\u2015-11-03 21:29:02.460568__head_00000002_.ceph-internal_24
#

# ls -l /var/lib/ceph/osd/ceph-0/current/36.0_head/hit\\uset\\u36.0\\uarchive\\u2015-11-03\ 11\:12\:37.962360\\u2015-11-03\ 21\:28\:58.149662__head_00000000_.ceph-internal_24 
-rw-r--r--. 1 root root 83 Nov  3 21:28 /var/lib/ceph/osd/ceph-0/current/36.0_head/hit\uset\u36.0\uarchive\u2015-11-03 11:12:37.962360\u2015-11-03 21:28:58.149662__head_00000000_.ceph-internal_24
# 
# ls -l /var/lib/ceph/osd/ceph-0/current/36.2_head/hit\\uset\\u36.2\\uarchive\\u2015-11-02\ 19\:50\:00.788736\\u2015-11-03\ 21\:29\:02.460568__head_00000002_.ceph-internal_24 
-rw-r--r--. 1 root root 198 Nov  3 21:29 /var/lib/ceph/osd/ceph-0/current/36.2_head/hit\uset\u36.2\uarchive\u2015-11-02 19:50:00.788736\u2015-11-03 21:29:02.460568__head_00000002_.ceph-internal_24
#
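Judging by the file names (hit\uset\u36.N\uarchive\u...), these look like hit set archive objects, one per PG: the pool has "hit_set bloom ... 3600s x1" configured and 100 PGs, which would explain the count of 100. They seem to live in an internal namespace that a plain "rados ls" does not show. As a sketch (I have not verified that internal namespaces are exposed this way):

# rados -p rbd-cache --all ls

And if these really are hit set archives, disabling hit sets might make them go away; this is just a guess:

# ceph osd pool set rbd-cache hit_set_count 0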

--
Dmitry Glushenok
Jet Infosystems


> On 3 Nov 2015, at 20:11, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> 
> Try:
> 
> rados -p {cachepool} cache-flush-evict-all
> 
> and see if the objects clean up.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Tue, Nov 3, 2015 at 8:02 AM, Gregory Farnum  wrote:
>> When you have a caching pool in writeback mode, updates to objects
>> (including deletes) are handled by writeback rather than writethrough.
>> Since there's no other activity against these pools, there is nothing
>> prompting the cache pool to flush updates out to the backing pool, so
>> the backing pool hasn't deleted its objects because nothing's told it
>> to. You'll find that the cache pool has deleted the data for its
>> objects, but it's keeping around a small "whiteout" and the object
>> info metadata.
>> The "rados ls" you're using has never played nicely with cache tiering
>> and probably never will. :( Listings are expensive operations and
>> modifying them to do more than the simple info scan would be fairly
>> expensive in terms of computation and IO.
>> 
>> I think there are some caching commands you can send to flush updates
>> which would cause the objects to be entirely deleted, but I don't have
>> them off-hand. You can probably search the mailing list archives or
>> the docs for tiering commands. :)
>> -Greg
>> 
>> On Tue, Nov 3, 2015 at 12:40 AM, Дмитрий Глушенок  wrote:
>>> Hi,
>>> 
>>> While benchmarking a tiered pool with rados bench, I noticed that objects are not removed after the test.
>>> 
>>> The test was performed with "rados -p rbd bench 3600 write". The pool is not used by anything else.
>>> 
>>> Just before the end of the test:
>>> POOLS:
>>>    NAME                      ID     USED       %USED     MAX AVAIL     OBJECTS
>>>    rbd-cache                 36     33110M      3.41          114G        8366
>>>    rbd                       37     43472M      4.47          237G       10858
>>> 
>>> Some time later (a few hundred writes were flushed, and the automatic rados cleanup had finished):
>>> POOLS:
>>>    NAME                      ID     USED       %USED     MAX AVAIL     OBJECTS
>>>    rbd-cache                 36      22998         0          157G       16342
>>>    rbd                       37     46050M      4.74          234G       11503
>>> 
>>> # rados -p rbd-cache ls | wc -l
>>> 16242
>>> # rados -p rbd ls | wc -l
>>> 11503
>>> #
>>> 
>>> # rados -p rbd cleanup
>>> error during cleanup: -2
>>> error 2: (2) No such file or directory
>>> #
>>> 
>>> # rados -p rbd cleanup --run-name "" --prefix ""
>>> Warning: using slow linear search
>>> Removed 0 objects
>>> #
>>> 
>>> # rados -p rbd ls | head -5
>>> benchmark_data_dropbox01.tzk_7641_object10901
>>> benchmark_data_dropbox01.tzk_7641_object9645
>>> benchmark_data_dropbox01.tzk_7641_object10389
>>> benchmark_data_dropbox01.tzk_7641_object10090
>>> benchmark_data_dropbox01.tzk_7641_object11204
>>> #
>>> 
>>> #  rados -p rbd-cache ls | head -5
>>> benchmark_data_dropbox01.tzk_7641_object10901
>>> benchmark_data_dropbox01.tzk_7641_object9645
>>> benchmark_data_dropbox01.tzk_7641_object10389
>>> benchmark_data_dropbox01.tzk_7641_object5391
>>> benchmark_data_dropbox01.tzk_7641_object10090
>>> #
>>> 
>>> So it looks like the objects are still there (in both pools?), but it is not possible to remove them:
>>> 
>>> # rados -p rbd rm benchmark_data_dropbox01.tzk_7641_object10901
>>> error removing rbd>benchmark_data_dropbox01.tzk_7641_object10901: (2) No such file or directory
>>> #
>>> 
>>> # ceph health
>>> HEALTH_OK
>>> #
>>> 
>>> 
>>> Can somebody explain this behavior? And is it possible to clean up the benchmark data without recreating the pools?
>>> 
>>> 
>>> ceph version 0.94.5
>>> 
>>> # ceph osd dump | grep rbd
>>> pool 36 'rbd-cache' replicated size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 100 pgp_num 100 last_change 755 flags hashpspool,incomplete_clones tier_of 37 cache_mode writeback target_bytes 107374182400 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 stripe_width 0
>>> pool 37 'rbd' erasure size 5 min_size 3 crush_ruleset 2 object_hash rjenkins pg_num 100 pgp_num 100 last_change 745 lfor 745 flags hashpspool tiers 36 read_tier 36 write_tier 36 stripe_width 4128
>>> #
>>> 
>>> # ceph osd pool get rbd-cache hit_set_type
>>> hit_set_type: bloom
>>> # ceph osd pool get rbd-cache hit_set_period
>>> hit_set_period: 3600
>>> # ceph osd pool get rbd-cache hit_set_count
>>> hit_set_count: 1
>>> # ceph osd pool get rbd-cache target_max_objects
>>> target_max_objects: 0
>>> # ceph osd pool get rbd-cache target_max_bytes
>>> target_max_bytes: 107374182400
>>> # ceph osd pool get rbd-cache cache_target_dirty_ratio
>>> cache_target_dirty_ratio: 0.1
>>> # ceph osd pool get rbd-cache cache_target_full_ratio
>>> cache_target_full_ratio: 0.2
>>> #
>>> 
>>> Crush map:
>>> root cache_tier {
>>>        id -7           # do not change unnecessarily
>>>        # weight 0.450
>>>        alg straw
>>>        hash 0  # rjenkins1
>>>        item osd.0 weight 0.090
>>>        item osd.1 weight 0.090
>>>        item osd.2 weight 0.090
>>>        item osd.3 weight 0.090
>>>        item osd.4 weight 0.090
>>> }
>>> root store_tier {
>>>        id -8           # do not change unnecessarily
>>>        # weight 0.450
>>>        alg straw
>>>        hash 0  # rjenkins1
>>>        item osd.5 weight 0.090
>>>        item osd.6 weight 0.090
>>>        item osd.7 weight 0.090
>>>        item osd.8 weight 0.090
>>>        item osd.9 weight 0.090
>>> }
>>> rule cache {
>>>        ruleset 1
>>>        type replicated
>>>        min_size 0
>>>        max_size 5
>>>        step take cache_tier
>>>        step chooseleaf firstn 0 type osd
>>>        step emit
>>> }
>>> rule store {
>>>        ruleset 2
>>>        type erasure
>>>        min_size 0
>>>        max_size 5
>>>        step take store_tier
>>>        step chooseleaf firstn 0 type osd
>>>        step emit
>>> }
>>> 
>>> Thanks
>>> 
>>> --
>>> Dmitry Glushenok
>>> Jet Infosystems
>>> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



