Re: rados bench leaves objects in tiered pool

Ceph maintains some metadata in objects. In this case those objects are hit sets, which keep track of object accesses so the cache tier can evaluate how hot an object is when deciding what to flush and evict.
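
Each PG in the cache pool keeps up to hit_set_count archived hit sets, and each archived hit set is stored as a small ceph-internal object in that pool, so with 100 PGs and hit_set_count 1 (see the pool settings quoted further down) roughly 100 such objects are to be expected. A quick way to check those two values, using the pool name rbd-cache from this thread:

# ceph osd pool get rbd-cache hit_set_count
# ceph osd pool get rbd-cache pg_num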

On Tuesday, November 3, 2015, Дмитрий Глушенок <glush@xxxxxxxxxx> wrote:
Hi,

Thanks Gregory and Robert, now it is a bit clearer.

After cache-flush-evict-all almost all objects were deleted, but 101 remained in the cache pool. Also, one PG changed its state to inconsistent and the cluster went to HEALTH_ERR.
"ceph pg repair" only brought the object count down to 100, but at least Ceph became healthy again.

Now it looks like:
POOLS:
    NAME                      ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd-cache                 36     23185         0          157G         100
    rbd                       37         0         0          279G           0
# rados -p rbd-cache ls -all
# rados -p rbd ls -all
#

Is there any way to find what the objects are?

"ceph pg ls-by-pool rbd-cache" gives me pgs of the objects. Looking into these pgs gives me nothing I can understand :)

# ceph pg ls-by-pool rbd-cache | head -4
pg_stat objects mip     degr    misp    unf     bytes   log     disklog state   state_stamp     v       reported        up   up_primary       acting  acting_primary  last_scrub      scrub_stamp     last_deep_scrub deep_scrub_stamp
36.0    1       0       0       0       0       83      926     926     active+clean    2015-11-03 22:06:39.193371      798'926       798:640 [4,0,3] 4       [4,0,3] 4       798'926 2015-11-03 22:06:39.193321      798'926 2015-11-03 22:06:39.193321
36.1    1       0       0       0       0       193     854     854     active+clean    2015-11-03 18:28:51.190819      798'854       798:515 [1,4,3] 1       [1,4,3] 1       796'628 2015-11-03 18:28:51.190749      0'0     2015-11-02 18:28:42.546224
36.2    1       0       0       0       0       198     869     869     active+clean    2015-11-03 18:28:44.556048      798'869       798:554 [2,0,1] 2       [2,0,1] 2       796'650 2015-11-03 18:28:44.555980      0'0     2015-11-02 18:28:42.546226
#

# find /var/lib/ceph/osd/ceph-0/current/36.0_head/
/var/lib/ceph/osd/ceph-0/current/36.0_head/
/var/lib/ceph/osd/ceph-0/current/36.0_head/__head_00000000__24
/var/lib/ceph/osd/ceph-0/current/36.0_head/hit\uset\u36.0\uarchive\u2015-11-03 11:12:37.962360\u2015-11-03 21:28:58.149662__head_00000000_.ceph-internal_24
# find /var/lib/ceph/osd/ceph-0/current/36.2_head/
/var/lib/ceph/osd/ceph-0/current/36.2_head/
/var/lib/ceph/osd/ceph-0/current/36.2_head/__head_00000002__24
/var/lib/ceph/osd/ceph-0/current/36.2_head/hit\uset\u36.2\uarchive\u2015-11-02 19:50:00.788736\u2015-11-03 21:29:02.460568__head_00000002_.ceph-internal_24
#

# ls -l /var/lib/ceph/osd/ceph-0/current/36.0_head/hit\\uset\\u36.0\\uarchive\\u2015-11-03\ 11\:12\:37.962360\\u2015-11-03\ 21\:28\:58.149662__head_00000000_.ceph-internal_24
-rw-r--r--. 1 root root 83 Nov  3 21:28 /var/lib/ceph/osd/ceph-0/current/36.0_head/hit\uset\u36.0\uarchive\u2015-11-03 11:12:37.962360\u2015-11-03 21:28:58.149662__head_00000000_.ceph-internal_24
#
# ls -l /var/lib/ceph/osd/ceph-0/current/36.2_head/hit\\uset\\u36.2\\uarchive\\u2015-11-02\ 19\:50\:00.788736\\u2015-11-03\ 21\:29\:02.460568__head_00000002_.ceph-internal_24
-rw-r--r--. 1 root root 198 Nov  3 21:29 /var/lib/ceph/osd/ceph-0/current/36.2_head/hit\uset\u36.2\uarchive\u2015-11-02 19:50:00.788736\u2015-11-03 21:29:02.460568__head_00000002_.ceph-internal_24
#
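
So the leftovers look like hit set archive objects: one hit\uset\u<pgid>\uarchive object per PG directory. A rough way to count them on a single OSD (path from the listing above, approximate name pattern; this counts every PG copy hosted on that OSD):

# find /var/lib/ceph/osd/ceph-0/current/ -name '*hit*set*archive*' | wc -l

With pg_num 100 and hit_set_count 1 on the cache pool that works out to about one such object per PG, i.e. around 100 pool-wide.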

--
Dmitry Glushenok
Jet Infosystems


> On 3 Nov 2015, at 20:11, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>
> Try:
>
> rados -p {cachepool} cache-flush-evict-all
>
> and see if the objects clean up.
> - ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Tue, Nov 3, 2015 at 8:02 AM, Gregory Farnum  wrote:
>> When you have a caching pool in writeback mode, updates to objects
>> (including deletes) are handled by writeback rather than writethrough.
>> Since there's no other activity against these pools, there is nothing
>> prompting the cache pool to flush updates out to the backing pool, so
>> the backing pool hasn't deleted its objects because nothing's told it
>> to. You'll find that the cache pool has deleted the data for its
>> objects, but it's keeping around a small "whiteout" and the object
>> info metadata.
>> The "rados ls" you're using has never played nicely with cache tiering
>> and probably never will. :( Listings are expensive operations and
>> modifying them to do more than the simple info scan would be fairly
>> expensive in terms of computation and IO.
>>
>> I think there are some caching commands you can send to flush updates
>> which would cause the objects to be entirely deleted, but I don't have
>> them off-hand. You can probably search the mailing list archives or
>> the docs for tiering commands. :)
>> -Greg
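
The tiering commands referred to here are presumably the rados cache-* operations: cache-flush / cache-evict for a single object, or cache-flush-evict-all for the whole pool (which is what Robert suggests above). A minimal sketch, reusing an object name from the listing quoted below:

# rados -p rbd-cache cache-flush benchmark_data_dropbox01.tzk_7641_object10901
# rados -p rbd-cache cache-evict benchmark_data_dropbox01.tzk_7641_object10901
# rados -p rbd-cache cache-flush-evict-all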
>>
>> On Tue, Nov 3, 2015 at 12:40 AM, Дмитрий Глушенок  wrote:
>>> Hi,
>>>
>>> While benchmarking a tiered pool with rados bench, it was noticed that objects are not removed after the test.
>>>
>>> Test was performed using "rados -p rbd bench 3600 write". The pool is not used by anything else.
>>>
>>> Just before end of test:
>>> POOLS:
>>>    NAME                      ID     USED       %USED     MAX AVAIL     OBJECTS
>>>    rbd-cache                 36     33110M      3.41          114G        8366
>>>    rbd                       37     43472M      4.47          237G       10858
>>>
>>> Some time later (a few hundred writes were flushed, rados automatic cleanup finished):
>>> POOLS:
>>>    NAME                      ID     USED       %USED     MAX AVAIL     OBJECTS
>>>    rbd-cache                 36      22998         0          157G       16342
>>>    rbd                       37     46050M      4.74          234G       11503
>>>
>>> # rados -p rbd-cache ls | wc -l
>>> 16242
>>> # rados -p rbd ls | wc -l
>>> 11503
>>> #
>>>
>>> # rados -p rbd cleanup
>>> error during cleanup: -2
>>> error 2: (2) No such file or directory
>>> #
>>>
>>> # rados -p rbd cleanup --run-name "" --prefix prefix ""
>>> Warning: using slow linear search
>>> Removed 0 objects
>>> #
>>>
>>> # rados -p rbd ls | head -5
>>> benchmark_data_dropbox01.tzk_7641_object10901
>>> benchmark_data_dropbox01.tzk_7641_object9645
>>> benchmark_data_dropbox01.tzk_7641_object10389
>>> benchmark_data_dropbox01.tzk_7641_object10090
>>> benchmark_data_dropbox01.tzk_7641_object11204
>>> #
>>>
>>> #  rados -p rbd-cache ls | head -5
>>> benchmark_data_dropbox01.tzk_7641_object10901
>>> benchmark_data_dropbox01.tzk_7641_object9645
>>> benchmark_data_dropbox01.tzk_7641_object10389
>>> benchmark_data_dropbox01.tzk_7641_object5391
>>> benchmark_data_dropbox01.tzk_7641_object10090
>>> #
>>>
>>> So, it looks like the objects are still in place (in both pools?). But it is not possible to remove them:
>>>
>>> # rados -p rbd rm benchmark_data_dropbox01.tzk_7641_object10901
>>> error removing rbd>benchmark_data_dropbox01.tzk_7641_object10901: (2) No such file or directory
>>> #
>>>
>>> # ceph health
>>> HEALTH_OK
>>> #
>>>
>>>
>>> Can somebody explain this behavior? And is it possible to clean up the benchmark data without recreating the pools?
>>>
>>>
>>> ceph version 0.94.5
>>>
>>> # ceph osd dump | grep rbd
>>> pool 36 'rbd-cache' replicated size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 100 pgp_num 100 last_change 755 flags hashpspool,incomplete_clones tier_of 37 cache_mode writeback target_bytes 107374182400 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 stripe_width 0
>>> pool 37 'rbd' erasure size 5 min_size 3 crush_ruleset 2 object_hash rjenkins pg_num 100 pgp_num 100 last_change 745 lfor 745 flags hashpspool tiers 36 read_tier 36 write_tier 36 stripe_width 4128
>>> #
>>>
>>> # ceph osd pool get rbd-cache hit_set_type
>>> hit_set_type: bloom
>>> # ceph osd pool get rbd-cache hit_set_period
>>> hit_set_period: 3600
>>> # ceph osd pool get rbd-cache hit_set_count
>>> hit_set_count: 1
>>> # ceph osd pool get rbd-cache target_max_objects
>>> target_max_objects: 0
>>> # ceph osd pool get rbd-cache target_max_bytes
>>> target_max_bytes: 107374182400
>>> # ceph osd pool get rbd-cache cache_target_dirty_ratio
>>> cache_target_dirty_ratio: 0.1
>>> # ceph osd pool get rbd-cache cache_target_full_ratio
>>> cache_target_full_ratio: 0.2
>>> #
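
(For context: target_max_bytes = 107374182400 is 100 GiB, so with cache_target_dirty_ratio 0.1 the tiering agent only starts flushing dirty objects once roughly 0.1 x 100 GiB = 10 GiB of the cache is dirty, and with cache_target_full_ratio 0.2 it only starts evicting clean objects around 20 GiB. That would explain why only a few hundred of the benchmark writes were flushed to the backing pool on their own.)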
>>>
>>> Crush map:
>>> root cache_tier {
>>>        id -7           # do not change unnecessarily
>>>        # weight 0.450
>>>        alg straw
>>>        hash 0  # rjenkins1
>>>        item osd.0 weight 0.090
>>>        item osd.1 weight 0.090
>>>        item osd.2 weight 0.090
>>>        item osd.3 weight 0.090
>>>        item osd.4 weight 0.090
>>> }
>>> root store_tier {
>>>        id -8           # do not change unnecessarily
>>>        # weight 0.450
>>>        alg straw
>>>        hash 0  # rjenkins1
>>>        item osd.5 weight 0.090
>>>        item osd.6 weight 0.090
>>>        item osd.7 weight 0.090
>>>        item osd.8 weight 0.090
>>>        item osd.9 weight 0.090
>>> }
>>> rule cache {
>>>        ruleset 1
>>>        type replicated
>>>        min_size 0
>>>        max_size 5
>>>        step take cache_tier
>>>        step chooseleaf firstn 0 type osd
>>>        step emit
>>> }
>>> rule store {
>>>        ruleset 2
>>>        type erasure
>>>        min_size 0
>>>        max_size 5
>>>        step take store_tier
>>>        step chooseleaf firstn 0 type osd
>>>        step emit
>>> }
>>>
>>> Thanks
>>>
>>> --
>>> Dmitry Glushenok
>>> Jet Infosystems
>>>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
