On Tuesday, November 3, 2015, Дмитрий Глушенок <glush@xxxxxxxxxx> wrote:
Hi,
Thanks Gregory and Robert, now it is a bit clearer.
After cache-flush-evict-all almost all objects were deleted, but 101 remained in the cache pool. Also, one PG changed its state to inconsistent and the cluster went to HEALTH_ERR.
"ceph pg repair" only brought the object count down to 100, but at least ceph became healthy again.
Now it looks like:
POOLS:
    NAME          ID      USED  %USED  MAX AVAIL  OBJECTS
    rbd-cache     36     23185      0       157G      100
    rbd           37         0      0       279G        0
# rados -p rbd-cache ls -all
# rados -p rbd ls -all
#
Is there any way to find what the objects are?
"ceph pg ls-by-pool rbd-cache" gives me pgs of the objects. Looking into these pgs gives me nothing I can understand :)
# ceph pg ls-by-pool rbd-cache | head -4
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
36.0 1 0 0 0 0 83 926 926 active+clean 2015-11-03 22:06:39.193371 798'926 798:640 [4,0,3] 4 [4,0,3] 4 798'926 2015-11-03 22:06:39.193321 798'926 2015-11-03 22:06:39.193321
36.1 1 0 0 0 0 193 854 854 active+clean 2015-11-03 18:28:51.190819 798'854 798:515 [1,4,3] 1 [1,4,3] 1 796'628 2015-11-03 18:28:51.190749 0'0 2015-11-02 18:28:42.546224
36.2 1 0 0 0 0 198 869 869 active+clean 2015-11-03 18:28:44.556048 798'869 798:554 [2,0,1] 2 [2,0,1] 2 796'650 2015-11-03 18:28:44.555980 0'0 2015-11-02 18:28:42.546226
#
# find /var/lib/ceph/osd/ceph-0/current/36.0_head/
/var/lib/ceph/osd/ceph-0/current/36.0_head/
/var/lib/ceph/osd/ceph-0/current/36.0_head/__head_00000000__24
/var/lib/ceph/osd/ceph-0/current/36.0_head/hit\uset\u36.0\uarchive\u2015-11-03 11:12:37.962360\u2015-11-03 21:28:58.149662__head_00000000_.ceph-internal_24
# find /var/lib/ceph/osd/ceph-0/current/36.2_head/
/var/lib/ceph/osd/ceph-0/current/36.2_head/
/var/lib/ceph/osd/ceph-0/current/36.2_head/__head_00000002__24
/var/lib/ceph/osd/ceph-0/current/36.2_head/hit\uset\u36.2\uarchive\u2015-11-02 19:50:00.788736\u2015-11-03 21:29:02.460568__head_00000002_.ceph-internal_24
#
# ls -l /var/lib/ceph/osd/ceph-0/current/36.0_head/hit\\uset\\u36.0\\uarchive\\u2015-11-03\ 11\:12\:37.962360\\u2015-11-03\ 21\:28\:58.149662__head_00000000_.ceph-internal_24
-rw-r--r--. 1 root root 83 Nov 3 21:28 /var/lib/ceph/osd/ceph-0/current/36.0_head/hit\uset\u36.0\uarchive\u2015-11-03 11:12:37.962360\u2015-11-03 21:28:58.149662__head_00000000_.ceph-internal_24
#
# ls -l /var/lib/ceph/osd/ceph-0/current/36.2_head/hit\\uset\\u36.2\\uarchive\\u2015-11-02\ 19\:50\:00.788736\\u2015-11-03\ 21\:29\:02.460568__head_00000002_.ceph-internal_24
-rw-r--r--. 1 root root 198 Nov 3 21:29 /var/lib/ceph/osd/ceph-0/current/36.2_head/hit\uset\u36.2\uarchive\u2015-11-02 19:50:00.788736\u2015-11-03 21:29:02.460568__head_00000002_.ceph-internal_24
#
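Judging by the names, these look like the hit set archive objects the cache tier keeps internally (the pool is configured with "hit_set bloom ... 3600s x1", see the osd dump further down). If that is the case, I suppose they could be dropped by disabling hit sets on the cache pool, something like this (just a guess, untested):

# ceph osd pool set rbd-cache hit_set_count 0
# ceph osd pool set rbd-cache hit_set_period 0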
--
Dmitry Glushenok
Jet Infosystems
> On 3 Nov 2015, at 20:11, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>
> Try:
>
> rados -p {cachepool} cache-flush-evict-all
>
> and see if the objects clean up.
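> A quick way to check is something like:
>
> rados df
>
> and watch the object count for the cache pool drop.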
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>
>
> On Tue, Nov 3, 2015 at 8:02 AM, Gregory Farnum wrote:
>> When you have a caching pool in writeback mode, updates to objects
>> (including deletes) are handled by writeback rather than writethrough.
>> Since there's no other activity against these pools, there is nothing
>> prompting the cache pool to flush updates out to the backing pool, so
>> the backing pool hasn't deleted its objects because nothing's told it
>> to. You'll find that the cache pool has deleted the data for its
>> objects, but it's keeping around a small "whiteout" and the object
>> info metadata.
>> The "rados ls" you're using has never played nicely with cache tiering
>> and probably never will. :( Listings are expensive operations and
>> modifying them to do more than the simple info scan would be fairly
>> expensive in terms of computation and IO.
>>
>> I think there are some caching commands you can send to flush updates
>> which would cause the objects to be entirely deleted, but I don't have
>> them off-hand. You can probably search the mailing list archives or
>> the docs for tiering commands. :)
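>> (If memory serves they are the "cache-flush-evict-all" family in the
>> rados tool, e.g. something like
>>
>> rados -p <cachepool> cache-try-flush-evict-all
>>
>> but double-check the cache tiering docs before relying on that.)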
>> -Greg
>>
>> On Tue, Nov 3, 2015 at 12:40 AM, Дмитрий Глушенок wrote:
>>> Hi,
>>>
>>> While benchmarking a tiered pool with rados bench, I noticed that objects are not being removed after the test.
>>>
>>> The test was performed using "rados -p rbd bench 3600 write". The pool is not used by anything else.
>>>
>>> Just before the end of the test:
>>> POOLS:
>>>     NAME          ID      USED  %USED  MAX AVAIL  OBJECTS
>>>     rbd-cache     36    33110M   3.41       114G     8366
>>>     rbd           37    43472M   4.47       237G    10858
>>>
>>> Some time later (a few hundred writes had been flushed and the automatic rados bench cleanup had finished):
>>> POOLS:
>>>     NAME          ID      USED  %USED  MAX AVAIL  OBJECTS
>>>     rbd-cache     36     22998      0       157G    16342
>>>     rbd           37    46050M   4.74       234G    11503
>>>
>>> # rados -p rbd-cache ls | wc -l
>>> 16242
>>> # rados -p rbd ls | wc -l
>>> 11503
>>> #
>>>
>>> # rados -p rbd cleanup
>>> error during cleanup: -2
>>> error 2: (2) No such file or directory
>>> #
>>>
>>> # rados -p rbd cleanup --run-name "" --prefix ""
>>> Warning: using slow linear search
>>> Removed 0 objects
>>> #
>>>
>>> # rados -p rbd ls | head -5
>>> benchmark_data_dropbox01.tzk_7641_object10901
>>> benchmark_data_dropbox01.tzk_7641_object9645
>>> benchmark_data_dropbox01.tzk_7641_object10389
>>> benchmark_data_dropbox01.tzk_7641_object10090
>>> benchmark_data_dropbox01.tzk_7641_object11204
>>> #
>>>
>>> # rados -p rbd-cache ls | head -5
>>> benchmark_data_dropbox01.tzk_7641_object10901
>>> benchmark_data_dropbox01.tzk_7641_object9645
>>> benchmark_data_dropbox01.tzk_7641_object10389
>>> benchmark_data_dropbox01.tzk_7641_object5391
>>> benchmark_data_dropbox01.tzk_7641_object10090
>>> #
>>>
>>> So, it looks like the objects are still in place (in both pools?). But it is not possible to remove them:
>>>
>>> # rados -p rbd rm benchmark_data_dropbox01.tzk_7641_object10901
>>> error removing rbd>benchmark_data_dropbox01.tzk_7641_object10901: (2) No such file or directory
>>> #
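>>> In case it matters, all the bench objects share the prefix seen above, so I suppose an explicit prefix-based cleanup would look something like this (not sure it would behave any differently):
>>>
>>> # rados -p rbd cleanup --prefix benchmark_data_dropbox01.tzk_7641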
>>>
>>> # ceph health
>>> HEALTH_OK
>>> #
>>>
>>>
>>> Can somebody explain the behavior? And is it possible to clean up the benchmark data without recreating the pools?
>>>
>>>
>>> ceph version 0.94.5
>>>
>>> # ceph osd dump | grep rbd
>>> pool 36 'rbd-cache' replicated size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 100 pgp_num 100 last_change 755 flags hashpspool,incomplete_clones tier_of 37 cache_mode writeback target_bytes 107374182400 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 stripe_width 0
>>> pool 37 'rbd' erasure size 5 min_size 3 crush_ruleset 2 object_hash rjenkins pg_num 100 pgp_num 100 last_change 745 lfor 745 flags hashpspool tiers 36 read_tier 36 write_tier 36 stripe_width 4128
>>> #
>>>
>>> # ceph osd pool get rbd-cache hit_set_type
>>> hit_set_type: bloom
>>> # ceph osd pool get rbd-cache hit_set_period
>>> hit_set_period: 3600
>>> # ceph osd pool get rbd-cache hit_set_count
>>> hit_set_count: 1
>>> # ceph osd pool get rbd-cache target_max_objects
>>> target_max_objects: 0
>>> # ceph osd pool get rbd-cache target_max_bytes
>>> target_max_bytes: 107374182400
>>> # ceph osd pool get rbd-cache cache_target_dirty_ratio
>>> cache_target_dirty_ratio: 0.1
>>> # ceph osd pool get rbd-cache cache_target_full_ratio
>>> cache_target_full_ratio: 0.2
>>> #
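>>> If the data is simply waiting to be flushed, I assume the flush targets above can be lowered temporarily to push everything out of the cache, along these lines (values are just examples, not tried yet):
>>>
>>> # ceph osd pool set rbd-cache cache_target_dirty_ratio 0.0
>>> # ceph osd pool set rbd-cache cache_target_full_ratio 0.01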
>>>
>>> Crush map:
>>> root cache_tier {
>>>         id -7           # do not change unnecessarily
>>>         # weight 0.450
>>>         alg straw
>>>         hash 0          # rjenkins1
>>>         item osd.0 weight 0.090
>>>         item osd.1 weight 0.090
>>>         item osd.2 weight 0.090
>>>         item osd.3 weight 0.090
>>>         item osd.4 weight 0.090
>>> }
>>> root store_tier {
>>>         id -8           # do not change unnecessarily
>>>         # weight 0.450
>>>         alg straw
>>>         hash 0          # rjenkins1
>>>         item osd.5 weight 0.090
>>>         item osd.6 weight 0.090
>>>         item osd.7 weight 0.090
>>>         item osd.8 weight 0.090
>>>         item osd.9 weight 0.090
>>> }
>>> rule cache {
>>>         ruleset 1
>>>         type replicated
>>>         min_size 0
>>>         max_size 5
>>>         step take cache_tier
>>>         step chooseleaf firstn 0 type osd
>>>         step emit
>>> }
>>> rule store {
>>>         ruleset 2
>>>         type erasure
>>>         min_size 0
>>>         max_size 5
>>>         step take store_tier
>>>         step chooseleaf firstn 0 type osd
>>>         step emit
>>> }
>>>
>>> Thanks
>>>
>>> --
>>> Dmitry Glushenok
>>> Jet Infosystems
>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com