Hi,
just to clarify, do the rbd images exist in the backing pool "rbd_hdd"
and are they in use?
[admin@kvm1a ~]# for pool in rbd_hdd rbd_hdd_cache; do rados -p
$pool stat2 rbd_data.4118694b8f91d5.00000000000188f0; done
rbd_hdd/rbd_data.4118694b8f91d5.00000000000188f0 mtime
2021-10-19T15:48:53.243749+0200, size 4194304
rbd_hdd_cache/rbd_data.4118694b8f91d5.00000000000188f0 mtime
2021-10-19T15:48:53.243749+0200, size 4194304
This indicates that they do in fact exist (at least in the backing
pool), but can you confirm? Have you tried disabling the cache tier [1]
by setting its mode to "proxy" and then evicting again? A rough sketch
of both checks is below. I remember we hit a similar issue a few years
ago, though I'm not certain it was the same thing. IIRC we had to
delete some non-critical objects to get the cache back into a working
state, and we haven't had an issue since.
[1]
https://docs.ceph.com/en/latest/rados/operations/cache-tiering/?highlight=cache%20tier#removing-a-writeback-cache
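Something along these lines is what I had in mind (just a sketch,
using the pool names and the rbd_data prefix from your mail; the loop
over "rbd ls" is my own quick-and-dirty way of mapping a prefix back
to an image, so please sanity-check before running anything against
production):

  # Does the prefix belong to an existing image? "rbd info" prints the
  # image's block_name_prefix, e.g. rbd_data.4118694b8f91d5:
  for img in $(rbd ls rbd_hdd); do
    rbd info rbd_hdd/"$img" | grep -q 4118694b8f91d5 && echo "$img"
  done

  # Stop new promotions by switching the tier to proxy mode, retry the
  # flush/evict, then switch back to writeback if you keep the tier:
  ceph osd tier cache-mode rbd_hdd_cache proxy
  rados -p rbd_hdd_cache cache-try-flush-evict-all
  ceph osd tier cache-mode rbd_hdd_cache writeback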
Quoting David Herselman <dhe@xxxxxxxx>:
Hi Everyone,
We appear to have a problem with ghost objects, most probably dating
from when we were running Nautilus or even earlier. We have several
Ceph clusters in production (11), of which a few (4) run with SSD
cache tiers in front of HDD RBD pools.
My Google-Fu appears to be failing me: none of the discussions from
others seeing this symptom appear to explain why our cache tier
contains thousands of objects which we can't flush.
Some dev/test clusters work perfectly though. Herewith an example of
one where I can run 'rados -p rbd_hdd_cache
cache-try-flush-evict-all' and it only leaves the locked rbd_header
object in place, which is what I expect:
[admin@test1 ~]# echo -e "\nFlushing objects:"; rados -p
rbd_hdd_cache cache-try-flush-evict-all; echo -e "\nObjects in
pool:"; rados -p rbd_hdd_cache ls; echo -e "\nPool storage
utilisation:"; ceph df | grep -e PGS -e rbd_hdd_cache;
Flushing objects:
rbd_header.e10d342490c849
failed to evict /rbd_header.e10d342490c849: (16) Device or resource busy
cache-try-flush-evict-all finished with errors
Objects in pool:
rbd_header.e10d342490c849
Pool storage utilisation:
POOL           ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
rbd_hdd_cache  14    8  206 KiB       25  3.4 MiB      0     78 GiB
We are however having problems with other clusters where items never
remain cached and are immediately flushed to disk. The problem appears
to be that there are a host of orphaned or ghost objects referenced in
the cache tier, so the cache pool is constantly near its limit and
consequently flushes or evicts any object which does get promoted.
The base tier is called 'rbd_hdd', the cache tier 'rbd_hdd_cache', and
target_max_bytes is set to 128 GiB:
[admin@kvm1a ~]# ceph osd dump | grep rbd_hdd
pool 1 'rbd_hdd' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode on
last_change 122453 lfor 75/75/42264 flags
hashpspool,selfmanaged_snaps tiers 3 read_tier 3 write_tier 3
stripe_width 0 application rbd
pool 3 'rbd_hdd_cache' replicated size 3 min_size 2 crush_rule 1
object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on
last_change 122708 lfor 75/75/75 flags
hashpspool,incomplete_clones,selfmanaged_snaps tier_of 1 cache_mode
writeback target_bytes 137438953472 hit_set
bloom{false_positive_probability: 0.05, target_size: 0, seed: 0}
3600s x3 decay_rate 0 search_last_n 0 min_read_recency_for_promote 2
min_write_recency_for_promote 1 stripe_width 0 application rbd
The cache tier utilisation is higher than cache_target_full_ratio,
which I presume is the reason why promoted objects are flushed out
again (a quick calculation follows the output below):
[admin@kvm1a ~]# ceph df | grep -e PGS -e rbd_hdd_cache;
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
rbd_hdd_cache 3 64 107 GiB 29.91k 299 GiB 3.55 2.6 TiB
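As a rough sanity check of that presumption (my own arithmetic,
assuming cache_target_full_ratio is evaluated against target_max_bytes):

  echo $[128*1024*1024*1024*8/10]    # 0.8 x target_max_bytes
  109951162777                       # ~102 GiB

That is roughly 102 GiB, and the pool reports 107 GiB STORED, so the
cache is indeed sitting above the full ratio.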
The cache tier was created with the following commands:
ceph osd pool create rbd_hdd_cache 64 64 replicated replicated_ssd;
ceph osd tier add rbd_hdd rbd_hdd_cache;
ceph osd tier cache-mode rbd_hdd_cache writeback;
ceph osd tier set-overlay rbd_hdd rbd_hdd_cache;
ceph osd pool set rbd_hdd_cache hit_set_type bloom;
ceph osd pool set rbd_hdd_cache hit_set_count 3
ceph osd pool set rbd_hdd_cache hit_set_period 3600
ceph osd pool set rbd_hdd_cache target_max_bytes $[128*1024*1024*1024]
ceph osd pool set rbd_hdd_cache min_read_recency_for_promote 2
ceph osd pool set rbd_hdd_cache min_write_recency_for_promote 1
ceph osd pool set rbd_hdd_cache cache_target_dirty_ratio 0.4
ceph osd pool set rbd_hdd_cache cache_target_dirty_high_ratio 0.6
ceph osd pool set rbd_hdd_cache cache_target_full_ratio 0.8
I have thousands of objects in the cache tier which don't appear to
exist. The cluster is configured to deep scrub all placement groups
within a week and I have no errors:
[admin@kvm1a ~]# echo "Placement groups last deep scrub date
stamp:"; ceph pg dump 2> /dev/null | grep active | awk '{print $23}'
| cut -dT -f1 | sort | uniq -c;
Placement groups last deep scrub date stamp:
3 2021-10-13
39 2021-10-14
68 2021-10-15
143 2021-10-16
159 2021-10-17
114 2021-10-18
19 2021-10-19
[admin@kvm1a ~]# ceph -s
cluster:
id: 31f6ea46-12cb-47e8-a6f3-60fb6bbd1782
health: HEALTH_OK
services:
mon: 3 daemons, quorum kvm1a,kvm1b,kvm1c (age 4h)
mgr: kvm1a(active, since 4h), standbys: kvm1b, kvm1c
mds: 1/1 daemons up, 2 standby
osd: 35 osds: 35 up (since 4h), 35 in (since 6d)
data:
volumes: 1/1 healthy
pools: 8 pools, 545 pgs
objects: 2.53M objects, 8.8 TiB
usage: 25 TiB used, 85 TiB / 110 TiB avail
pgs: 545 active+clean
io:
client: 1.8 MiB/s rd, 6.9 MiB/s wr, 115 op/s rd, 185 op/s wr
cache: 3.7 MiB/s flush, 3 op/s promote
[admin@kvm1a ~]# ceph versions
{
"mon": {
"ceph version 16.2.6
(1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable)": 3
},
"mgr": {
"ceph version 16.2.6
(1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable)": 3
},
"osd": {
"ceph version 16.2.6
(1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable)": 35
},
"mds": {
"ceph version 16.2.6
(1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable)": 3
},
"overall": {
"ceph version 16.2.6
(1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable)": 44
}
}
Snippet from the many thousands of errors I receive when attempting
to flush and evict objects from the cache tier:
[admin@kvm1a ~]# rados -p rbd_hdd_cache cache-try-flush-evict-all;
<snip>
rbd_data.4118694b8f91d5.00000000000188f0
failed to flush /rbd_data.4118694b8f91d5.00000000000188f0: (2) No
such file or directory
rbd_data.746f3c94fb3a42.0000000000026832
failed to flush /rbd_data.746f3c94fb3a42.0000000000026832: (2) No
such file or directory
rbd_data.a820d2b5978445.0000000000000879
failed to flush /rbd_data.a820d2b5978445.0000000000000879: (2) No
such file or directory
rbd_data.0b751cffc1ec99.0000000000000814
failed to flush /rbd_data.0b751cffc1ec99.0000000000000814: (2) No
such file or directory
rbd_data.746f3c94fb3a42.0000000000033a8f
failed to flush /rbd_data.746f3c94fb3a42.0000000000033a8f: (2) No
such file or directory
cache-try-flush-evict-all finished with errors
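To get a feel for the scale, a rough tally of the failures can be
taken with something like the following (just a sketch; the grep
pattern may need adjusting):

  rados -p rbd_hdd_cache cache-try-flush-evict-all 2>&1 | \
    grep -c 'No such file or directory'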
Querying the stats for any of these objects returns a timestamp
matching the moment we tried to evict/flush them:
[admin@kvm1a ~]# for pool in rbd_hdd rbd_hdd_cache; do rados -p
$pool stat2 rbd_data.4118694b8f91d5.00000000000188f0; done
rbd_hdd/rbd_data.4118694b8f91d5.00000000000188f0 mtime
2021-10-19T15:48:53.243749+0200, size 4194304
rbd_hdd_cache/rbd_data.4118694b8f91d5.00000000000188f0 mtime
2021-10-19T15:48:53.243749+0200, size 4194304
Any ideas?
Regards
David Herselman
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx