TL;DR: In Jewel, I briefly had two cache tiers assigned to the same ec pool, and I think that broke the ec pool. I then made a series of decisions attempting to repair that mistake, and I now think I've caused further issues.
Background:
My first potentially poor decision was not removing the original cache tier before adding the new one.
Basically, the workflow was as follows:
pools:
data_ec
data_cache
data_new_cache
ceph osd tier add data_ec data_new_cache
ceph osd tier cache-mode data_new_cache writeback
ceph osd tier set-overlay data_ec data_new_cache
ceph osd pool set data_new_cache hit_set_type bloom
ceph osd pool set data_new_cache hit_set_count 1
ceph osd pool set data_new_cache hit_set_period 3600
ceph osd pool set data_new_cache target_max_bytes 1000000000000
ceph osd pool set data_new_cache min_read_recency_for_promote 1
ceph osd pool set data_new_cache min_write_recency_for_promote 1
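#For reference (not something I ran at the time): the tier relationships can be checked read-only,
#which at this point would have shown both caches attached to data_ec.
ceph osd pool ls detail
#or, filtered to the pools involved here:
ceph osd dump | grep -E "data_ec|data_cache|data_new_cache"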
#so now I decided to attempt to remove the old cache
ceph osd tier cache-mode data_cache forward
#here is where things got bad
rados -p data_cache cache-flush-evict-all
#every object rados attempted to flush from the cache left errors of the following varieties:
rados -p data_cache cache-flush-evict-all
rbd_data.af81e6238e1f29.000000000001732e
error listing snap shots /rbd_data.af81e6238e1f29.000000000001732e: (2) No such file or directory
rbd_data.af81e6238e1f29.00000000000143bb
error listing snap shots /rbd_data.af81e6238e1f29.00000000000143bb: (2) No such file or directory
rbd_data.af81e6238e1f29.00000000000cf89d
failed to flush /rbd_data.af81e6238e1f29.00000000000cf89d: (2) No such file or directory
rbd_data.af81e6238e1f29.00000000000cf82c
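#I did not run these at the time, but a few read-only checks might have shown what state
#those objects and the old cache pool were in
#(object name taken from the output above; stat and listsnaps are standard rados subcommands):
rados -p data_cache stat rbd_data.af81e6238e1f29.000000000001732e
rados -p data_cache listsnaps rbd_data.af81e6238e1f29.000000000001732e
#and object/space counts for the cache pools:
rados df
ceph df detail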
#Following these errors, I thought maybe the world would become happy again if I just removed the newly added cache tier.
ceph osd tier cache-mode data_new_cache forward
rados -p data_new_cache cache-flush-evict-all
#when running the evict against the new tier, I received no errors
#and so begins potential mistake number 3
ceph osd tier remove-overlay data_ec
ceph osd tier remove data_ec data_new_cache
#I received the same errors while trying to evict
#knowing my data had been untouched for over an hour, I made a terrible decision
ceph osd tier remove data_ec data_cache
#I then discovered that I couldn't add the new or the old cache back to the ec pool, even with --force-nonempty
ceph osd tier add data_ec data_cache --force-nonempty
Error ENOTEMPTY: tier pool 'data_cache' has snapshot state; it cannot be added as a tier without breaking the pool
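#As far as I can tell, --force-nonempty only bypasses the "tier pool is not empty" check,
#not this snapshot-state check. I have not run these as a fix, but the snapshot state itself
#can be inspected read-only (pool snapshots via lssnap; self-managed/removed snaps show up in osd dump):
rados -p data_cache lssnap
ceph osd dump | grep -A1 "'data_cache'"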