Removing cache tier for RBD pool

Hi *,

While trying to remove a cache tier from a pool used for RBD / OpenStack, we followed the procedure from http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache and ran into problems.
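For reference, the documented sequence boils down to roughly the following (a sketch only; "hot-storage" is our cache pool and "cold-storage" stands for the backing pool here):

  # switch the cache tier to forward mode so new writes bypass it
  ceph osd tier cache-mode hot-storage forward

  # flush and evict everything still held in the cache pool
  rados -p hot-storage ls
  rados -p hot-storage cache-flush-evict-all

  # detach the cache tier from the backing pool
  ceph osd tier remove-overlay cold-storage
  ceph osd tier remove cold-storage hot-storage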

The cluster is currently running Ceph 12.2.2; the cache tier was created with an earlier release of Ceph.

First of all, setting the cache mode to "forward" is reported to be unsafe and requires "--yes-i-really-mean-it", which is not mentioned in the documentation - if this mode really is meant to be used in this case, the need for that flag should be documented.
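In practice, on 12.2.2 the step only goes through in this form (the command as we had to issue it; the exact warning text is omitted here):

  ceph osd tier cache-mode hot-storage forward --yes-i-really-mean-it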

Unfortunately, "rados -p hot-storage cache-flush-evict-all" not only reported errors ("file not found") for many objects, but also left us with quite a number of objects in the pool, and new ones kept being created despite the "forward" mode. Even after stopping all OpenStack instances ("VMs"), we could see that the remaining objects in the pool were still locked. Manually unlocking them via rados commands worked, but "cache-flush-evict-all" still reported the same "file not found" errors, and 1070 objects remained in the pool, as before.

We checked the remaining objects via "rados stat" in both the hot-storage and the cold-storage pool and saw that every hot-storage object had a counterpart in cold-storage with identical stat info. We also compared some of the objects (with size > 0) and found the hot-storage and cold-storage copies to be identical.
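For the record, the inspection and the manual unlocking were done with plain rados commands roughly along these lines (object, lock and locker names are placeholders to be taken from the respective "lock list" output, so treat this as a sketch rather than the exact invocations):

  # inspect and break the advisory locks on a remaining object
  rados -p hot-storage lock list <obj-name>
  rados -p hot-storage lock break <obj-name> <lock-name> <locker> --lock-cookie <cookie>

  # compare a remaining object between the two tiers
  rados -p hot-storage stat <obj-name>
  rados -p cold-storage stat <obj-name>
  rados -p hot-storage get <obj-name> /tmp/hot.obj
  rados -p cold-storage get <obj-name> /tmp/cold.obj
  cmp /tmp/hot.obj /tmp/cold.obj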

We aborted that attempt, reverted the mode to "writeback" and restarted the OpenStack cluster - everything was working fine again, of course still using the cache tier.

During a recent maintenance window, the OpenStack cluster was shut down again and we retried the procedure. As there were no active users of the images pool, we skipped the step of forcing the cache mode to "forward" and immediately issued the "cache-flush-evict-all" command. Again 1070 objects remained in the hot-storage pool (and produced "file not found" errors), but unlike last time, none of them were locked.

Out of curiosity, we then ran loops of "rados -p hot-storage cache-flush <obj-name>" and "rados -p hot-storage cache-evict <obj-name>" over all objects in the hot-storage pool (see the sketch below) and, surprisingly, not only received no error messages at all but were left with an empty hot-storage pool! We then proceeded with the remaining steps from the documentation and were able to remove the cache tier successfully.
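The loop itself was nothing more elaborate than this sketch:

  # flush and then evict every object still listed in the cache pool
  rados -p hot-storage ls | while read -r obj; do
      rados -p hot-storage cache-flush "$obj"
      rados -p hot-storage cache-evict "$obj"
  done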

This leaves us with two questions:

1. Does setting the cache mode to "forward" lead to the above situation of remaining locks on hot-storage pool objects? Maybe the clients' unlock requests are forwarded to the cold-storage pool, leaving the hot-storage objects locked? If so, this should be documented, and it would seem impossible to cleanly remove a cache tier during live operation.

2. What is the significant difference between "rados cache-flush-evict-all" and separate "cache-flush" and "cache-evict" cycles? Or is it some implementation error that leads to those "file not found" errors with "cache-flush-evict-all", while the manual cycles work successfully?

Thank you for any insight you might be able to share.

Regards,
Jens

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


