On Tue, 22 Aug 2017 09:54:34 +0000 Eugen Block wrote:

> Hi list,
>
> we have a production Hammer cluster for our OpenStack cloud, and
> recently a colleague added a cache tier consisting of 2 SSDs, with a
> pool size of 2 as well; we're still experimenting with this topic.
>
Risky, but I guess you know that.

> Now we have some hardware maintenance to do and need to shut down
> nodes, one at a time of course. So we tried to flush/evict the cache
> pool and disable it to prevent data loss, and we also set the
> cache-mode to "forward". Most of the objects were evicted
> successfully, but 39 objects are still left, and it's impossible to
> evict them. I'm not sure how to verify that we could just delete the
> cache pool without data loss; we want to set up the cache pool from
> scratch.
>
Do I take it from this that your cache tier is only on one node? If
so, upgrade the "Risky" up there to "Channeling Murphy".

If not, and your min_size is 1 as it should be for a size 2 pool,
nothing bad should happen.

Penultimately, google is EVIL but helps you find answers:
http://tracker.ceph.com/issues/12659

Christian

> # rados -p images-cache ls
> rbd_header.210f542ae8944a
> volume-ce17068e-a36d-4d9b-9779-3af473aba033.rbd
> rbd_header.50ec372eb141f2
> 931f9a1e-2022-4571-909e-6c3f5f8c3ae8_disk.rbd
> rbd_header.59dd32ae8944a
> ...
>
> There are only 3 types of objects in the cache pool:
> - rbd_header
> - volume-XXX.rbd (obviously cinder related)
> - XXX_disk (nova disks)
>
> All rbd_header objects have a size of 0 if I run a "stat" command on
> them; the rest have a size of 112. If I compare the objects with the
> respective objects in the cold storage, they are identical:
>
> Object rbd_header.1128db1b5d2111:
> images-cache/rbd_header.1128db1b5d2111 mtime 2017-08-21 15:55:26.000000, size 0
> images/rbd_header.1128db1b5d2111 mtime 2017-08-21 15:55:26.000000, size 0
>
> Object volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd:
> images-cache/volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd mtime 2017-08-21 15:55:26.000000, size 112
> images/volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd mtime 2017-08-21 15:55:26.000000, size 112
>
> Object 2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd:
> images-cache/2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd mtime 2017-08-21 15:55:25.000000, size 112
> images/2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd mtime 2017-08-21 15:55:25.000000, size 112
>
> Some of them have an rbd_lock, some of them have a watcher, and some
> don't have any of that, but they still can't be evicted:
>
> # rados -p images-cache lock list rbd_header.2207c92ae8944a
> {"objname":"rbd_header.2207c92ae8944a","locks":[]}
> # rados -p images-cache listwatchers rbd_header.2207c92ae8944a
> #
> # rados -p images-cache cache-evict rbd_header.2207c92ae8944a
> error from cache-evict rbd_header.2207c92ae8944a: (16) Device or resource busy
>
> Then I also tried shutting down an instance that uses some of the
> volumes listed in the cache pool, but the objects didn't change at
> all, and the total number was still 39. For the rbd_header objects I
> don't even know how to identify their "owner"; is there a way?
>
> Does anyone have a hint what else I could check, or is it reasonable
> to assume that the objects really are identical and that there would
> be no data loss if we deleted that pool?
>
> We appreciate any help!
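
As for the rbd_header "owner" question: when listwatchers does return
something, the watcher line includes the client's address (IIRC
something like "watcher=<ip>:0/<nonce> client.<id> cookie=<n>"), and
since librbd keeps a watch on the header object while an image is
open, that IP should point you at the compute node that still holds
it. An untested sketch to sweep all 39 leftovers in one go, using
only the rados calls you already ran by hand:

for obj in $(rados -p images-cache ls); do
    echo "== ${obj} =="
    # librbd clients with the image open; no output means no watcher
    rados -p images-cache listwatchers "${obj}"
    # advisory locks (e.g. rbd_lock); an empty "locks" list means none
    rados -p images-cache lock list "${obj}"
done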
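
And if you want more than mtime/size before dropping the pool, an
equally untested byte-level comparison along these lines (assuming
reads from both pools work as directly as your stat calls apparently
did). One caveat: the 0-byte rbd_header objects keep their metadata
in omap rather than in the object data, so for those a "rados
listomapvals" on both copies is the more telling check:

# compare each leftover object's data against its twin in the base pool
for obj in $(rados -p images-cache ls); do
    rados -p images-cache get "${obj}" /tmp/cache.obj
    rados -p images get "${obj}" /tmp/cold.obj
    if cmp -s /tmp/cache.obj /tmp/cold.obj; then
        echo "${obj}: data identical"
    else
        echo "${obj}: data DIFFERS"
    fi
done
rm -f /tmp/cache.obj /tmp/cold.obj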

> Regards,
> Eugen

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Communications