On Tue, 22 Aug 2017 09:54:34 +0000 Eugen Block wrote:

> Hi list,
>
> we have a production Hammer cluster for our OpenStack cloud, and
> recently a colleague added a cache tier consisting of 2 SSDs, with a
> pool size of 2 as well; we're still experimenting with this topic.
>
Risky, but I guess you know that.

> Now we have some hardware maintenance to do and need to shut down
> nodes, one at a time of course. So we tried to flush/evict the cache
> pool and disable it to prevent data loss, and we also set the
> cache-mode to "forward". Most of the objects were evicted
> successfully, but 39 objects are still left, and it's impossible to
> evict them. I'm not sure how to verify that we could just delete the
> cache pool without data loss; we want to set up the cache pool from
> scratch.
>
Do I take it from this that your cache tier is only on one node? If
so, upgrade the "Risky" up there to "Channeling Murphy".

If not, and your min_size is 1 as it should be for a size 2 pool,
nothing bad should happen.

Penultimately, google is EVIL but helps you find answers:
http://tracker.ceph.com/issues/12659

Christian

> # rados -p images-cache ls
> rbd_header.210f542ae8944a
> volume-ce17068e-a36d-4d9b-9779-3af473aba033.rbd
> rbd_header.50ec372eb141f2
> 931f9a1e-2022-4571-909e-6c3f5f8c3ae8_disk.rbd
> rbd_header.59dd32ae8944a
> ...
>
> There are only 3 types of objects in the cache pool:
> - rbd_header
> - volume-XXX.rbd (obviously cinder related)
> - XXX_disk (nova disks)
>
> All rbd_header objects have a size of 0 if I run a "stat" command on
> them; the rest have a size of 112. If I compare the objects with the
> respective objects in the cold storage, they are identical:
>
> Object rbd_header.1128db1b5d2111:
> images-cache/rbd_header.1128db1b5d2111 mtime 2017-08-21 15:55:26.000000, size 0
> images/rbd_header.1128db1b5d2111 mtime 2017-08-21 15:55:26.000000, size 0
>
> Object volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd:
> images-cache/volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd mtime 2017-08-21 15:55:26.000000, size 112
> images/volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd mtime 2017-08-21 15:55:26.000000, size 112
>
> Object 2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd:
> images-cache/2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd mtime 2017-08-21 15:55:25.000000, size 112
> images/2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd mtime 2017-08-21 15:55:25.000000, size 112
>
> Some of them have an rbd_lock, some of them have a watcher, and some
> don't have any of that, but they still can't be evicted:
>
> # rados -p images-cache lock list rbd_header.2207c92ae8944a
> {"objname":"rbd_header.2207c92ae8944a","locks":[]}
> # rados -p images-cache listwatchers rbd_header.2207c92ae8944a
> #
> # rados -p images-cache cache-evict rbd_header.2207c92ae8944a
> error from cache-evict rbd_header.2207c92ae8944a: (16) Device or resource busy
>
> Then I also tried shutting down an instance that uses some of the
> volumes listed in the cache pool, but the objects didn't change at
> all, and the total number was still 39. For the rbd_header objects I
> don't even know how to identify their "owner"; is there a way?
>
> Does anyone have a hint what else I could check, or is it reasonable
> to assume that the objects really are identical and that there would
> be no data loss if we deleted that pool?
>
> We appreciate any help!
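
As for the rbd_header "owner" question: when listwatchers does return
something, the watcher line includes the client's address (IIRC
something like "watcher=<ip>:0/<nonce> client.<id> cookie=<n>"), and
since librbd keeps a watch on the header object while an image is
open, that IP should point you at the compute node that still holds
it. An untested sketch to sweep all 39 leftovers in one go, using
only the rados calls you already ran by hand:

for obj in $(rados -p images-cache ls); do
    echo "== ${obj} =="
    # librbd clients with the image open; no output means no watcher
    rados -p images-cache listwatchers "${obj}"
    # advisory locks (e.g. rbd_lock); an empty "locks" list means none
    rados -p images-cache lock list "${obj}"
done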
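
And if you want more than mtime/size before dropping the pool, an
equally untested byte-level comparison along these lines (assuming
reads from both pools work as directly as your stat calls apparently
did). One caveat: the 0-byte rbd_header objects keep their metadata
in omap rather than in the object data, so for those a "rados
listomapvals" on both copies is the more telling check:

# compare each leftover object's data against its twin in the base pool
for obj in $(rados -p images-cache ls); do
    rados -p images-cache get "${obj}" /tmp/cache.obj
    rados -p images get "${obj}" /tmp/cold.obj
    if cmp -s /tmp/cache.obj /tmp/cold.obj; then
        echo "${obj}: data identical"
    else
        echo "${obj}: data DIFFERS"
    fi
done
rm -f /tmp/cache.obj /tmp/cold.obj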

> Regards,
> Eugen

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Communications