Hi, thanks for your quick response!
> Do I take it from this that your cache tier is only on one node?
> If so upgrade the "Risky" up there to "Channeling Murphy".
The two SSDs are on two different nodes, but since we just started
using a cache tier, we decided to use a pool size of 2; we know it's
not recommended.
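For reference, the current replication settings of the cache pool can
be checked with the standard pool-get commands (nothing special, just
the generic syntax with our pool name):
# ceph osd pool get images-cache size
# ceph osd pool get images-cache min_size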
> If not, and your min_size is 1 as it should be for a size 2 pool, nothing
> bad should happen.
Size and min_size were both set to 2; I have now changed min_size to 1.
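The change itself was nothing more than the usual pool-set command,
along the lines of:
# ceph osd pool set images-cache min_size 1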
> Penultimately, google is EVIL but helps you find answers:
> http://tracker.ceph.com/issues/12659
I had already seen this; it describes exactly what we are seeing in
our cluster. Even though the cache_mode is set to "forward", I still
see new objects written to the cache pool when I spawn a new instance.
At least this led to a better understanding: the rbd_header objects
seem to belong to the clones that are created from a snapshot when a
new instance is spawned, which is helpful.
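In case it helps others: the hex id in an rbd_header.<id> object name
should match the block_name_prefix that "rbd info" reports for the
corresponding format 2 image, so the owner of a header object can be
tracked down with something like this (untested) loop, using our pool
name "images" and an id from the listing below:
# for img in $(rbd -p images ls); do rbd -p images info $img | grep -q 210f542ae8944a && echo $img; done
The clones of a protected snapshot should also be listed by
"rbd children <pool>/<image>@<snapshot>".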
We plan to upgrade our cluster soon, but first we need to get rid of
this cache pool. We'll continue to analyze it, but if you have any
more helpful insights, we would appreciate them!
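For completeness, the plan for the actual teardown is the usual
sequence (base pool "images", cache pool "images-cache", and of course
only once the flush/evict finally leaves the pool empty), roughly:
# ceph osd tier cache-mode images-cache forward
# rados -p images-cache cache-flush-evict-all
# ceph osd tier remove-overlay images
# ceph osd tier remove images images-cache
# ceph osd pool delete images-cache images-cache --yes-i-really-really-mean-it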
Regards,
Eugen
Quoting Christian Balzer <chibi@xxxxxxx>:
> On Tue, 22 Aug 2017 09:54:34 +0000 Eugen Block wrote:
>> Hi list,
>> we have a production Hammer cluster for our OpenStack cloud, and
>> recently a colleague added a cache tier consisting of 2 SSDs, also
>> with a pool size of 2; we're still experimenting with this topic.
> Risky, but I guess you know that.
>> Now we have some hardware maintenance to do and need to shut down
>> nodes, one at a time of course. So we tried to flush/evict the cache
>> pool and disable it to prevent data loss; we also set the cache-mode
>> to "forward". Most of the objects have been evicted successfully, but
>> there are still 39 objects left, and it's impossible to evict them.
>> I'm not sure how to verify whether we can just delete the cache pool
>> without data loss; we want to set up the cache pool from scratch.
> Do I take it from this that your cache tier is only on one node?
> If so upgrade the "Risky" up there to "Channeling Murphy".
> If not, and your min_size is 1 as it should be for a size 2 pool, nothing
> bad should happen.
> Penultimately, google is EVIL but helps you find answers:
> http://tracker.ceph.com/issues/12659
> Christian
>> # rados -p images-cache ls
>> rbd_header.210f542ae8944a
>> volume-ce17068e-a36d-4d9b-9779-3af473aba033.rbd
>> rbd_header.50ec372eb141f2
>> 931f9a1e-2022-4571-909e-6c3f5f8c3ae8_disk.rbd
>> rbd_header.59dd32ae8944a
>> ...
>> There are only 3 types of objects in the cache-pool:
>> - rbd_header
>> - volume-XXX.rbd (obviously cinder related)
>> - XXX_disk (nova disks)
>> All rbd_header objects have a size of 0 if I run a "stat" command on
>> them; the rest have a size of 112. If I compare the objects with the
>> respective objects in the cold storage, they are identical:
>> Object rbd_header.1128db1b5d2111:
>> images-cache/rbd_header.1128db1b5d2111 mtime 2017-08-21 15:55:26.000000, size 0
>> images/rbd_header.1128db1b5d2111 mtime 2017-08-21 15:55:26.000000, size 0
>> Object volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd:
>> images-cache/volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd mtime 2017-08-21 15:55:26.000000, size 112
>> images/volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd mtime 2017-08-21 15:55:26.000000, size 112
>> Object 2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd:
>> images-cache/2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd mtime 2017-08-21 15:55:25.000000, size 112
>> images/2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd mtime 2017-08-21 15:55:25.000000, size 112
>> Some of them have an rbd_lock, some of them have a watcher, and some
>> have neither, but they still can't be evicted:
>> # rados -p images-cache lock list rbd_header.2207c92ae8944a
>> {"objname":"rbd_header.2207c92ae8944a","locks":[]}
>> # rados -p images-cache listwatchers rbd_header.2207c92ae8944a
>> #
>> # rados -p images-cache cache-evict rbd_header.2207c92ae8944a
>> error from cache-evict rbd_header.2207c92ae8944a: (16) Device or resource busy
>> Then I also tried to shut down an instance that uses some of the
>> volumes listed in the cache pool, but the objects didn't change at
>> all; the total number was also still 39. For the rbd_header objects I
>> don't even know how to identify their "owner"; is there a way?
>> Does anyone have a hint what else I could check, or is it reasonable
>> to assume that the objects are really the same and there would be no
>> data loss in case we deleted that pool?
>> We appreciate any help!
>> Regards,
>> Eugen
> --
> Christian Balzer Network/Systems Engineer
> chibi@xxxxxxx Rakuten Communications
--
Eugen Block voice : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail : eblock@xxxxxx
Vorsitzende des Aufsichtsrates: Angelika Mozdzen
Sitz und Registergericht: Hamburg, HRB 90934
Vorstand: Jens-U. Mozdzen
USt-IdNr. DE 814 013 983