Weird behaviour of mon_osd_down_out_subtree_limit=host

“Friday fun”… not!

We set mon_osd_down_out_subtree_limit=host some time ago. Now we needed to take all the OSDs on one host down, and as expected nothing happened (the noout flag was _not_ set): none of the OSDs were marked out and no rebalancing started. All the affected PGs showed up as stuck degraded.
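For context, the option was set a while back in ceph.conf on the monitors; quoting from memory, so the exact section may differ slightly:

    [mon]
    mon osd down out subtree limit = host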

Then we brought 3 OSDs on that host up and took them down again because of slow-request madness.

Since then there's been some weirdness I don't have an explanation for:

1) There are 8 active+remapped PGs (hosted on completely different hosts from the one we were working on). Why?

2) How does mon_osd_down_out_subtree_limit even work? How does it tell that the whole host is down? If I start just one OSD on it, is the host still considered down? Will it "out" all the other OSDs?
It doesn't look like it, because I just started one OSD and it didn't out the others.

3) After starting that one OSD, there are some backfills occurring, even though I set "nobackfill" (rough commands below).

4) The one OSD I started on this host now consumes 6.5 GB of memory (RSS). All the other OSDs in the cluster consume ~1.2-1.5 GB. No idea why…
(and it's the vanilla tcmalloc version)
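For reference, these are roughly the commands involved in the above; I'm reconstructing them from memory, so treat the exact syntax (and the OSD id, osd.12 is just a placeholder) as approximate:

    # cluster-wide flags: how nobackfill was set and how I check it's there
    ceph osd set nobackfill
    ceph osd dump | grep flags

    # per-OSD up/down and in/out state, grouped by host
    ceph osd tree

    # stuck/degraded PG summary
    ceph health detail

    # tcmalloc heap introspection on the memory-hungry OSD
    ceph tell osd.12 heap stats
    ceph tell osd.12 heap release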

Doh…

Any ideas welcome. I can't even start all the OSDs if they're going to consume this much memory.


Jan



