Re: Weird behaviour of mon_osd_down_out_subtree_limit=host

It turns out that when we started the 3 OSDs, the cluster did “out” the rest of the OSDs on the same host, so their reweight was 0.
So when I started the single OSD on that host, the cluster tried to move all the PGs from the other (now out) OSDs onto this one, which failed for lack of disk space, and that is also why the OSD consumed so much more memory.
I had to reweight all the OSDs back (we don’t normally run them at 1 because of poor balancing) and I am now starting them one by one…
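
For anyone hitting the same thing, something along these lines should get the weights back (the OSD id and the reweight value below are just placeholders, use whatever your cluster normally runs with):

    # see which OSDs got marked out - their REWEIGHT column shows 0
    ceph osd tree

    # mark an affected OSD back in (which puts its reweight back to 1) ...
    ceph osd in 12

    # ... and then restore the reweight we actually run with (0.85 is just an example)
    ceph osd reweight 12 0.85

    # finally start the daemons one at a time and let recovery settle in between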

I think mon_osd_down_out_subtree_limit should be a bit smarter and only “out” an OSD once it has been started at least once - not as soon as some other OSD on the same host comes up. I don’t ever want to start all the OSDs at once, so as it stands that pretty much makes the option unusable.
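
For reference, these are the knobs involved (ceph.conf syntax from memory, so double-check it against your version); for planned work on a whole host, setting the noout flag for the duration should avoid this whole problem:

    # ceph.conf, [global] or [mon] section - what we have set
    mon osd down out subtree limit = host

    # before taking the host down: stop automatic out-marking entirely
    ceph osd set noout
    # ... stop/start the OSDs as needed, one by one ...
    ceph osd unset noout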

Jan

> On 24 Jul 2015, at 13:53, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> 
> “Friday fun”… not!
> 
> We set mon_osd_down_out_subtree_limit=host some time ago. Now we needed to take down all OSDs on one host and as expected nothing happened (noout was _not_ set). All the PGs showed as stuck degraded.
> 
> Then we took 3 OSDs on the host up and then down again because of slow request madness.
> 
> Since then there’s some weirdness I don’t have an explanation for:
> 
> 1) there are 8 active+remapped PGs (hosted on completely different hosts from the one we were working on). Why?
> 
> 2) How does mon_osd_down_out_subtree_limit even work? How does it tell the whole host is down? If I start just one OSD, is the host still down? Will it “out” all the other OSDs?
> Doesn’t look like it, because I just started one OSD and it didn’t out all the others.
> 
> 3) after starting the one OSD, there are some backfills occurring, even though I set “nobackfill”
> 
> 4) the one OSD I started on this host now consumes 6.5GB memory (RSS). All other OSDs in the cluster consume ~1.2-1.5 GB. No idea why…
> (and it’s the vanilla tcmalloc version)
> 
> Doh…
> 
> Any ideas welcome. I can’t even start all the OSDs if they start consuming this amount of memory.
> 
> 
> Jan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



