OSD load and timeouts when recovering

Hi,

We use Ceph to store small files (lots of them) on different servers and access them through the RADOS gateway.
Our data size is 380 GB (very small). We have two hosts with 5 OSDs each.
We use a small config for Ceph: servers with 2 GB of RAM and 5 x 2 TB disks (one OSD per disk).
This is a very cheap config that allows us to keep our storage costs under control, and it's enough to get the read performance we need.
(We use this config with MogileFS to store 150 TB of data.)

This weekend we had an alert saying Ceph was down.

After looking at the OSDs, we saw a very high load on the OSD hosts (a load average of 450), and some OSDs were down.
ceph -s showed down PGs, peering+down PGs, remapped PGs, etc.

We started to see that while PGs were peering and recovering, the load was very high.
OSDs stopped responding, and we could see log messages like:
FileStore timeout and abort signal

So basically the cluster was under load because it was recovering... but because it was under load, recovery could not complete.

We changed these params to get longer timeouts:
filestore op thread suicide timeout = 360 
filestore op thread timeout = 180 
osd default notify timeout = 360
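
For reference, a minimal sketch of where these live in ceph.conf, assuming they go under the [osd] section (our real file contains more than this); the OSDs need a restart (or injectargs) for the change to take effect:

[osd]
  filestore op thread timeout = 180
  filestore op thread suicide timeout = 360
  osd default notify timeout = 360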

The cluster was still under heavy load, and OSDs were still timing out (fewer timeouts, but still some).

So we tested params to "throttle" the recovery process:
filestore op threads = 6
filestore queue max ops = 24
osd recovery max active = 1
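
To check that a running OSD actually picked the new values up, the admin socket can be queried; a quick sketch, assuming this version supports "config show" over the admin socket and the default socket path for osd.0 (adjust to your daemon ids):

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep recovery
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep filestore_op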

The load was better, but still very high (around 30).

We also tried putting the journal in a tmpfs with zram.
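
Roughly the idea (paths and sizes below are illustrative, not our exact setup; here zram acts as compressed swap behind the tmpfs, and of course the journal is lost on a power cut):

# RAM-backed mount for the OSD journals
mount -t tmpfs -o size=1G tmpfs /mnt/ceph-journal

# compressed swap on zram to relieve RAM pressure
modprobe zram
echo $((1024*1024*1024)) > /sys/block/zram0/disksize
mkswap /dev/zram0
swapon -p 100 /dev/zram0

# then point the OSDs at it in ceph.conf
# (after moving the journal it has to be recreated, e.g. ceph-osd -i 0 --mkjournal)
[osd]
  osd journal = /mnt/ceph-journal/osd.$id/journal
  osd journal size = 512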
We set noout so Ceph wouldn't copy data to satisfy the replication count while OSDs were out.
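
(For the record, the flag is just set and cleared cluster-wide:)

ceph osd set noout
# ... and once every OSD is back up and in:
ceph osd unset noout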

We then updated to kernel 3.5 to get the latest XFS optimizations.

In the end nothing was working; we were stuck in the same infinite death loop of recovering => load => timeout => recovering.
So we updated from Ceph 0.48.2 to 0.53; the load was better and recovery finally completed.

As we don't want to be in this position again (24h of downtime), I have some questions about Ceph/RADOS.

1/ Even after we switched to Ceph 0.53, the RADOS gateway was still not responding; the log was showing an initialization timeout.
Is it normal that the recovery process prevents us from reading data from Ceph?
The data is there, it is just moving, so why can't we access it?

2/ In case of very high load because Ceph is moving data, is there a way to tell Ceph to go more slowly?
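
For example, something like this at runtime would be ideal; a sketch assuming injectargs accepts these options on a live cluster (the exact "tell" syntax seems to vary between versions):

# same knob we set in ceph.conf, but applied to every OSD without a restart
ceph tell osd.* injectargs '--osd-recovery-max-active 1'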


Thanks,

--
Yann ROBIN
Société Publica
www.YouScribe.com


