RE: OSD load and timeout when recovering


 



>>
>> After looking at the OSDs, we saw a very high load on them (a load average of 450), and some were down.
>> ceph -s showed that we had down PGs, peering+down PGs, remapped PGs, etc.
>>
>
> Could you tell us a bit more?
>
> When the load was 450, was this mainly due to disk I/O wait?
> Did the machines start to swap?

All disks were 100% busy, and the server was swapping.
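
(For reference, that pattern is easy to see with iostat -x and vmstat, or with a small script polling /proc directly. A rough sketch along those lines, not the exact tooling we used, assuming a standard Linux /proc layout and nothing Ceph-specific:)

#!/usr/bin/env python
# Rough sketch: sample CPU iowait and swap-in/out rates from /proc so the
# "disks 100% busy + swapping" pattern is visible. Field layout assumes a
# standard Linux /proc; this is not the exact tooling we used.
import time

def read_cpu():
    # /proc/stat "cpu" line: user nice system idle iowait irq softirq ...
    fields = open('/proc/stat').readline().split()[1:]
    values = [int(v) for v in fields]
    return sum(values), values[4]          # total jiffies, iowait jiffies

def read_swap():
    counters = {}
    for line in open('/proc/vmstat'):
        key, value = line.split()
        counters[key] = int(value)
    return counters.get('pswpin', 0), counters.get('pswpout', 0)

prev_total, prev_iowait = read_cpu()
prev_in, prev_out = read_swap()
while True:
    time.sleep(5)
    total, iowait = read_cpu()
    swap_in, swap_out = read_swap()
    pct = 100.0 * (iowait - prev_iowait) / max(1, total - prev_total)
    print("iowait %5.1f%%  pages swapped in/out: %d/%d"
          % (pct, swap_in - prev_in, swap_out - prev_out))
    prev_total, prev_iowait = total, iowait
    prev_in, prev_out = swap_in, swap_out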

> Could it be that the swapping was actually causing the machines to die even more?

> Although an OSD can run with 100 MB of memory, during recovery it can grow quite fast.

Is there a way to estimate the needed memory?
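
Short of a formula, I suppose one can watch the resident size of the ceph-osd processes while recovery runs and keep the peak. Something along these lines (a rough sketch; the process name "ceph-osd" is an assumption on my side, adjust for your setup):

#!/usr/bin/env python
# Rough sketch: poll the RSS of every ceph-osd process during recovery and
# keep the per-process peak, to size RAM empirically rather than guess.
# The process name "ceph-osd" is an assumption; adjust if yours differs.
import os, time

def osd_rss_kb():
    sizes = {}
    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        try:
            with open('/proc/%s/comm' % pid) as f:
                if f.read().strip() != 'ceph-osd':
                    continue
            with open('/proc/%s/status' % pid) as f:
                for line in f:
                    if line.startswith('VmRSS:'):
                        sizes[pid] = int(line.split()[1])   # value is in kB
        except IOError:    # process exited while we were scanning it
            continue
    return sizes

peak = {}
while True:
    for pid, rss in osd_rss_kb().items():
        peak[pid] = max(peak.get(pid, 0), rss)
        print("osd pid %s: %d MB now, %d MB peak"
              % (pid, rss // 1024, peak[pid] // 1024))
    time.sleep(10)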

>
> So basically the cluster was under load because we were recovering... but because it was under load, recovery could not complete.
>
>
> FileStore aborts indicate that it couldn't get the work done quickly enough. I've seen this with btrfs, but you say you are using XFS.
>
> You say you are storing small files. What exactly is "small"?

On average, 120 KB.


-- 
Yann ROBIN
www.YouScribe.com




