Re: OSD commits suicide



Have you tuned any of the recovery or backfill parameters?  My ceph.conf has:
  osd max backfills = 1
  osd recovery max active = 1
  osd recovery op priority = 1
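
If restarting the OSDs just to pick up new throttle values is not an option, the same three settings can usually be pushed into the running daemons. A minimal sketch, assuming the ceph CLI and an admin keyring are available on the node; the THROTTLE_ARGS variable name is only illustrative:

```shell
#!/bin/sh
# The three ceph.conf settings listed above, written as runtime flags.
THROTTLE_ARGS="--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1"

# Inject into every running OSD. Skipped when the ceph CLI is not installed.
if command -v ceph >/dev/null 2>&1; then
    ceph tell 'osd.*' injectargs "$THROTTLE_ARGS"
else
    echo "ceph CLI not found; would run: ceph tell osd.* injectargs '$THROTTLE_ARGS'"
fi
```

Injected values do not survive an OSD restart, so the settings should stay in ceph.conf as well.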

Still, if it's running for a few hours, then failing, it sounds like there might be something else at play.  OSDs use a lot of RAM during recovery.  How much RAM and how many OSDs do you have in these nodes?  What does memory usage look like after a fresh restart, and what does it look like when the problems start?  Even better if you know what it looks like 5 minutes before the problems start.

Is there anything interesting in the kernel logs?  OOM killers, or memory deadlocks?
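
A rough way to check both things on an OSD node is sketched below. The grep patterns are my assumption of what OOM-killer and hung-task messages typically look like, so adjust them to your kernel; find_oom_lines is just a helper name for this example.

```shell
#!/bin/sh
# Filter kernel log text on stdin for common OOM-killer and hung-task messages.
# The patterns are assumptions; the exact wording varies between kernel versions.
find_oom_lines() {
    grep -i -E 'out of memory|oom-killer|killed process|blocked for more than' || true
}

# Scan the kernel ring buffer (may need root on some systems).
dmesg 2>/dev/null | find_oom_lines

# Resident (RSS) and virtual (VSZ) memory in KiB for each running ceph-osd process.
ps -C ceph-osd -o pid,rss,vsz,args 2>/dev/null || echo "no ceph-osd processes found"

# Overall memory and swap usage in MiB, if the free utility is installed.
if command -v free >/dev/null 2>&1; then
    free -m
fi
```

Running the ps and free part right after a restart and again every few minutes (from cron, for example) would give the before and after numbers asked about above.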

On Sat, Nov 8, 2014 at 11:19 AM, Erik Logtenberg <erik@xxxxxxxxxxxxx> wrote:

I have some OSDs that keep committing suicide. My cluster has ~1.3M
misplaced objects, and it can't really recover, because OSDs keep
failing before recovery finishes. The load on the hosts is quite high,
but the cluster currently has no other tasks than just the recovery.

I attached the logfile from a failed OSD. It shows the suicide, the
recent events and also me starting the OSD again after some time.

It'll keep running for a couple of hours and then fail again, for the
same reason.

I noticed a lot of timeouts. Apparently Ceph stresses the hosts to the
limit with the recovery tasks, so much that they time out and can't
finish that task. I don't understand why. Can I somehow throttle Ceph a
bit so that it doesn't keep overrunning itself? I kinda feel like it
should chill out a bit and simply recover one step at a time instead of
full force and then fail.



ceph-users mailing list