Re: OSD commits suicide

I did have a problem in my secondary cluster that sounds similar to yours.  I was using XFS, and traced my problem back to 64 kB inodes (osd mkfs options xfs = -i size=64k).   This showed up with a lot of "XFS: possible memory allocation deadlock in kmem_alloc" in the kernel logs.  I was able to keep things limping along by flushing the cache frequently, but I eventually re-formatted every OSD to get rid of the 64k inodes.
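Something like this should show whether an OSD's filesystem still has the big inodes (just a sketch; the mount point is an example from my setup, and the 2048 value is only what I'd drop back to, adjust for yours):

  # Check the XFS inode size on an OSD's data partition
  xfs_info /var/lib/ceph/osd/ceph-0 | grep isize

  # When re-formatting, the inode size comes from the mkfs options in ceph.conf,
  # e.g. something like:
  [osd]
    osd mkfs options xfs = -i size=2048

If isize still shows 65536, that OSD is one of the old ones.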

After I finished the reformat, I had problems because of deep-scrubbing.  While reformatting, I had disabled deep-scrubbing.  Once I re-enabled it, Ceph wanted to deep-scrub the whole cluster, and sometimes 90% of my OSDs would be deep-scrubbing at once.  I'm manually deep-scrubbing now, trying to spread out the schedule a bit.  Once this finishes in a few days, I should be able to re-enable deep-scrubbing and stay at HEALTH_OK.
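In case it's useful, the manual pass is roughly this (only a sketch; the nodeep-scrub flag and the sleep interval are what I'd use, tune them for your cluster, and the pg dump output format varies a bit between versions):

  # Keep the scheduler from piling more deep-scrubs on top of the manual ones
  ceph osd set nodeep-scrub

  # Walk the PGs and deep-scrub them one at a time, spaced out
  for pg in $(ceph pg dump pgs_brief 2>/dev/null | awk '{print $1}' | grep -E '^[0-9]+\.'); do
      ceph pg deep-scrub "$pg"
      sleep 600    # one PG every 10 minutes; adjust to taste
  done

  # Once everything has a recent deep-scrub stamp, let the scheduler take over again
  ceph osd unset nodeep-scrub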


My primary cluster has always been well behaved.  It completed the re-format without having any problems.  The clusters are nearly identical, the biggest difference being that the secondary had a higher sustained load due to a replication backlog.




On Sat, Nov 15, 2014 at 12:38 PM, Erik Logtenberg <erik@xxxxxxxxxxxxx> wrote:
Hi,

Thanks for the tip. I applied these configuration settings, and they do
lower the load during rebuilding a bit. Are there similar settings that
also tune Ceph down a bit during regular operations? The slow
requests, timeouts and OSD suicides are killing me.

If I allow the cluster to regain consciousness and stay idle a bit, it
all seems to settle down nicely, but as soon as I apply some load it
immediately starts to overstress and complain like crazy.

I'm also seeing this behaviour: http://tracker.ceph.com/issues/9844
This was reported by Dmitry Smirnov 26 days ago, but the report has no
response yet. Any ideas?

In my experience, OSDs are quite unstable in Giant and very easily
stressed, which causes chain reactions that worsen the issues further. It
would be nice to know whether other users are seeing this as well.

Thanks,

Erik.


On 11/10/2014 08:40 PM, Craig Lewis wrote:
> Have you tuned any of the recovery or backfill parameters?  My ceph.conf
> has:
> [osd]
>   osd max backfills = 1
>   osd recovery max active = 1
>   osd recovery op priority = 1
>
> Still, if it's running for a few hours, then failing, it sounds like
> there might be something else at play.  OSDs use a lot of RAM during
> recovery.  How much RAM and how many OSDs do you have in these nodes?
> What does memory usage look like after a fresh restart, and what does it
> look like when the problems start?  Even better if you know what it
> looks like 5 minutes before the problems start.
>
> Is there anything interesting in the kernel logs?  OOM killers, or
> memory deadlocks?
>
>
>
> On Sat, Nov 8, 2014 at 11:19 AM, Erik Logtenberg <erik@xxxxxxxxxxxxx> wrote:
>
>     Hi,
>
>     I have some OSDs that keep committing suicide. My cluster has ~1.3M
>     misplaced objects, and it can't really recover, because OSDs keep
>     failing before recovery finishes. The load on the hosts is quite high,
>     but the cluster currently has no tasks other than the
>     backfilling/recovery.
>
>     I attached the logfile from a failed OSD. It shows the suicide, the
>     recent events and also me starting the OSD again after some time.
>
>     It'll keep running for a couple of hours and then fail again, for the
>     same reason.
>
>     I noticed a lot of timeouts. Apparently Ceph stresses the hosts to the
>     limit with the recovery tasks, so much that they time out and can't
>     finish those tasks. I don't understand why. Can I somehow throttle Ceph a
>     bit so that it doesn't keep overrunning itself? I kinda feel like it
>     should chill out a bit and simply recover one step at a time instead of
>     going full force and then failing.
>
>     Thanks,
>
>     Erik.
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
