Re: One osd crashing daily, the problem with osd.50

Thanks for your comments. Answers inline.

On 05/09/16 09:53, Christian Balzer wrote:

Hello,

On Mon, 9 May 2016 09:31:20 +0200 Ronny Aasen wrote:

hello

I am running a small lab ceph cluster consisting of 6 old used servers.

That's larger than quite a few production deployments. ^_-

:)

They have 36 slots for drives, but too little RAM (32GB max for this
mainboard) to take advantage of them all. When I get to around 20 OSDs
on a node, the OOM killer becomes a problem if there are incidents that
require recovery.

No surprise there; if you're limited to that little RAM, I suspect you'd
run out of CPU power under a full load, too.

CPU has not been an issue yet, but I suspect that might change once I can free up enough RAM to try erasure coding.
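
For context, something along the lines of this minimal EC profile is what I have in mind once the RAM is freed up; the profile name, pg count and the k/m values are only examples for a 6-node cluster, nothing I have tested here yet:

    ceph osd erasure-code-profile set lab-ec k=4 m=2 ruleset-failure-domain=host
    ceph osd pool create ecpool 128 128 erasure lab-ec

With k=4 and m=2 across 6 hosts every host carries a shard, so as far as I understand a down node means degraded PGs with nowhere to recover to until it comes back.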


In order to remedy some of the RAM problems, I am running the OSDs on
5-disk software RAID5 sets. This gives me about 7 12TB OSDs per node and
a global hot spare. I have tried this on one of the nodes with good
success, and I am in the process of doing the migration on the other
nodes as well.

That's optimizing for space and nothing else.

Having done something similar in the past I would strongly recommend the
following:
a) Use RAID6, so that you never have to worry about an OSD failure.
I've personally lost 2 RAID5 sets of similar size due to double disk
failures.

b) Use RAID10 for much improved performance (IOPS). To offset the loss in
space, consider running with a replication of 2, which would be safe; the
same goes for option a).

Yes, definitely optimizing for space, and space only. That's also the reason for RAID5 and not RAID6 or RAID10,
and why erasure coding is a must-have if I can get it working there.
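
For reference, the md sets are assembled roughly like this; the device names below are examples, not my actual layout:

    # 5-disk software RAID5, one md device per future OSD
    mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[b-f]
    # then hand the md device to ceph like any other disk
    ceph-disk prepare /dev/md0

If ceph-disk dislikes the md device, the manual mkfs.xfs + ceph-osd --mkfs route is the fallback.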

I am running on Debian jessie using the 0.94.6 hammer release from Ceph's repo.

But an issue has started appearing on one of these RAID5 OSDs:

osd.50 has a tendency to stop roughly daily with the error message seen in the
log below. The OSD is running on a healthy software RAID5 set, and I
can see nothing in dmesg or any other log that indicates a problem
with this md device.

The key part of that log is the EIO failed assert. If you google for
"FAILED assert(allow_eio" you will get hits from last year; this is an FS
issue and has nothing to do with the RAID per se.

Which FS are you using?

If it's not BTRFS, and since your other OSDs are not having issues, it
might be worth going over this FS with a fine-tooth comb.

The "near full" OSD is something that you want to address, too.

The near full is already fixed.
I am running on XFS. I had assumed that FS issues would also show up in the system logs. I'll google for this issue.
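
In case it helps anyone following along, this is the fine-comb check I intend to run, with the osd stopped and its data dir unmounted (the md device name is an example):

    umount /var/lib/ceph/osd/ceph-50
    xfs_repair -n /dev/md0    # -n = no modify, only report what it finds
    xfs_repair /dev/md0       # only if -n actually reported problems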

Once I restart the OSD it's up and in, and it tends to stay up and in for
anywhere from a few hours up to a few days. The other 6 OSDs on this node
do not show the same problem. I have restarted this OSD about 8-10 times,
so it's fairly regular.

Might have to bite the bullet and re-create it if you can't find the issue.

Thanks. I'll do this today if nobody (or Google) has any better suggestions.
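
This is roughly the re-create plan for osd.50 (the md device name is an example, adjust to the actual layout):

    ceph osd out 50
    # stop the osd daemon and let the cluster settle
    ceph osd crush remove osd.50
    ceph auth del osd.50
    ceph osd rm 50
    ceph-disk zap /dev/md0
    ceph-disk prepare /dev/md0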

The RAID5 sets are 12TB, so I was hoping to be able to fix the problem
rather than zapping the md device and recreating it from scratch. I was also
worried that there was something fundamentally wrong about running OSDs
on software md RAID5 devices.

No problem in and of itself, other than reduced performance.

Kind regards
Ronny Aasen

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


