On 02/21/2015 12:12 PM, Jeffrey McDonald wrote:
Hi,
We have a Ceph Giant installation with a radosgw interface. There are
198 OSDs on seven OSD servers, and we're seeing OSD failures on the
system when users try to write files via the S3 interface. We're more
likely to see the failures if the files are larger than 1 GB and if the
files go to a newly created bucket. We have seen failures for older
buckets, but those seem to happen less frequently. I can reliably crash
an OSD by writing a 3.6 GB file to a newly created bucket.
Three weeks ago we upgraded from Firefly to Giant to get better
performance. Under Firefly it was impossible to break the system; we
have only had these issues since moving to Giant. We've gone through
tests with iptables, sysctl parameters, and different versions of s3cmd
(along with different Python versions), and there is no indication that
any of these matter for the failures.
Hi Jeff,
Did increasing the heartbeat grace period on the OSDs and the monitors
help at all? Is there any other system logging on the OSDs that shows
interesting behavior (excessive major page faults, high CPU, etc.)? Can
you reproduce it with RADOS bench and/or RBD instead of with RGW?
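For the grace period, what I have in mind is something like this in
ceph.conf under [global] so both the mons and the OSDs pick it up (60s
is just an arbitrary bump over the 20s default):

    [global]
        osd heartbeat grace = 60

And to take RGW out of the picture, a plain RADOS bench write against a
scratch pool is usually enough, e.g. (the pool name is just a
placeholder):

    # 5 minutes of 4MB writes with 16 concurrent ops
    rados bench -p testpool 300 write -b 4194304 -t 16

If that also knocks OSDs over, the problem is below RGW.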
From the logs we saw earlier, it looks like multiple peers are reporting
no heartbeat from the OSD(s) after 20s. I think that has to be either a
network/firewall issue or something making the OSD heartbeats extremely
laggy. That's probably where I'd focus efforts.
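A couple of quick things I'd check along those lines (adjust the host
names and interfaces to your setup; the OSDs use ports in the
6800-7300 range by default):

    # make sure nothing is dropping OSD/heartbeat traffic between OSD hosts
    iptables -L -n | grep -i drop
    # watch per-OSD commit/apply latency for outliers while a big S3 upload runs
    ceph osd perf
    # if you use jumbo frames, verify the MTU end-to-end between OSD hosts
    ping -M do -s 8972 <other-osd-host>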
For posterity, another user saw something similar when transitioning
from Firefly to Giant, but I'm not sure it was ever resolved:
http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-November/044727.html
The last message in the thread indicates that it may be related to
deep-scrub.
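If you want to test the deep-scrub theory, you could temporarily flag
deep scrubs off cluster-wide and see whether the OSDs stay up during one
of the big uploads (remember to unset it afterwards):

    ceph osd set nodeep-scrub      # suspend deep scrubs cluster-wide
    # ... reproduce the failing 3.6 GB upload ...
    ceph osd unset nodeep-scrub    # re-enable when done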
Mark
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com