Re: Fwd: OSD fail on client writes

On 02/21/2015 12:12 PM, Jeffrey McDonald wrote:

Hi,

We have a Ceph Giant installation with a radosgw interface. There are
198 OSDs on seven OSD servers, and we're seeing OSD failures on the
system when users try to write files via the S3 interface. We're more
likely to see the failures if the files are larger than 1 GB and if the
files go to a newly created bucket. We have seen failures for older
buckets, but those seem to happen less frequently. I can regularly crash
the OSD by writing a 3.6 GB file to a newly created bucket.

Three weeks ago, we upgraded from Firefly to Giant to achieve better
performance. Under Firefly it was impossible to break the system.
We have had these issues since we moved to Giant. We've gone
through tests with iptables, sysctl parameters, and different
versions of s3cmd (along with different Python versions); there is no
indication that any of these matter for the failures.

Hi Jeff,

Did increasing the heartbeat grace period on the OSDs and the Monitors help at all? Is there any other system logging information on the OSDs that shows interesting behavior (excessive major page faults, high CPU, etc.)? Can you reproduce it with RADOS bench and/or RBD instead of with RGW?

From the logs we saw earlier, it looks like multiple peers are reporting no heartbeat from the OSD(s) after 20s. I think that's either got to be a network/firewall issue or something is making the OSD heartbeat extremely laggy. That's probably where I'd focus efforts.
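
For reference, the 20s figure matches the default value of the "osd heartbeat grace" option. Below is a minimal sketch of what raising it in ceph.conf might look like. The value 40 is purely illustrative, not a recommendation, and the setting goes in [global] on the assumption that both the OSDs and the Monitors should read it:

```ini
# Hypothetical ceph.conf excerpt; 40 is an illustrative value, not a recommendation.
[global]
# Seconds without a heartbeat before peers report an OSD as failed (default: 20).
# Placed in [global] so that both the OSD and monitor daemons read it.
osd heartbeat grace = 40
```

A longer grace period would only mask laggy heartbeats rather than explain them, so it is mostly useful as a diagnostic step.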

For posterity, another user saw something similar when transitioning from Firefly to Giant, but I'm not sure it was ever resolved:

http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-November/044727.html

The last message in the thread indicates that it may be related to deep-scrub.

Mark