Re: radosgw crashing after buffer overflows detected

I found a couple of OSDs that were seeing medium errors and marked them out
of the cluster.  Once all the PGs were moved off those OSDs, all the
buffer overflows went away.

So there must be some kind of bug that's being triggered when an OSD is
misbehaving.

Bryan

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Bryan Stillwell <bstillwell@xxxxxxxxxxx>
Date: Friday, September 8, 2017 at 9:26 AM
To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject:  radosgw crashing after buffer overflows detected

For about a week we've been seeing a decent number of buffer overflows
detected across all our RGW nodes in one of our clusters.  This started
happening a day after we started weighting in some new OSD nodes, so
we're thinking it's probably related to that.  Could someone help us
determine the root cause of this?

Cluster details:
  Distro: CentOS 7.2
  Release: 0.94.10-0.el7.x86_64
  OSDs: 1120
  RGW nodes: 10

See the log messages below.  If you know how to get a more complete call
trace, I would like to hear that too.  I tried installing the
ceph-debuginfo-0.94.10-0.el7.x86_64 package, but that didn't seem to
help.

Thanks,
Bryan


# From /var/log/messages:

Sep  7 20:06:11 p3cephrgw003 radosgw: *** buffer overflow detected ***: /bin/radosgw terminated
Sep  7 21:01:55 p3cephrgw003 radosgw: *** buffer overflow detected ***: /bin/radosgw terminated
Sep  7 21:37:00 p3cephrgw003 radosgw: *** buffer overflow detected ***: /bin/radosgw terminated
Sep  7 23:14:54 p3cephrgw003 radosgw: *** buffer overflow detected ***: /bin/radosgw terminated
Sep  7 23:17:08 p3cephrgw003 radosgw: *** buffer overflow detected ***: /bin/radosgw terminated
Sep  8 00:12:39 p3cephrgw003 radosgw: *** buffer overflow detected ***: /bin/radosgw terminated
Sep  8 07:04:07 p3cephrgw003 radosgw: *** buffer overflow detected ***: /bin/radosgw terminated
Sep  8 07:17:49 p3cephrgw003 radosgw: *** buffer overflow detected ***: /bin/radosgw terminated
Sep  8 07:41:39 p3cephrgw003 radosgw: *** buffer overflow detected ***: /bin/radosgw terminated
Sep  8 07:59:29 p3cephrgw003 radosgw: *** buffer overflow detected ***: /bin/radosgw terminated


# From /var/log/ceph/client.radosgw.p3cephrgw003.log:

     0> 2017-09-08 07:59:29.696615 7f7b296a2700 -1 *** Caught signal (Aborted) **
in thread 7f7b296a2700

ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
1: /bin/radosgw() [0x6d3d92]
2: (()+0xf100) [0x7f7f425e9100]
3: (gsignal()+0x37) [0x7f7f4141d5f7]
4: (abort()+0x148) [0x7f7f4141ece8]
5: (()+0x75317) [0x7f7f4145d317]
6: (__fortify_fail()+0x37) [0x7f7f414f5ac7]
7: (()+0x10bc80) [0x7f7f414f3c80]
8: (()+0x10da37) [0x7f7f414f5a37]
9: (OS_Accept()+0xc1) [0x7f7f435bd8b1]
10: (FCGX_Accept_r()+0x9c) [0x7f7f435bb91c]
11: (RGWFCGXProcess::run()+0x7bf) [0x58136f]
12: (RGWProcessControlThread::entry()+0xe) [0x5821fe]
13: (()+0x7dc5) [0x7f7f425e1dc5]
14: (clone()+0x6d) [0x7f7f414de21d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
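
On the question of getting a more readable call trace: below is a minimal Python sketch, assuming the frame format shown in the log above is printed exactly as-is, that splits each numbered frame into its symbol, offset, and absolute address. `parse_backtrace` and `Frame` are hypothetical helpers, not part of Ceph. The sketch only organizes what the log already contains. Actually resolving the `()+0x...` frames still needs a symbolizer such as `addr2line` or `gdb` pointed at the right shared library, and this trace format does not print which library that is.

```python
import re
from typing import List, NamedTuple, Optional

# Frame formats as they appear in the radosgw log above (assumed stable):
#   "1: /bin/radosgw() [0x6d3d92]"            -> module path, absolute address
#   "2: (()+0xf100) [0x7f7f425e9100]"         -> unnamed frame, offset only
#   "9: (OS_Accept()+0xc1) [0x7f7f435bd8b1]"  -> symbol plus offset
FRAME_RE = re.compile(
    r"^\s*(?P<num>\d+):\s+"
    r"(?:(?P<module>\S+?)\(\)"
    r"|\((?P<symbol>.*?)\+(?P<offset>0x[0-9a-fA-F]+)\))"
    r"\s+\[(?P<addr>0x[0-9a-fA-F]+)\]"
)


class Frame(NamedTuple):
    num: int
    module: Optional[str]   # only set for "path() [addr]" frames
    symbol: Optional[str]   # None when no symbol name was printed
    offset: Optional[int]   # offset from the symbol (or module base if unnamed)
    address: int            # absolute runtime address shown in brackets


def parse_backtrace(text: str) -> List[Frame]:
    """Extract numbered frames from a Ceph-style backtrace; skip other lines."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m is None:
            continue  # e.g. the "0> ... Caught signal" header line
        symbol = m.group("symbol")
        offset = m.group("offset")
        frames.append(Frame(
            num=int(m.group("num")),
            module=m.group("module"),
            symbol=None if symbol in (None, "()") else symbol,
            offset=int(offset, 16) if offset else None,
            address=int(m.group("addr"), 16),
        ))
    return frames


if __name__ == "__main__":
    trace = """\
1: /bin/radosgw() [0x6d3d92]
2: (()+0xf100) [0x7f7f425e9100]
9: (OS_Accept()+0xc1) [0x7f7f435bd8b1]
"""
    for f in parse_backtrace(trace):
        if f.module is not None:
            # Frame in the main executable: the absolute address could be
            # looked up against that binary, e.g. with addr2line -e.
            print(f"frame {f.num}: {f.module} at {f.address:#x}")
        elif f.symbol is None:
            # Unnamed frame: the offset is assumed to be relative to the base
            # of a shared library whose path this trace does not print.
            print(f"frame {f.num}: unresolved, +{f.offset:#x}")
        else:
            print(f"frame {f.num}: {f.symbol}+{f.offset:#x}")
```

Separating the unnamed frames this way makes it easier to see which entries (here frames 2, 5, 7, 8 and 13) are the ones a debuginfo package would need to fill in.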

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


