Re: civitweb segfaults

That did the trick. We had it set to 0 only on the Swift rgw definitions, although it was set correctly on the other rgw services. I'm guessing someone assumed a different config precedence applied at some point.
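For anyone hitting the same thing: Ceph applies the most specific matching ceph.conf section to a daemon, so a stray rgw_gc_max_objs = 0 in a per-daemon section silently overrides a sane value set elsewhere. A sketch of how that can happen (the instance name below is made up; adjust to your deployment):

```ini
[global]
rgw_gc_max_objs = 32          ; sane default; must be > 0

[client.rgw.swift-gw1]        ; hypothetical Swift gateway instance
rgw_gc_max_objs = 0           ; overrides [global] for this daemon only
                              ; and triggers the SIGFPE described below
```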

On Tue, 2018-12-11 at 11:41 -0500, Casey Bodley wrote:
Hi Leon,

Are you running with a non-default value of rgw_gc_max_objs? I was able 
to reproduce this exact stack trace by setting rgw_gc_max_objs = 0; I 
can't think of any other way to get a 'Floating point exception' here.

On 12/11/18 10:31 AM, Leon Robinson wrote:
Hello, I have found a surefire way to bring down our swift gateways.

First, upload a bunch of large files, split into segments, e.g.:

for i in {1..100}; do
  swift upload test_container -S 10485760 CentOS-7-x86_64-GenericCloud.qcow2 \
    --object-name CentOS-7-x86_64-GenericCloud.qcow2-$i
done

This creates 100 objects in test_container and 1,000 or so objects in test_container_segments.

Then delete them, preferably in a ludicrous manner:

for i in $(swift list test_container); do
  swift delete test_container "$i"
done

What results is:

 -13> 2018-12-11 15:17:57.627655 7fc128b49700  1 -- 
172.28.196.121:0/464072497 <== osd.480 172.26.212.6:6802/2058882 1 
==== osd_op_reply(11 .dir.default.1083413551.2.7 [call,call] 
v1423252'7548804 uv7548804 _ondisk_ = 0) v8 ==== 213+0+0 (3895049453 0 
0) 0x55c98f45e9c0 con 0x55c98f4d7800
   -12> 2018-12-11 15:17:57.627827 7fc0e3ffe700  1 -- 
172.28.196.121:0/464072497 --> 172.26.221.7:6816/2366816 -- 
osd_op(unknown.0.0:12 14.110b 
14:d08c26b8:::default.1083413551.2_CentOS-7-x86_64-GenericCloud.qcow2-10%2f1532606905.440697%2f938016768%2f10485760%2f00000037:head 
[cmpxattr user.rgw.idtag (25) op 1 mode 1,call rgw.obj_remove] snapc 
0=[] ondisk+write+known_if_redirected e1423252) v8 -- 0x55c98f4603c0 con 0
   -11> 2018-12-11 15:17:57.628582 7fc128348700  5 -- 
172.28.196.121:0/157062182 >> 172.26.225.9:6828/2257653 
conn(0x55c98f0eb000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH 
pgs=540 cs=1 l=1). rx osd.87 seq 2 0x55c98f4603c0 osd_op_reply(340 
obj_delete_at_hint.0000000055 [call] v1423252'9217746 uv9217746 ondisk 
= 0) v8
   -10> 2018-12-11 15:17:57.628604 7fc128348700  1 -- 
172.28.196.121:0/157062182 <== osd.87 172.26.225.9:6828/2257653 2 ==== 
osd_op_reply(340 obj_delete_at_hint.0000000055 [call] v1423252'9217746 
uv9217746 _ondisk_ = 0) v8 ==== 173+0+0 (3971813511 0 0) 0x55c98f4603c0 
con 0x55c98f0eb000
    -9> 2018-12-11 15:17:57.628760 7fc1017f9700  1 -- 
172.28.196.121:0/157062182 --> 172.26.225.9:6828/2257653 -- 
osd_op(unknown.0.0:341 13.4f 
13:f3db1134:::obj_delete_at_hint.0000000055:head [call timeindex.list] 
snapc 0=[] ondisk+read+known_if_redirected e1423252) v8 -- 
0x55c98f45fa00 con 0
    -8> 2018-12-11 15:17:57.629306 7fc128348700  5 -- 
172.28.196.121:0/157062182 >> 172.26.225.9:6828/2257653 
conn(0x55c98f0eb000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH 
pgs=540 cs=1 l=1). rx osd.87 seq 3 0x55c98f45fa00 osd_op_reply(341 
obj_delete_at_hint.0000000055 [call] v0'0 uv9217746 _ondisk_ = 0) v8
    -7> 2018-12-11 15:17:57.629326 7fc128348700  1 -- 
172.28.196.121:0/157062182 <== osd.87 172.26.225.9:6828/2257653 3 ==== 
osd_op_reply(341 obj_delete_at_hint.0000000055 [call] v0'0 uv9217746 
_ondisk_ = 0) v8 ==== 173+0+15 (3272189389 0 2149983739) 0x55c98f45fa00 
con 0x55c98f0eb000
    -6> 2018-12-11 15:17:57.629398 7fc128348700  5 -- 
172.28.196.121:0/464072497 >> 172.26.221.7:6816/2366816 
conn(0x55c98f4d6000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH 
pgs=181 cs=1 l=1). rx osd.58 seq 2 0x55c98f45fa00 osd_op_reply(12 
default.1083413551.2_CentOS-7-x86_64-GenericCloud.qcow2-10/1532606905.440697/938016768/10485760/00000037 
[cmpxattr (25) op 1 mode 1,call] v1423252'743755 uv743755 _ondisk_ = 0) v8
    -5> 2018-12-11 15:17:57.629418 7fc128348700  1 -- 
172.28.196.121:0/464072497 <== osd.58 172.26.221.7:6816/2366816 2 ==== 
osd_op_reply(12 
default.1083413551.2_CentOS-7-x86_64-GenericCloud.qcow2-10/1532606905.440697/938016768/10485760/00000037 
[cmpxattr (25) op 1 mode 1,call] v1423252'743755 uv743755 _ondisk_ = 0) 
v8 ==== 290+0+0 (3763879162 0 0) 0x55c98f45fa00 con 0x55c98f4d6000
    -4> 2018-12-11 15:17:57.629458 7fc1017f9700  1 -- 
172.28.196.121:0/157062182 --> 172.26.225.9:6828/2257653 -- 
osd_op(unknown.0.0:342 13.4f 
13:f3db1134:::obj_delete_at_hint.0000000055:head [call lock.unlock] 
snapc 0=[] ondisk+write+known_if_redirected e1423252) v8 -- 
0x55c98f45fd40 con 0
    -3> 2018-12-11 15:17:57.629603 7fc0e3ffe700  1 -- 
172.28.196.121:0/464072497 --> 172.26.212.6:6802/2058882 -- 
osd_op(unknown.0.0:13 15.1e0 
15:079bdcbb:::.dir.default.1083413551.2.7:head [call 
rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] 
ondisk+write+known_if_redirected e1423252) v8 -- 0x55c98f460700 con 0
    -2> 2018-12-11 15:17:57.631312 7fc128b49700  5 -- 
172.28.196.121:0/464072497 >> 172.26.212.6:6802/2058882 
conn(0x55c98f4d7800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH 
pgs=202 cs=1 l=1). rx osd.480 seq 2 0x55c98f460700 osd_op_reply(13 
.dir.default.1083413551.2.7 [call,call] v1423252'7548805 uv7548805 
_ondisk_ = 0) v8
    -1> 2018-12-11 15:17:57.631329 7fc128b49700  1 -- 
172.28.196.121:0/464072497 <== osd.480 172.26.212.6:6802/2058882 2 
==== osd_op_reply(13 .dir.default.1083413551.2.7 [call,call] 
v1423252'7548805 uv7548805 _ondisk_ = 0) v8 ==== 213+0+0 (4216487267 0 
0) 0x55c98f460700 con 0x55c98f4d7800
     0> 2018-12-11 15:17:57.631834 7fc0e3ffe700 -1 *** Caught signal 
(Floating point exception) **
 in thread 7fc0e3ffe700 thread_name:civetweb-worker

 ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94) 
luminous (stable)
 1: (()+0x200024) [0x55c98cc95024]
 2: (()+0x11390) [0x7fc13e474390]
 3: (RGWGC::tag_index(std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > const&)+0x56) 
[0x55c98cf78cc6]
 4: (RGWGC::send_chain(cls_rgw_obj_chain&, 
std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > const&, bool)+0x6a) [0x55c98cf7b06a]
 5: (RGWRados::Object::complete_atomic_modification()+0xd3) 
[0x55c98cdbfb63]
 6: (RGWRados::Object::Delete::delete_obj()+0xa22) [0x55c98cdf4142]
 7: (RGWDeleteObj::execute()+0x46c) [0x55c98cd8802c]
 8: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, 
req_state*, bool)+0x165) [0x55c98cdb01c5]
 9: (process_request(RGWRados*, RGWREST*, RGWRequest*, 
std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, 
RGWRestfulIO*, OpsLogSocket*, int*)+0x1dbc) [0x55c98cdb234c]
 10: (RGWCivetWebFrontend::process(mg_connection*)+0x38f) [0x55c98cc4aacf]
 11: (()+0x1f05d9) [0x55c98cc855d9]
 12: (()+0x1f1fa9) [0x55c98cc86fa9]
 13: (()+0x76ba) [0x7fc13e46a6ba]
 14: (clone()+0x6d) [0x7fc133b5941d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 0 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/radosgw_swift.log


Which isn't great. We can restart the radosgw, but then anyone else who 
fancies deleting a large segmented object can kill our service again.

Any ideas?

-- 
Leon L. Robinson <leon.robinson@xxxxxxxxxxxx>

------------------------------------------------------------------------

NOTICE AND DISCLAIMER
This e-mail (including any attachments) is intended for the 
above-named person(s). If you are not the intended recipient, notify 
the sender immediately, delete this email from your system and do not 
disclose or use for any purpose. We may monitor all incoming and 
outgoing emails in line with current legislation. We have taken steps 
to ensure that this email and attachments are free from any virus, but 
it remains your responsibility to ensure that viruses do not adversely 
affect you

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
