I didn't know there was a replacement for the radosgw role! I saw mention of a radosgw load balancer in the ceph-ansible project, but since I use haproxy, I didn't dig into that. Is that what you are referring to? Otherwise, I can't seem to find any mention of civetweb being replaced.

For the issue below, I guess the dev was using a single-threaded process that was out of control. They have done it a few times now, and it kills all four gateways. I asked them to stop and so far no repeats. For deletes, they should be using the bucket item aging anyway.
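(For reference, this is the kind of lifecycle rule I mean. It's only a rough sketch against the RGW S3 API using boto3; the endpoint, credentials, bucket name, and 30-day window are placeholders, not our actual setup:

    import boto3

    # Placeholder endpoint and credentials; point this at one of the gateways.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://rgw.example.local:8080",
        aws_access_key_id="PLACEHOLDER_ACCESS_KEY",
        aws_secret_access_key="PLACEHOLDER_SECRET_KEY",
    )

    # Let RGW expire old objects itself instead of a client mass-deleting them;
    # lifecycle processing then runs inside the gateways on their own schedule.
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-old-objects",
                    "Filter": {"Prefix": ""},
                    "Status": "Enabled",
                    "Expiration": {"Days": 30},
                }
            ]
        },
    )

That keeps the delete load off a single runaway client process.)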
-Brent

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Friday, October 23, 2020 7:00 AM
To: ceph-users@xxxxxxx
Subject: Re: Rados Crashing

Hi,

I read that civetweb and radosgw have a locking issue in combination with ssl [1], just a thought based on

> failed to acquire lock on obj_delete_at_hint.0000000079

Since Nautilus the default rgw frontend is beast, have you thought about switching?

Regards,
Eugen

[1] https://tracker.ceph.com/issues/22951


Zitat von Brent Kennedy <bkennedy@xxxxxxxxxx>:

> We are performing file maintenance (deletes, essentially), and when the
> process gets to a certain point, all four rados gateways crash with the
> following:
>
> Log output:
>
>    -5> 2020-10-20 06:09:53.996 7f15f1543700 2 req 7 0.000s s3:delete_obj verifying op params
>    -4> 2020-10-20 06:09:53.996 7f15f1543700 2 req 7 0.000s s3:delete_obj pre-executing
>    -3> 2020-10-20 06:09:53.996 7f15f1543700 2 req 7 0.000s s3:delete_obj executing
>    -2> 2020-10-20 06:09:53.997 7f161758f700 10 monclient: get_auth_request con 0x55d2c02ff800 auth_method 0
>    -1> 2020-10-20 06:09:54.009 7f1609d74700 5 process_single_shard(): failed to acquire lock on obj_delete_at_hint.0000000079
>     0> 2020-10-20 06:09:54.035 7f15f1543700 -1 *** Caught signal (Segmentation fault) **
> in thread 7f15f1543700 thread_name:civetweb-worker
>
> ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
> 1: (()+0xf5d0) [0x7f161d3405d0]
> 2: (()+0x2bec80) [0x55d2bcd1fc80]
> 3: (std::string::assign(std::string const&)+0x2e) [0x55d2bcd2870e]
> 4: (rgw_bucket::operator=(rgw_bucket const&)+0x11) [0x55d2bce3e551]
> 5: (RGWObjManifest::obj_iterator::update_location()+0x184) [0x55d2bced7114]
> 6: (RGWObjManifest::obj_iterator::operator++()+0x263) [0x55d2bd092793]
> 7: (RGWRados::update_gc_chain(rgw_obj&, RGWObjManifest&, cls_rgw_obj_chain*)+0x51a) [0x55d2bd0939ea]
> 8: (RGWRados::Object::complete_atomic_modification()+0x83) [0x55d2bd093c63]
> 9: (RGWRados::Object::Delete::delete_obj()+0x74d) [0x55d2bd0a87ad]
> 10: (RGWDeleteObj::execute()+0x915) [0x55d2bd04b6d5]
> 11: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)+0x915) [0x55d2bcdfbb35]
> 12: (process_request(RGWRados*, RGWREST*, RGWRequest*, std::string const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, int*)+0x1cd8) [0x55d2bcdfdea8]
> 13: (RGWCivetWebFrontend::process(mg_connection*)+0x38e) [0x55d2bcd41a1e]
> 14: (()+0x36bace) [0x55d2bcdccace]
> 15: (()+0x36d76f) [0x55d2bcdce76f]
> 16: (()+0x36dc18) [0x55d2bcdcec18]
> 17: (()+0x7dd5) [0x7f161d338dd5]
> 18: (clone()+0x6d) [0x7f161c84302d]
>
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> My guess is that we need to add more resources to the gateways? They have 2 CPUs and 12GB of memory running as virtual machines on CentOS 7.6. Any thoughts?
>
> -Brent
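P.S. If we do end up testing the beast frontend you mentioned, my understanding is that it comes down to the rgw_frontends line in ceph.conf on each gateway, roughly like this (the instance name and port below are placeholders; we terminate SSL on haproxy, so I've left out the ssl options):

    [client.rgw.gateway1]
    # hypothetical instance name; ours would match the actual rgw daemon names
    rgw_frontends = beast port=8080

followed by a restart of the radosgw service on that node.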