It actually WAS the number of watchers... narf... This is so embarrassing.
Thanks a lot for all your input.
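For reference, the watcher count Boris mentions can be inspected on the RGW
cache-notification objects. A minimal sketch, assuming the default notify.0
through notify.7 objects and a control pool named eu-central-1.rgw.control to
match the zone name in the logs quoted below (both are assumptions and may
differ on other setups):

  # count the watchers registered on each RGW cache-notification object
  for i in $(seq 0 7); do
      echo -n "notify.$i: "
      rados -p eu-central-1.rgw.control listwatchers notify.$i | wc -l
  done

Each running radosgw instance registers a watch on these objects, so the
counts should roughly track the number of live RGW daemons; stale or
unexpectedly large watcher lists go hand in hand with the "failed to
distribute cache" timeouts quoted below.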
On Tue, 11 May 2021 at 13:54, Boris Behrens <bb@xxxxxxxxx> wrote:

> I tried to debug it with --debug-ms=1.
> Maybe someone could help me wrap my head around it?
> https://pastebin.com/LD9qrm3x
>
> On Tue, 11 May 2021 at 11:17, Boris Behrens <bb@xxxxxxxxx> wrote:
>
>> Good call. I just restarted the whole cluster, but the problem still
>> persists.
>> I don't think it is a problem with rados, but with the radosgw.
>>
>> But I still struggle to pin down the issue.
>>
>> On Tue, 11 May 2021 at 10:45, Thomas Schneider
>> <Thomas.Schneider-q2p@xxxxxxxxxxxxxxxxxx> wrote:
>>
>>> Hey all,
>>>
>>> we had slow RGW access when some OSDs were slow due to an (to us)
>>> unknown OSD bug that made PG access either slow or impossible. (It
>>> showed itself through slowness of the mgr as well, but nothing other
>>> than that.)
>>> We restarted all OSDs that held RGW data and the problem was gone.
>>> I have no good way to debug the problem since it never occurred again
>>> after we restarted the OSDs.
>>>
>>> Kind regards,
>>> Thomas
>>>
>>> On 11 May 2021 at 08:47:06 CEST, Boris Behrens <bb@xxxxxxxxx> wrote:
>>> > Hi Amit,
>>> >
>>> > I just pinged the mons from every system and they are all available.
>>> >
>>> > On Mon, 10 May 2021 at 21:18, Amit Ghadge <amitg.b14@xxxxxxxxx> wrote:
>>> >
>>> >> We have seen slowness when one of the mgr services was unreachable;
>>> >> maybe it is different here. You can check the monmap / the mon
>>> >> entries in ceph.conf and then verify that all nodes ping
>>> >> successfully.
>>> >>
>>> >> -AmitG
>>> >>
>>> >> On Tue, 11 May 2021 at 12:12 AM, Boris Behrens <bb@xxxxxxxxx> wrote:
>>> >>
>>> >>> Hi guys,
>>> >>>
>>> >>> does someone have any idea?
>>> >>>
>>> >>> On Wed, 5 May 2021 at 16:16, Boris Behrens <bb@xxxxxxxxx> wrote:
>>> >>>
>>> >>> > Hi,
>>> >>> > for a couple of days we have been experiencing strange slowness
>>> >>> > on some radosgw-admin operations.
>>> >>> > What is the best way to debug this?
>>> >>> >
>>> >>> > For example, creating a user takes over 20 s.
>>> >>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user
>>> >>> > --display-name=test-bb-user
>>> >>> > 2021-05-05 14:08:14.297 7f6942286840  1 robust_notify: If at first
>>> >>> > you don't succeed: (110) Connection timed out
>>> >>> > 2021-05-05 14:08:14.297 7f6942286840  0 ERROR: failed to distribute
>>> >>> > cache for eu-central-1.rgw.users.uid:test-bb-user
>>> >>> > 2021-05-05 14:08:24.335 7f6942286840  1 robust_notify: If at first
>>> >>> > you don't succeed: (110) Connection timed out
>>> >>> > 2021-05-05 14:08:24.335 7f6942286840  0 ERROR: failed to distribute
>>> >>> > cache for eu-central-1.rgw.users.keys:****
>>> >>> > {
>>> >>> >     "user_id": "test-bb-user",
>>> >>> >     "display_name": "test-bb-user",
>>> >>> >     ....
>>> >>> > }
>>> >>> > real    0m20.557s
>>> >>> > user    0m0.087s
>>> >>> > sys     0m0.030s
>>> >>> >
>>> >>> > At first I thought that rados operations might be slow, but adding
>>> >>> > and deleting objects in rados is as fast as usual (at least from
>>> >>> > my perspective).
>>> >>> > Also, uploading to buckets is fine.
>>> >>> >
>>> >>> > We changed some things and I think it might have to do with this:
>>> >>> > * We have an HAProxy that distributes via leastconn between the 3
>>> >>> >   radosgw's (this did not change)
>>> >>> > * We had a daemon with the name "eu-central-1" running three times
>>> >>> >   (on the 3 radosgw's)
>>> >>> > * Because this might have led to our data duplication problem, we
>>> >>> >   have split that up, so now the daemons are named per host
>>> >>> >   (eu-central-1-s3db1, eu-central-1-s3db2, eu-central-1-s3db3)
>>> >>> > * We also added dedicated rgw daemons for garbage collection,
>>> >>> >   because the current ones were not able to keep up.
>>> >>> > * So basically ceph status went from "rgw: 1 daemon active
>>> >>> >   (eu-central-1)" to "rgw: 14 daemons active (eu-central-1-s3db1,
>>> >>> >   eu-central-1-s3db2, eu-central-1-s3db3, gc-s3db12, gc-s3db13...)"
>>> >>> >
>>> >>> >
>>> >>> > Cheers
>>> >>> >  Boris
>>> >>> >
>>> >>>
>>> >>>
>>> >>> --
>>> >>> The self-help group "UTF-8-Probleme" will meet this time, as an
>>> >>> exception, in the large hall.
>>> >>> _______________________________________________
>>> >>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>> >>>
>>> >>
>>> >
>>>
>>> --
>>> Thomas Schneider
>>> IT.SERVICES
>>> Wissenschaftliche Informationsversorgung Ruhr-Universität Bochum | 44780 Bochum
>>> Phone: +49 234 32 23939
>>> http://www.it-services.rub.de/
>>
>>
>> --
>> The self-help group "UTF-8-Probleme" will meet this time, as an
>> exception, in the large hall.
>
>
> --
> The self-help group "UTF-8-Probleme" will meet this time, as an
> exception, in the large hall.

--
The self-help group "UTF-8-Probleme" will meet this time, as an exception,
in the large hall.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
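Regarding the --debug-ms=1 trace linked earlier in the thread: a slow
radosgw-admin call can be rerun with elevated debug levels to see where the
time is spent. A minimal sketch, not from the original thread; the uid is
just an example and the debug levels are one common choice, not a requirement:

  # capture messenger and RGW debug output for a single slow command
  time radosgw-admin user create --uid test-debug-user \
      --display-name=test-debug-user \
      --debug-ms=1 --debug-rgw=20 2> radosgw-admin-debug.log

The debug output normally lands on stderr here, so redirecting it keeps the
JSON result readable; long gaps before the robust_notify timeouts in such a
trace point at the cache-distribution step rather than at slow rados I/O.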