Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

Good call. I just restarted the whole cluster, but the problem still
persists.
I don't think it is a problem with rados itself, but with radosgw.

But I am still struggling to pin down the issue.
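
Since the "failed to distribute cache" message comes from RGW's watch/notify
on the notify.* objects in the zone's control pool, one thing still worth
checking is whether stale radosgw instances are hanging around as watchers
there. A rough sketch (the pool name eu-central-1.rgw.control and the object
name notify.0 are assumptions based on our zone naming; rados lspools shows
the real name):

# list the notify objects radosgw uses for cache invalidation
rados -p eu-central-1.rgw.control ls
# show which clients currently hold a watch on one of them; a watcher
# belonging to a daemon that no longer exists would explain the
# (110) Connection timed out on notify
rados -p eu-central-1.rgw.control listwatchers notify.0

If listwatchers still shows entries for the old shared "eu-central-1"
instances, restarting the remaining radosgw daemons (or simply waiting for
the stale watches to time out) might clear them.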

On Tue, 11 May 2021 at 10:45, Thomas Schneider <Thomas.Schneider-q2p@xxxxxxxxxxxxxxxxxx> wrote:

> Hey all,
>
> we had slow RGW access when some OSDs were slow due to an OSD bug unknown
> to us that made PG access either slow or impossible. (It also showed itself
> through slowness of the mgr, but nothing beyond that.)
> We restarted all OSDs that held RGW data and the problem was gone.
> I have no good way to debug the problem since it never occurred again after
> we restarted the OSDs.
>
> Kind regards,
> Thomas
>
>
> On 11 May 2021 08:47:06 CEST, Boris Behrens <bb@xxxxxxxxx> wrote:
> >Hi Amit,
> >
> >I just pinged the mons from every system and they are all available.
> >
> >On Mon, 10 May 2021 at 21:18, Amit Ghadge <amitg.b14@xxxxxxxxx> wrote:
> >
> >> We have seen slowness when one of the mgr services was unreachable; maybe
> >> it is different here. You can check the mon entries in the monmap /
> >> ceph.conf and then verify that all nodes ping successfully.
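> >>
> >> For example (just a sketch, the hostname is a placeholder):
> >>
> >>   # show the mon addresses the cluster actually knows about
> >>   ceph mon dump
> >>   # then ping each listed mon host from every rgw node
> >>   ping -c1 mon1.example.net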
> >>
> >>
> >> -AmitG
> >>
> >>
> >> On Tue, 11 May 2021 at 12:12 AM, Boris Behrens <bb@xxxxxxxxx> wrote:
> >>
> >>> Hi guys,
> >>>
> >>> does anyone have an idea?
> >>>
> >>> On Wed, 5 May 2021 at 16:16, Boris Behrens <bb@xxxxxxxxx> wrote:
> >>>
> >>> > Hi,
> >>> > for a couple of days we have been experiencing strange slowness on some
> >>> > radosgw-admin operations.
> >>> > What is the best way to debug this?
> >>> >
> >>> > For example creating a user takes over 20s.
> >>> > [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user
> >>> > --display-name=test-bb-user
> >>> > 2021-05-05 14:08:14.297 7f6942286840  1 robust_notify: If at first you don't succeed: (110) Connection timed out
> >>> > 2021-05-05 14:08:14.297 7f6942286840  0 ERROR: failed to distribute cache for eu-central-1.rgw.users.uid:test-bb-user
> >>> > 2021-05-05 14:08:24.335 7f6942286840  1 robust_notify: If at first you don't succeed: (110) Connection timed out
> >>> > 2021-05-05 14:08:24.335 7f6942286840  0 ERROR: failed to distribute cache for eu-central-1.rgw.users.keys:****
> >>> > {
> >>> >     "user_id": "test-bb-user",
> >>> >     "display_name": "test-bb-user",
> >>> >    ....
> >>> > }
> >>> > real 0m20.557s
> >>> > user 0m0.087s
> >>> > sys 0m0.030s
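> >>> >
> >>> > (To see where the 20 seconds are spent, the same command can be re-run
> >>> > with verbose client-side logging; the debug flags are the stock ceph
> >>> > options, the log path is just an example:)
> >>> >
> >>> > radosgw-admin user create --uid test-bb-user --display-name=test-bb-user \
> >>> >     --debug-rgw=20 --debug-ms=1 2> /tmp/radosgw-admin.debug.log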
> >>> >
> >>> > First I thought that rados operations might be slow, but adding and
> >>> > deleting objects in rados is as fast as usual (at least from my
> >>> > perspective).
> >>> > Also uploading to buckets is fine.
> >>> >
> >>> > We changed some things and I think it might have to do with this:
> >>> > * We have a HAProxy that distributes via leastconn between the 3 radosgw's (this did not change)
> >>> > * We had a daemon with the name "eu-central-1" running three times (on the 3 radosgw's)
> >>> > * Because this might have led to our data duplication problem, we have split that up, so the daemons are now named per host (eu-central-1-s3db1, eu-central-1-s3db2, eu-central-1-s3db3)
> >>> > * We also added dedicated rgw daemons for garbage collection, because the current ones were not able to keep up.
> >>> > * So basically ceph status went from "rgw: 1 daemon active (eu-central-1)" to "rgw: 14 daemons active (eu-central-1-s3db1, eu-central-1-s3db2, eu-central-1-s3db3, gc-s3db12, gc-s3db13...)"
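> >>> >
> >>> > (For reference, a minimal ceph.conf sketch of how that split could look;
> >>> > the section names follow our daemon names, and disabling gc threads on
> >>> > the frontend daemons like this is an assumption, not necessarily what we
> >>> > actually run:)
> >>> >
> >>> > [client.rgw.eu-central-1-s3db1]
> >>> >     # frontend daemon behind HAProxy, leaves gc to the dedicated ones
> >>> >     rgw_enable_gc_threads = false
> >>> >
> >>> > [client.rgw.gc-s3db12]
> >>> >     # dedicated gc daemon, receives no traffic from HAProxy
> >>> >     rgw_enable_gc_threads = true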
> >>> >
> >>> >
> >>> > Cheers
> >>> >  Boris
> >>> >
> >>>
> >>>
> >>> --
> >>> The self-help group "UTF-8 problems" is meeting in the large hall this
> >>> time, as an exception.
> >>>
> >>
> >
>
> --
> Thomas Schneider
> IT.SERVICES
> Wissenschaftliche Informationsversorgung Ruhr-Universität Bochum | 44780
> Bochum
> Telefon: +49 234 32 23939
> http://www.it-services.rub.de/
>


-- 
The self-help group "UTF-8 problems" is meeting in the large hall this
time, as an exception.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



