It turned out that under heavy load one of the RGWs had too low a soft
NOFILE limit (maximum number of open files). We also found, more or less
by accident, that in our setup and hardware the "round trips" under heavy
load from the frontend proxy (nginx in our case) to radosgw are quite
costly (say, a 50% drop in the achievable p/s), even when both run on the
same machine and IP. Thanks anyway.

On Wed, Jul 24, 2024 at 8:25 AM Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> can you tell a bit more about your setup? Are RGWs and OSDs colocated
> on the same servers? Are there any signs of server overload like OOM
> killers or anything else related to the recovery? Are disks saturated?
> Is this cephadm managed? What's the current ceph status?
>
> Thanks,
> Eugen
>
> Quote from Rok Jaklič <rjaklic@xxxxxxxxx>:
>
> > Hi,
> >
> > we've just updated from Pacific (16.2.15) to Quincy (17.2.7) and
> > everything seems to work; however, after some time radosgw stops
> > responding and we have to restart it.
> >
> > At first look, it seems that radosgw sometimes stops responding
> > during recovery.
> >
> > Does this maybe have something to do with mclock
> > https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/ ?
> >
> > ceph.conf looks something like this:
> >
> > --------
> > [global]
> > fsid = ...
> > mon initial members = x1,x2
> > mon host =
> > # public network = 192.168.0.0/24
> > auth cluster required = none
> > auth service required = none
> > auth client required = none
> > ms_mon_client_mode = crc
> >
> > osd journal size = 1024
> > osd pool default size = 3
> > osd pool default min size = 2
> > osd pool default pg num = 128
> > osd pool default pgp num = 128
> > osd crush chooseleaf type = 1
> >
> > [osd]
> > osd_scrub_begin_hour = 18
> > osd_scrub_end_hour = 6
> > osd_class_update_on_start = false
> > osd_scrub_during_recovery = false  # no scrubbing during recovery
> > osd_scrub_max_interval = 1209600
> > osd_deep_scrub_interval = 1209600
> > osd_max_scrubs = 3
> > osd_scrub_load_threshold = 1
> >
> > [client.radosgw.mon2]
> > host = mon2
> > # keyring = /etc/ceph/ceph.client.radosgw.keyring
> > log_file = /var/log/ceph/client.radosgw.mon2.log
> > rgw_dns_name = ...
> > rgw_frontends = "beast port=4444"
> > rgw_max_put_param_size = 15728640
> > rgw_crypt_require_ssl = false
> > rgw_max_concurrent_requests = 2048
> > --------
> >
> > We have nginx in front of rgw, with its upstream set to
> > client.radosgw.mon2 on port 4444.
> >
> > Kind regards,
> > Rok
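
On the NOFILE issue mentioned at the top of the thread: a minimal sketch of
how the effective limit of a running radosgw can be checked and raised on a
package-based (non-cephadm) install. The unit name ceph-radosgw@rgw.mon2
and the value 65536 are assumptions based on the [client.radosgw.mon2]
section above; a cephadm/container deployment sets this differently.

--------
# Check the soft limit the running radosgw actually got
# (assumes a single radosgw process on the host):
grep "Max open files" /proc/$(pidof radosgw)/limits

# Raise it via a systemd drop-in (unit name is an assumption,
# adjust to your deployment):
# /etc/systemd/system/ceph-radosgw@rgw.mon2.service.d/override.conf
[Service]
LimitNOFILE=65536
--------

After adding the drop-in, "systemctl daemon-reload" and a restart of the
radosgw unit are needed for the new limit to take effect.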
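On the nginx-to-radosgw round-trip cost also mentioned above: one common way
to reduce per-request overhead is to keep upstream connections alive instead
of opening a new TCP connection to radosgw for every request. A minimal
sketch, assuming radosgw listens on 127.0.0.1:4444 as in the ceph.conf above;
the upstream name rgw_backend and the keepalive value are illustrative.

--------
upstream rgw_backend {
    server 127.0.0.1:4444;
    keepalive 32;                        # idle connections kept open to radosgw
}

server {
    listen 80;
    location / {
        proxy_pass http://rgw_backend;
        proxy_http_version 1.1;          # required for upstream keepalive
        proxy_set_header Connection "";  # strip the client's Connection header
        client_max_body_size 0;          # let radosgw enforce object size limits
    }
}
--------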
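On the mclock question in the original post: in Quincy the recovery-versus-
client trade-off is controlled by the osd_mclock_profile option from the
linked documentation. A minimal sketch of inspecting and switching the
profile; whether mclock is actually related to the RGW hangs would still
need to be confirmed from the cluster status during recovery.

--------
# Show the profile currently configured for the OSDs:
ceph config get osd osd_mclock_profile

# Prefer client I/O over recovery cluster-wide:
ceph config set osd osd_mclock_profile high_client_ops
--------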