Does anyone have an idea what I can check, or which logs I can turn on, to find the cause
of the problem? Or at least how I can set up monitoring that tells me when this happens?

Currently I go through ALL of the buckets and basically do a "compare bucket index to
radoslist" for all objects in the bucket index, roughly as in the sketch below. But I
doubt this will give me new insights.
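For reference, this is roughly what the per-bucket check looks like. A minimal sketch,
assuming jq is available, that `radosgw-admin bucket stats` reports the bucket marker,
and that head objects show up in the radoslist output as "<marker>_<objectname>"
(multipart and versioned buckets would need extra handling):

#!/usr/bin/env bash
# For every bucket, compare the S3 object names in the bucket index
# against the object names recovered from the RADOS head objects.
# Anything present in the index but absent from radoslist is suspect.
for bucket in $(radosgw-admin bucket list | jq -r '.[]'); do
    # Bucket marker: head objects are named "<marker>_<objectname>" in RADOS.
    marker=$(radosgw-admin bucket stats --bucket "$bucket" | jq -r '.marker')

    # Left side: object names according to the bucket index.
    # Right side: object names stripped out of the marker-prefixed RADOS
    # head objects (shadow/multipart tail objects fall through harmlessly).
    comm -23 \
        <(radosgw-admin bi list --bucket "$bucket" \
            | jq -r '.[].entry.name' | sort -u) \
        <(radosgw-admin bucket radoslist --bucket "$bucket" \
            | awk -v m="${marker}_" 'index($0, m) == 1 { print substr($0, length(m) + 1) }' \
            | sort -u) \
        | sed "s/^/missing in $bucket: /"
done

Run from cron, a non-empty output would at least give me an alert when it happens
again, even if it doesn't explain the cause.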
On Mon, 21 Nov 2022 at 11:55, Boris Behrens <bb@xxxxxxxxx> wrote:

> Good day people,
>
> we have a very strange problem with some buckets.
> A customer informed us that they had issues with objects: the objects are
> listed, but a GET on them returns a "NoSuchKey" error.
> They did not delete anything from the bucket.
>
> We checked, and `radosgw-admin bucket radoslist --bucket $BUCKET` was
> empty, but all the objects were still listed by `radosgw-admin bi list
> --bucket $BUCKET`.
>
> On the date they noticed it, the cluster was as healthy as it gets in
> our case. No other tasks were running at the time: no orphan-object
> search, no bucket resharding, no adding or removing of OSDs, no
> rebalancing, and so on.
>
> Some data about the cluster:
>
> - 275 OSDs (38 SSD OSDs, 6 SSD OSDs reserved for GC, the rest 8-16TB
>   spinning HDDs) over 13 hosts
> - one block.db SSD per 5 HDD OSDs
> - the SSD OSDs are 100GB LVs on our block.db SSDs and contain all the
>   pools except rgw.buckets.data and rgw.buckets.non-ec
> - the garbage collector is on separate SSD OSDs, which are also 100GB
>   LVs on our block.db SSDs
> - we had to split the GC off from all other pools, because this bug
>   (https://tracker.ceph.com/issues/53585) led to problems where we
>   received 500 errors from RGW
> - we have three HAProxy frontends, each pointing to one of our RGW
>   instances (with the other two RGW daemons as fallback)
> - we have 12 RGW daemons running in total, but only three of them are
>   connected to the outside world (3x only for GC, 3x for some zonegroup
>   restructuring, 3x for a dedicated customer with their own pools)
> - we have multiple zonegroups with one zone each. We only replicate
>   the metadata, so bucket names are unique and users get synced.
>
> Our ceph.conf:
>
> - I replaced IP addresses, the FSID, and domains
> - the *-old RGW instances are meant to be replaced, because we have a
>   naming conflict (all zonegroups are in one TLD and are separated by
>   subdomain, but the initial RGW is still reachable via the TLD rather
>   than via subdomain.tld)
>
> [global]
> fsid = $FSID
> ms_bind_ipv6 = true
> ms_bind_ipv4 = false
> mon_initial_members = s3db1, s3db2, s3db3
> mon_host = [$s3db1-IPv6-public_network],[$s3db2-IPv6-public_network],[$s3db3-IPv6-public_network]
> auth_cluster_required = none
> auth_service_required = none
> auth_client_required = none
> public_network = $public_network/64
> #cluster_network = $cluster_network/64
>
> [mon.s3db1]
> host = s3db1
> mon addr = [$s3db1-IPv6-public_network]:6789
>
> [mon.s3db2]
> host = s3db2
> mon addr = [$s3db2-IPv6-public_network]:6789
>
> [mon.s3db3]
> host = s3db3
> mon addr = [$s3db3-IPv6-public_network]:6789
>
> [client]
> rbd_cache = true
> rbd_cache_size = 64M
> rbd_cache_max_dirty = 48M
> rgw_print_continue = true
> rgw_enable_usage_log = true
> rgw_resolve_cname = true
> rgw_enable_apis = s3,admin,s3website
> rgw_enable_static_website = true
> rgw_trust_forwarded_https = true
>
> [client.gc-s3db1]
> rgw_frontends = "beast endpoint=[::1]:7489"
> #rgw_gc_processor_max_time = 1800
> #rgw_gc_max_concurrent_io = 20
>
> [client.eu-central-1-s3db1]
> rgw_frontends = beast endpoint=[::]:7482
> rgw_region = eu
> rgw_zone = eu-central-1
> rgw_dns_name = name.example.com
> rgw_dns_s3website_name = s3-website-name.example.com
> rgw_thread_pool_size = 512
>
> [client.eu-central-1-s3db1-old]
> rgw_frontends = beast endpoint=[::]:7480
> rgw_region = eu
> rgw_zone = eu-central-1
> rgw_dns_name = example.com
> rgw_dns_s3website_name = eu-central-1.example.com
> rgw_thread_pool_size = 512

--
The "UTF-8 problems" self-help group will, as an exception, meet in the large hall this time.