Good day people,

we have a very strange problem with one of our buckets. The customer informed us that they are having issues with their objects: the objects are listed, but on a GET they receive a "NoSuchKey" error. They did not delete anything from the bucket. We checked, and `radosgw-admin bucket radoslist --bucket $BUCKET` came back empty, while all of the objects were still listed by `radosgw-admin bi list --bucket $BUCKET`.

On the date they noticed the problem, the cluster was as healthy as it gets in our case. No other tasks were being performed either, including orphan object searches, bucket resharding, adding or removing OSDs, rebalancing, and so on.

Some data about the cluster:

- 275 OSDs over 13 hosts (38 SSD OSDs, 6 of them reserved for the GC; the rest are 8-16 TB spinning HDDs)
- one block.db SSD per 5 HDD OSDs
- the SSD OSDs are 100 GB LVs on our block.db SSDs and contain all pools except rgw.buckets.data and rgw.buckets.non-ec
- the garbage collector sits on separate SSD OSDs, which are also 100 GB LVs on our block.db SSDs
- we had to split the GC off from all other pools, because this bug (https://tracker.ceph.com/issues/53585) led to problems where we received 500 errors from RGW
- we have three HAProxy frontends, each pointing to one of our RGW instances (with the other two RGW daemons as fallback)
- we have 12 RGW daemons running in total, but only three of them are connected to the outside world (3x only for GC, 3x for some zonegroup restructuring, 3x for a dedicated customer with their own pools)
- we have multiple zonegroups with one zone each; we only replicate the metadata, so bucket names are unique and users get synced

Our ceph.conf:

- I replaced IP addresses, the FSID, and domains
- the "-old" RGW instances are meant to be replaced, because we have a naming conflict (all zonegroups are in one TLD and are separated by subdomain, but the initial RGW is still reachable via the TLD and not via subdomain.tld)

[global]
fsid = $FSID
ms_bind_ipv6 = true
ms_bind_ipv4 = false
mon_initial_members = s3db1, s3db2, s3db3
mon_host = [$s3b1-IPv6-public_network],[$s3b2-IPv6-public_network],[$s3b3-IPv6-public_network]
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
public_network = $public_network/64
#cluster_network = $cluster_network/64

[mon.s3db1]
host = s3db1
mon addr = [$s3b1-IPv6-public_network]:6789

[mon.s3db2]
host = s3db2
mon addr = [$s3b2-IPv6-public_network]:6789

[mon.s3db3]
host = s3db3
mon addr = [$s3b3-IPv6-public_network]:6789

[client]
rbd_cache = true
rbd_cache_size = 64M
rbd_cache_max_dirty = 48M
rgw_print_continue = true
rgw_enable_usage_log = true
rgw_resolve_cname = true
rgw_enable_apis = s3,admin,s3website
rgw_enable_static_website = true
rgw_trust_forwarded_https = true

[client.gc-s3db1]
rgw_frontends = "beast endpoint=[::1]:7489"
#rgw_gc_processor_max_time = 1800
#rgw_gc_max_concurrent_io = 20

[client.eu-central-1-s3db1]
rgw_frontends = beast endpoint=[::]:7482
rgw_region = eu
rgw_zone = eu-central-1
rgw_dns_name = name.example.com
rgw_dns_s3website_name = s3-website-name.example.com
rgw_thread_pool_size = 512

[client.eu-central-1-s3db1-old]
rgw_frontends = beast endpoint=[::]:7480
rgw_region = eu
rgw_zone = eu-central-1
rgw_dns_name = example.com
rgw_dns_s3website_name = eu-central-1.example.com
rgw_thread_pool_size = 512
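
For anyone who wants to reproduce the comparison we did, this is roughly what it boils down to — a minimal sketch, with $BUCKET and $OBJECT as placeholders and jq only used to pull the names out of the bi list JSON:

  # object names according to the bucket index
  radosgw-admin bi list --bucket $BUCKET | jq -r '.[].entry.name' | sort -u

  # RADOS objects actually backing the bucket (empty in our case)
  radosgw-admin bucket radoslist --bucket $BUCKET

  # manifest of one affected object, to see which head/tail RADOS
  # objects it should map to (this reads the head object, so it may
  # fail as well if the head is already gone)
  radosgw-admin object stat --bucket $BUCKET --object $OBJECT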
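In case it is relevant given our GC history, one can also check whether the GC queue still references the bucket's data — again just a sketch, with $MARKER standing for the bucket marker from bucket stats:

  # the marker prefixes all RADOS object names belonging to the bucket
  radosgw-admin bucket stats --bucket $BUCKET | jq -r '.marker'

  # look for pending GC entries that reference that marker
  radosgw-admin gc list --include-all | grep $MARKER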
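And to see whether anything is left in RADOS at all, one can look for the head object of an affected key directly — a sketch under the assumption that head objects are named <marker>_<object name> and that $DATA_POOL is the zone's buckets.data pool:

  # does the head object still exist in the data pool?
  rados -p $DATA_POOL stat ${MARKER}_${OBJECT}

  # heavier fallback: scan the pool listing for the marker
  # (slow on large pools)
  rados -p $DATA_POOL ls | grep $MARKER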