RGW bucket notifications stop working after a while and block requests

Hi,

We just set up two new Ceph clusters (using Rook). To process user activity, we configured a topic that sends events to Kafka.

After 5-12 hours this stops working, and requests fail with a 503 SlowDown response. The RGW log shows:
debug 2024-08-02T09:17:58.205+0000 7ff4359ad700 1 req 13681579273117692719 0.005000019s ERROR: failed to reserve notification on queue: private.rgw. error: -28
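The trailing `error: -28` looks like a negated Linux errno value. On Linux, errno 28 is ENOSPC ("No space left on device"), which would fit a queue reservation running out of room. The following is a minimal Python sketch for pulling the queue name and error code out of such lines; the regular expression is an assumption based only on the single log line quoted above, and the errno mapping is the Linux one.

```python
import errno
import os
import re

# Pattern inferred from the one log line in this post (an assumption,
# not a documented RGW log format).
RESERVE_FAILURE = re.compile(
    r"failed to reserve notification on queue: (?P<queue>\S+?)\. error: (?P<code>-?\d+)"
)


def decode_reserve_failure(line):
    """Return (queue, errno_name, errno_message), or None if the line does not match."""
    match = RESERVE_FAILURE.search(line)
    if match is None:
        return None
    # The log prints the code negated, so take the absolute value.
    code = abs(int(match.group("code")))
    name = errno.errorcode.get(code, "UNKNOWN")
    return match.group("queue"), name, os.strerror(code)


if __name__ == "__main__":
    sample = (
        "debug 2024-08-02T09:17:58.205+0000 7ff4359ad700 1 req 13681579273117692719 "
        "0.005000019s ERROR: failed to reserve notification on queue: private.rgw. error: -28"
    )
    print(decode_reserve_failure(sample))
```

Run over a full RGW log, this could show whether the failures are confined to one queue and whether the error code is always the same.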

Our first thought was that the queue is full, but up to the point of failure we see messages arriving in Kafka, and there is little activity on the RGW itself (only a few requests against the S3 API), so it can't be a load issue.

What helps is removing the notification configuration from the buckets (put-bucket-notification-configuration). If we immediately re-add the previous notification configuration, it continues working for a few hours before failing again with the same error and behaviour.
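The remove-and-re-add workaround could be scripted roughly as follows. This is only a sketch: it assumes a boto3-style S3 client, and it assumes that putting an empty notification configuration clears the bucket's notifications, as the workaround described above implies. The function name `reset_bucket_notifications` is hypothetical.

```python
def reset_bucket_notifications(s3, bucket):
    """Clear a bucket's notification configuration, then re-apply the saved one.

    `s3` is expected to behave like a boto3 S3 client, e.g.
    boto3.client("s3", endpoint_url=...) pointed at the RGW endpoint.
    """
    current = s3.get_bucket_notification_configuration(Bucket=bucket)
    # boto3 responses carry transport metadata that is not part of the configuration.
    saved = {k: v for k, v in current.items() if k != "ResponseMetadata"}

    # Assumption: an empty configuration removes all notifications from the bucket.
    s3.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration={}
    )
    if saved:
        s3.put_bucket_notification_configuration(
            Bucket=bucket, NotificationConfiguration=saved
        )
    return saved
```

This only automates the temporary fix; it does not address whatever fills or blocks the queue.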

We haven't been able to reproduce this with persistence disabled for the topic, so it looks related to the persistence option; without persistence, events are not queued before being sent to Kafka.
This also suggests that the issue is not with Kafka, which is what we suspected first (e.g. that it can't handle the volume of messages).

Has anyone else had this issue and found the cause, or does anyone have a suggestion on how best to continue debugging? Are there detailed metrics on the size and usage of the event queue?


Here is the configuration for the topic and for a bucket:

$ radosgw-admin topic list
{
   "topics": [
       {
           "user": "",
           "name": "private.rgw",
           "dest": {
               "push_endpoint": "kafka://rgw-sasl-kafka-user:XXX@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:9094/private.rgw?sasl.mechanism=SCRAM-SHA-512&mechanism=SCRAM-SHA-512",
               "push_endpoint_args": "OpaqueData=&Version=2010-03-31&kafka-ack-level=broker&persistent=false&push-endpoint=kafka://rgw-sasl-kafka-user:XXX@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:9094/private.rgw?sasl.mechanism=SCRAM-SHA-512&mechanism=SCRAM-SHA-512&use-ssl=true&verify-ssl=true",
               "push_endpoint_topic": "private.rgw",
               "stored_secret": true,
               "persistent": true
           },
           "arn": "arn:aws:sns:ceph-objectstore::private.rgw",
           "opaqueData": ""
       }
   ]
}
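In the output above, `push_endpoint_args` contains `persistent=false` while the `dest` object reports `"persistent": true`. Whether that difference has any effect is not established here, but it can be surfaced mechanically. The sketch below is an illustration only: `persistence_mismatch` is a hypothetical helper name, and it assumes the args string can be split like a URL query string.

```python
from urllib.parse import parse_qsl


def persistence_mismatch(dest):
    """Return True if `persistent` in push_endpoint_args disagrees with dest["persistent"]."""
    # Splitting on "&" also breaks up the query part of the embedded push-endpoint
    # URL, but the top-level "persistent" key is still recovered.
    args = dict(parse_qsl(dest.get("push_endpoint_args", ""), keep_blank_values=True))
    in_args = args.get("persistent")
    if in_args is None:
        return False
    return (in_args.lower() == "true") != bool(dest.get("persistent"))
```

It would be applied to each `dest` object in the parsed JSON output of `radosgw-admin topic list`.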

$ aws s3api get-bucket-notification-configuration --bucket=XXX
{
   "TopicConfigurations": [
       {
           "Id": "my-id",
           "TopicArn": "arn:aws:sns:ceph-objectstore::private.rgw",
           "Events": [
               "s3:ObjectCreated:*",
               "s3:ObjectRemoved:*"
           ]
       }
   ]
}


Thank you for any input to solve this!


Cheers,
Florian
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



