Re: RGW pubsub deprecation

Hi Dave,
Thanks for the info. Comments inline:

On Wed, Oct 13, 2021 at 7:21 PM Dave Piper <david.piper@xxxxxxxxxxxxx>
wrote:

> Hi Yuval,
>
> We're using pubsub!
>
> We opted for pubsub over bucket notifications as the pull mode fits well
> with our requirements.
>
> 1) We want to be able to guarantee that our client (the external server)
> has received and processed each event. My initial understanding of bucket
> notifications was that they weren't stored on ceph at all, and were simply
> broadcast and then forgotten.


correct, this was the case in "nautilus" and "octopus"

> Actually I see that the docs state the notification will be retried until
> acked [3]. Is that guaranteed?


yes, "persistent notifications" were added in "pacific". note that if the
queue for the notifications fills up, the S3 operations triggering them
would fail
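
A minimal sketch of requesting a persistent topic through the SNS-compatible topic API, for example with boto3. The "persistent" and "push-endpoint" attribute names are taken from the notifications docs as I understand them; the helper names, the topic name and the endpoint values are illustrative assumptions, so check them against the docs for your release:

```python
def persistent_topic_attributes(push_endpoint):
    """Build topic attributes asking RGW to queue notifications persistently.

    Assumption: the "persistent" and "push-endpoint" attribute names match
    the bucket notification docs of the release in use.
    """
    return {
        "push-endpoint": push_endpoint,
        "persistent": "true",
    }


def create_persistent_topic(sns_client, name, push_endpoint):
    """Create the topic through any SNS-style client and return its ARN.

    sns_client is expected to behave like
    boto3.client("sns", endpoint_url=<your RGW URL>, ...).
    """
    response = sns_client.create_topic(
        Name=name,
        Attributes=persistent_topic_attributes(push_endpoint),
    )
    return response["TopicArn"]
```

Since a full queue makes the triggering S3 operation fail, the application doing the uploads should be prepared to see and retry such errors.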

> Will ceph ultimately give up and drop an event?


we will try indefinitely (or until the topic is deleted)


> Is there a way of seeing how many events have been unacked / dropped?
>

for persistent notifications, we currently only have a global counter (for
all topics) indicating the notifications that were successfully sent:
"pubsub_push_ok". this is something we should probably improve on. filed:
https://tracker.ceph.com/issues/52927
feel free to add your thoughts there


> 2) Being able to pull a list of missed events back, rather than receiving
> them one at a time, allows our client to cut down on processing. As an
> example, if the same object is updated 10 times, pubsub catchup list will
> list 10 events for the same object, and the client can recognise this and
> only needs to process the object once and ack all 10 events.  The bucket
> notification model suggests we will have to process each event in turn.
> There are possibly ways we can work around this though (e.g. queue incoming
> bucket notifications on the client and process them in batches).
>
>
our intent is not to deprecate the "pull" functionality. instead, we want
to replace the pubsub sync module with a notification queue that external
applications would be able to pull from. the overall idea was presented in
FOSDEM21:
https://archive.fosdem.org/2021/schedule/event/sds_ceph_rgw_serverless/
note that this effort is in the early stages, so it is hard to forecast a
time when this would be ready


> We've had a number of issues with pubsub and still aren't confident in its
> behaviour. Your post suggests it's not well used, which might imply it has
> less field hardening than bucket notifications.


correct. there are inherent problems with utilizing the multisite synching
mechanism for bucket notifications:
* some information on the original transactions is lost (since it is not
needed for syncing) and cannot be sent in the notification payload
* as you observed, there are duplicates... when syncing objects, duplicates
are not really an issue, as the end result is the same. but for
notifications they create a problem
* clients don't scale easily: unless you build a complex mechanism for the
client, there could be only one client processing the notifications
* setup is more complex for pubsub: it requires a separate zone;
non-standard tools (bucket notifications work with boto3 etc.)


> If so, it sounds like it might be better for us both if we switched to
> using the bucket notifications method instead. It'd be good to get your
> thoughts on how we could satisfy two requirements above.
>

until we deliver our own notification queue solution, I would recommend using
an external one (kafka or rabbitmq).
these solutions are reliable and persistent.
* you can have explicit "commits" in kafka to preserve the "acking"
semantics that you currently have
* amqp also has explicit consumer "acks" that serve the same purpose
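
A rough sketch of how a consumer of such a queue could keep both requirements (process an object once per batch, ack only after processing). It assumes the events are S3-style records carrying s3.bucket.name and s3.object.key; the function names and the two callbacks are hypothetical. With kafka, ack_batch could be a manual offset commit with auto-commit disabled; with amqp, a consumer ack covering the batch:

```python
def coalesce_events(events):
    """Group S3-style event records by (bucket, key), keeping arrival order."""
    grouped = {}
    for event in events:
        s3 = event["s3"]
        object_id = (s3["bucket"]["name"], s3["object"]["key"])
        grouped.setdefault(object_id, []).append(event)
    return grouped


def process_batch(events, handle_object, ack_batch):
    """Handle each object once, then ack the whole batch.

    handle_object(bucket, key, latest_event) and ack_batch(events) are
    hypothetical callbacks supplied by the application. If handle_object
    raises, nothing is acked, so the broker can redeliver the batch
    (at-least-once semantics).
    """
    grouped = coalesce_events(events)
    for (bucket, key), object_events in grouped.items():
        # Only the most recent event per object is passed to the handler.
        handle_object(bucket, key, object_events[-1])
    ack_batch(events)
    return len(grouped)
```

With this shape, ten updates to the same object in one batch lead to a single call to the handler, and all ten events are acked together.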


>
> If pubsub is likely to be deprecated, we'll need to start moving fast.
> What's the latest thinking on this?
>
>
we are not going to deprecate that until we have an alternative solution in
ceph.


> Cheers,
>
> Dave
>
> [3]
> https://docs.ceph.com/en/latest/radosgw/notifications/#notification-reliability
>
>
> -----Original Message-----
> From: Yuval Lifshitz <ylifshit@xxxxxxxxxx>
> Sent: 05 November 2020 06:57
> To: ceph-users <ceph-users@xxxxxxx>
> Subject:  RGW pubsub deprecation
>
> Dear Community,
> Since Nautilus, we have 2 mechanisms for notifying 3rd parties on changes
> in buckets and objects: "bucket notifications" [1] and "pubsub" [2].
>
> In "bucket notifications" (="push mode") the events are sent from the RGW
> to an external entity (kafka, rabbitmq etc.), while in "pubsub" (="pull
> mode") the events are synched with a special zone, where they are stored
> and could be later fetched by an external app.
>
> From communications that I've seen so far, users preferred to use "bucket
> notifications" over "pubsub". Since supporting both modes has maintenance
> overhead, I was considering deprecating "pubsub".
> However, before doing that I would like to see what the community has to
> say!
>
> So, if you are currently using pubsub, or plan to use it, as "pull mode"
> fits your usecase better than "push mode" please chime in.
>
> Yuval
>
> [1]
> https://docs.ceph.com/en/latest/radosgw/notifications/
> [2]
> https://docs.ceph.com/en/latest/radosgw/pubsub-module/
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an
> email to ceph-users-leave@xxxxxxx
>
>