Well, the need for reservation is general, arising from the need for
reliable delivery. You could substitute the fifo abstraction if 128M is
too small.

Matt

On Thu, Jan 16, 2020 at 1:23 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>
> On 1/16/20 12:13 PM, Yuval Lifshitz wrote:
> > two updates on the design (after some discussions):
> >
> > (1) the "best effort queue" (stretch goal) is probably not needed:
> > - cls queue performance should be high enough when put on a fast
> > media pool
> > - the "ack level" settings allow the existing mechanism to perform
> > as "best effort" and non-blocking for topics that do not need
> > delivery guarantees
> >
> > (2) since the cls queue does not allow for random access (without a
> > linear search), retries will have to be implemented based only on
> > the end of the queue. This means that we must assume that the acks
> > or nacks arrive in the same order in which the notifications were
> > sent. This is true only for a specific endpoint (e.g. a specific
> > kafka broker), which means that there will have to be a separate
> > cls queue instance for each endpoint
> >
> >
> > On Tue, Jan 14, 2020 at 3:47 PM Yuval Lifshitz <ylifshit@xxxxxxxxxx> wrote:
> >
> > Dear Community,
> > I would like to share some design ideas around the above topic.
> > Feedback is welcome!
> >
> > Current State
> >
> > - in "pull mode" [1] we have the same guarantees as the multisite
> > syncing mechanism (guarantees against HW/SW failures). On top of
> > that, if writing the event to RADOS fails, this trickles back as a
> > sync failure, which means that the master zone will retry syncing
> > to the pubsub zone
> >
> > - in "push mode" [2] we send the notification from the ops context
> > that triggered the notification. The original operation is blocked
> > until we get a reply from the endpoint. As part of the
> > configuration for the endpoint, we also configure the "ack level",
> > indicating whether or not we block until we get a reply from the
> > endpoint.
> > Since the operation response is not sent back to the client until
> > the endpoint acks, this method guarantees against any failure in
> > the radosgw (at the cost of adding latency to the operation).
> > This, however, does not guarantee delivery if the endpoint is down
> > or disconnected. The endpoints we interact with (rabbitmq, kafka)
> > usually have a built-in redundancy mechanism, but this does not
> > cover the case where there is a network disconnect between our
> > gateways and these systems.
> > In some cases we can get a nack from the endpoint, indicating that
> > our message will never reach the endpoint. But we can only log
> > these cases:
> > - we cannot fail the operation that triggered us, because we send
> > the notification only after the actual operation (e.g. "put
> > object") was done (=no atomicity)
> > - there is no retry mechanism (in theory, we could add one)
> >
> > Next Phase Requirements
> >
> > We would like to add delivery guarantees to "push mode" for
> > endpoint failures. For that we would use a message queue with the
> > following features:
> > - rados backed, so it would survive HW/SW failures
> > - blocking only on local reads/writes (so it introduces smaller
> > latency than over-the-wire endpoint acks)
> > - has reserve/commit semantics, so we can "reserve" before the
> > operation (e.g. "put object") is done, fail the operation if we
> > cannot reserve a slot on the queue, and commit the notification to
> > the queue only after the operation was successful (and unreserve
> > if the operation failed)
>
> I guess this reservation piece is only a requirement because of the
> choice of cls_queue, which resides in a single rados object and so
> enforces a bound on the total space used. The maximum size is
> configurable, but can't exceed osd_max_object_size=128M. How many
> notifications could we fit within that 128M limit? I worry that
> clusters at a sufficient scale could fill that pretty quickly if the
> notification endpoint is unavailable or slow, and that would leave
> radosgw unable to satisfy any requests that would generate a
> notification.
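>
> Back of the envelope (the entry size here is just an assumption): if
> each queued notification record, bucket and object names plus event
> metadata, is on the order of 1K, the 128M object caps us at roughly
> 128K outstanding notifications. A cluster sustaining a few thousand
> object writes per second could exhaust that in about a minute of
> endpoint downtime.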
"put object") was done, and fail it if we cannot > > reserve a slot on the queue, and commit the notification to the > > queue only after the operation was successful (and unreserve if > > the operation failed) > > > I guess this reservation piece is only a requirement because of the > choice of cls_queue, which resides in a single rados object and so > enforces a bound on the total space used. The maximum size is > configurable, but can't exceed osd_max_object_size=128M. How many > notifications could we fit within that the 128M limit? I worry that > clusters at a sufficient scale could fill that pretty quickly if the > notification endpoint is unavailable or slow, and that would leave > radosgw unable to satisfy any requests that would generate a notification. > > > - we would have a retry mechanism based on the queue, which means > > that if a notification was successfully pushed into the queue, we > > can assume it would (eventually) be successfully delivered to the > > endpoint > > > > Proposed Solution > > > > - use the cls_queue [3] (cls_queue is not omap based, hence, no > > builtin iops limitations) > > - add reserve/commit functionality (probably store that info in > > the queue head) > > - a dedicated thread(s) should be reading requests from the queue, > > sending the notifications to the endpoints, and waiting for > > the replies (if needed) - this should be done via coroutines > > - acked requests are removed from the queue, nacked or > > timed-out requests should be retried (at least for a while) > > - both mechanism would coexist, as this would be configurable per > > topic > > - as a stretch goal, we may add a "best effort queue". This would > > be similar to the cls_queue solution, but won't address > > radosgw failures (as the queue would be in-memory), only endpoint > > failures/disconnects > > - for now, this mechanism won't be supported for pushing events > > from the pubsub zone (="pull+push mode"), but might be added if > > users would find it useful > > > > Yuval > > > > [1] https://docs.ceph.com/docs/master/radosgw/pubsub-module/ > > [2] https://docs.ceph.com/docs/master/radosgw/notifications/ > > [3] https://github.com/ceph/ceph/tree/master/src/cls/queue > > > > > > _______________________________________________ > > Dev mailing list -- dev@xxxxxxx > > To unsubscribe send an email to dev-leave@xxxxxxx > _______________________________________________ > Dev mailing list -- dev@xxxxxxx > To unsubscribe send an email to dev-leave@xxxxxxx -- Matt Benjamin Red Hat, Inc. 315 West Huron Street, Suite 140A Ann Arbor, Michigan 48103 http://www.redhat.com/en/technologies/storage tel. 734-821-5101 fax. 734-769-8938 cel. 734-216-5309 _______________________________________________ Dev mailing list -- dev@xxxxxxx To unsubscribe send an email to dev-leave@xxxxxxx