I made a simple draft for adding async event notification support to
librbd.

The initial idea is to avoid changing the existing APIs much, so we could
add new APIs like:

    struct {
        int result;
        void *userdata;
        ......
    } rbd_aio_event;

    int poll_io_events(ImageCtx *ictx, rbd_aio_event *events,
                       int numevents, struct timespec *timeout);
    int set_image_notification(ImageCtx *ictx, void *handler,
                               enum notification_type);

It is a little tricky: once the user has called "set_image_notification"
successfully, the user can call aio_write/read with the specified userdata
(the original callback argument pointer). A librbd internal thread will
post an async event to the "eventfd" in the specified way
(notification_type) when the io finishes. For example, linux/bsd will use
[eventfd](http://man7.org/linux/man-pages/man2/eventfd.2.html), solaris
could use
[port_send](http://docs.oracle.com/cd/E23823_01/html/816-5168/port-send-3c.html#scrolltoc),
and windows could use the iocp method
[PostQueuedCompletionStatus](https://msdn.microsoft.com/en-us/library/windows/desktop/aa365458(v=vs.85).aspx).

If the client uses rbd without calling "set_image_notification", a call to
"poll_io_events" will return -EOPNOTSUPP.

On Wed, Jul 8, 2015 at 11:46 AM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
> On Wed, Jul 8, 2015 at 11:08 AM, Josh Durgin <jdurgin@xxxxxxxxxx> wrote:
>> On 07/07/2015 08:18 AM, Haomai Wang wrote:
>>>
>>> Hi All,
>>>
>>> Currently librbd support aio_read/write with specified
>>> callback(AioCompletion). It would be nice for simple caller logic, but
>>> it also has some problems:
>>>
>>> 1. Performance bottleneck: Create/Free AioCompletion and librbd
>>> internal finisher thread complete "callback" isn't a *very
>>> lightweight* job, especially when "callback" need to update some
>>> status with lock hold
>>>
>>> 2. Call logic: Usually like fio rbd engine, caller will maintain some
>>> status with io and rbd callback isn't enough to finish all the jobs
>>> related to io.
>>> For example, caller need to check each queued io
>>> stupidly again when rbd callback finished.
>>>
>>> So maybe we could add new api which support eventfd, so caller could
>>> add eventfd to its event loop and batch reap finished io event and
>>> update status or do more things.
>>>
>>> Any feedback is appreciated!
>>
>>
>> It seems like a good idea to me. I'm not sure how much overhead it
>> avoids, but letting the callers check status from their own threads
>> is much nicer in general.
>>
>> I'd be curious how much overhead the callback + finisher add. If it's
>> significant, it might make sense to add similar eventfd interfaces
>> lower in the stack too.
>
> From intuition if we do high iodepth benchmark, noncallback way could
> reduce lots of "extra callback latency" because new way could batch
> them. Another performance benefit I think from caller side, new way
> could let complexity io finished job avoid "callback lock" and reduce
> extra logic. Finally, mostly callback need to wakeup caller thread to
> do next thing, it would be great that with new way we can do it in
> librbd via eventfd.
>
>>
>> Josh
>
>
>
> --
> Best Regards,
>
> Wheat



--
Best Regards,

Wheat
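
To make the eventfd flow from the draft at the top of this mail more concrete, here is a minimal Linux-only sketch. It is not librbd code: every `demo_*` name, the fixed-size queue, and the millisecond timeout (the draft uses a `struct timespec`) are assumptions made only for illustration. `demo_set_notification` stands in for "set_image_notification", `demo_post_event` for what the librbd internal thread would do when an io finishes, and `demo_poll_events` for "poll_io_events".

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical completion record, mirroring the rbd_aio_event draft. */
struct demo_event {
    int result;
    void *userdata;
};

#define DEMO_QUEUE_CAP 64

/* Hypothetical per-image state: a ring of finished ios plus an eventfd. */
struct demo_image {
    pthread_mutex_t lock;
    struct demo_event queue[DEMO_QUEUE_CAP];
    int head, count;
    int efd;                      /* -1 until notification is enabled */
};

int demo_image_init(struct demo_image *img) {
    img->head = img->count = 0;
    img->efd = -1;
    return pthread_mutex_init(&img->lock, NULL);
}

/* Stand-in for set_image_notification(): create the eventfd. */
int demo_set_notification(struct demo_image *img) {
    img->efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    return img->efd < 0 ? -errno : 0;
}

/* What the library's completion thread would call when an io finishes. */
int demo_post_event(struct demo_image *img, int result, void *userdata) {
    uint64_t one = 1;
    int rc = 0;

    if (img->efd < 0)
        return -EOPNOTSUPP;
    pthread_mutex_lock(&img->lock);
    if (img->count == DEMO_QUEUE_CAP) {
        rc = -EAGAIN;             /* queue full, caller must reap first */
    } else {
        int tail = (img->head + img->count) % DEMO_QUEUE_CAP;
        img->queue[tail].result = result;
        img->queue[tail].userdata = userdata;
        img->count++;
        /* Written under the lock so "readable" always means "non-empty". */
        if (write(img->efd, &one, sizeof one) < 0)
            rc = -errno;
    }
    pthread_mutex_unlock(&img->lock);
    return rc;
}

/* Stand-in for poll_io_events(): wait, then batch reap up to numevents. */
int demo_poll_events(struct demo_image *img, struct demo_event *events,
                     int numevents, int timeout_ms) {
    assert(events != NULL && numevents > 0);
    if (img->efd < 0)
        return -EOPNOTSUPP;       /* notification was never enabled */

    struct pollfd pfd = { .fd = img->efd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -errno;
    if (rc == 0)
        return 0;                 /* timed out, nothing finished */

    int n = 0;
    pthread_mutex_lock(&img->lock);
    while (n < numevents && img->count > 0) {
        events[n++] = img->queue[img->head];
        img->head = (img->head + 1) % DEMO_QUEUE_CAP;
        img->count--;
    }
    if (img->count == 0) {
        /* Fully drained: reading resets the eventfd counter to zero. */
        uint64_t counter;
        if (read(img->efd, &counter, sizeof counter) < 0 && errno != EAGAIN) {
            pthread_mutex_unlock(&img->lock);
            return -errno;
        }
    }
    pthread_mutex_unlock(&img->lock);
    return n;
}
```

In a real caller the descriptor would more likely be added to the caller's own epoll/poll loop next to its other fds, and the reap step run whenever it becomes readable. The eventfd is only cleared once the queue is empty, so a caller with a small events array simply sees the fd stay readable until everything is reaped.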