On Thu, 25 Sep 2008 19:09:09 +0300 Alexander Nezhinsky <nezhinsky@xxxxxxxxx> wrote: > This patch introduces a few interconnected improvements, that > ultimately lead to significant reduction in interrupts rate > for iser/ib, while adding flexibility to tgtd event processing scheme. > > First, it implements a kind of "scheduler" list, to replace the > counter events. Event handlers can schedule other events that are > placed on the scheduler's loop. This has some small advantages in > itself, because the same event descriptor is used for all events > in the system, events are executed in order they'd been scheduled, > they can be removed from list, few instances of the same event > may be scheduled. > > But the most important change is the logic of the processing loop, > as a whole. It goes as follows: > > The scheduler checks events on the list, does their processing, > which can schedule more items, but does not process the new items > (in order to avoid infinite/long loops). > > If after processing all "old" events, some "new" ones were scheduled, > epoll_wait() is executed in non-blocking manner. This guarantees > that the file descriptors that became ready during the scheduler > processing are handled, but if there no ready fds, the remaining > scheduler events are processed immediately. > > But if after processing the scheduler list nothing new is added, > then epoll_wait is blocking as in the current scheme. > > This way we can be very flexible, because event handlers and deferred > work can not block one another. Potentially long handlers can be > split into shorter ones easily without blocking the entire target. > > Finally, IB completion queue is the first guy who benefits, because > we can postpone interrupt re-arming, until no completion entries > remain, and to schedule CQ "draining" after all events are serviced. > > When a completion event is handled, CQ is polled so that up to a > given number (now set to 8) of WCs are processed. > > If more completions are left on the CQ, essentially the same > handler is scheduled, to carry out the next round of polling, > but in the meanwhile, other events in the system can be serviced. > > If CQ is found empty, interrupts are re-armed and a handler is > scheduled to consume the completions which could sneak in > between the moment the CQ was seen empty and just before > the interrupts were re-armed. > > Thus in iSER IB there is a marked interrupts rate reduction. > Here are typical results: > Current target: 62-65 KIOPS, ~110,000 interrupt/sec, CPU% ~68% > Patched target: 65-70 KIOPS, ~65,000 interrupt/sec, CPU% ~65% > > Signed-off-by: Alexander Nezhinsky <nezhinsky@xxxxxxxxx> Thanks, Seems that you fixed most of the style issues but there are still some: ERROR: trailing whitespace #294: FILE: usr/iscsi/iscsi_rdma.c:1058: +/* Scheduled to poll cq after a completion event has been $ ERROR: trailing whitespace #304: FILE: usr/iscsi/iscsi_rdma.c:1068: + after the cq had been seen empty but just before $ ERROR: trailing whitespace #305: FILE: usr/iscsi/iscsi_rdma.c:1069: + the notification interrupts were re-armed. $ ERROR: trailing whitespace #338: FILE: usr/iscsi/iscsi_rdma.c:1102: +^I/* if a poll was previosuly scheduled, remove it, $ total: 4 errors, 0 warnings, 565 lines checked > diff --git a/usr/tgtd.c b/usr/tgtd.c > index 0b1cb4c..ac3ff76 100644 > --- a/usr/tgtd.c > +++ b/usr/tgtd.c > @@ -38,26 +38,13 @@ > #include "work.h" > #include "util.h" > > -struct tgt_event { > - union { > - event_handler_t *handler; > - counter_event_handler_t *counter_handler; > - }; > - union { > - int fd; > - int *counter; > - }; > - void *data; > - struct list_head e_list; > -}; exporting struct tgt_event fine for now but I don't fancy it much. If we have proper APIs for it, we could avoid it, I guess. But I've not see rdma code so I'm not sure. > -static void event_loop(void) > +void tgt_init_sched_event(struct tgt_event *evt, > + sched_event_handler_t sched_handler, void *data) > { > - int nevent, i, done, timeout = TGTD_TICK_PERIOD * 1000; > - struct epoll_event events[1024]; > + evt->sched_handler = sched_handler; > + evt->scheduled = 0; > + evt->data = data; > + INIT_LIST_HEAD(&evt->e_list); > +} > + > +void tgt_add_sched_event(struct tgt_event *evt) > +{ > + if (!evt->scheduled) { > + evt->scheduled = 1; > + list_add_tail(&evt->e_list, &tgt_sched_events_list); > + } > +} > + > +void tgt_remove_sched_event(struct tgt_event *evt) > +{ > + if (evt->scheduled) { > + evt->scheduled = 0; > + list_del_init(&evt->e_list); > + } > +} > + > +static void tgt_exec_scheduled(void) > +{ > + struct list_head *last_sched; > struct tgt_event *tev, *tevn; > > -retry: > - /* > - * Check the counter events to see if they have any work to run. > - */ > - do { > - done = 1; > - list_for_each_entry_safe(tev, tevn, &tgt_counter_events_list, > - e_list) { > - if (*tev->counter) { > - done = 0; > - tev->counter_handler(tev->counter, tev->data); > - } > - } > - } while (!done); > + if (list_empty(&tgt_sched_events_list)) > + return; > + > + /* execute only work scheduled till now */ > + last_sched = tgt_sched_events_list.prev; > + list_for_each_entry_safe(tev, tevn, &tgt_sched_events_list, e_list) { > + tgt_remove_sched_event(tev); > + tev->sched_handler(tev); > + if (&tev->e_list == last_sched) > + break; > + } > +} > + > +static void tgt_poll_events(int timeout) > +{ > + int nevent, i; > + struct tgt_event *tev; > + struct epoll_event events[1024]; > > nevent = epoll_wait(ep_fd, events, ARRAY_SIZE(events), timeout); > if (nevent < 0) { > @@ -255,9 +229,19 @@ retry: > } > } else > schedule(); > +} > > - if (system_active) > - goto retry; > +static void event_loop(void) > +{ > + int timeout, wait_timeout = TGTD_TICK_PERIOD * 1000; > + > + while (system_active) { > + tgt_exec_scheduled(); > + /* wait if no scheduled work, poll if there is */ > + timeout = list_empty(&tgt_sched_events_list) ? > + wait_timeout : 0; > + tgt_poll_events(timeout); > + } Can you avoid rewriting (or cleaning up) even_loop()? That's not related with this patch. You want to replace the first half (counter_events code) in the original even_loop(). I don't think that you need to rewrite event_loop in the above way, right? If you do something like the following, your patch is more smaller, I think. static void event_loop(void) { int nevent, i, done, timeout = TGTD_TICK_PERIOD * 1000; struct epoll_event events[1024]; struct tgt_event *tev, *tevn; retry: done = schedule_event(); /* choose whatever name you like */ if (!done) timeout = 0; nevent = epoll_wait(ep_fd, events, ARRAY_SIZE(events), timeout); if (nevent < 0) { if (errno != EINTR) { eprintf("%m\n"); exit(1); } } else if (nevent) { for (i = 0; i < nevent; i++) { tev = (struct tgt_event *) events[i].data.ptr; tev->handler(tev->fd, events[i].events, tev->data); } } else schedule(); if (system_active) goto retry; } -- To unsubscribe from this list: send the line "unsubscribe stgt" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html