On Thu, 03/12 11:02, Jason Baron wrote:
> On 03/09/2015 09:49 PM, Fam Zheng wrote:
> >
> > Benchmark for epoll_pwait1
> > ==========================
> >
> > By running fio tests inside a VM with both the original and the modified
> > QEMU, we can compare their difference in performance.
> >
> > With a small VM setup [t1], the original QEMU (ppoll based) has a 4k read
> > latency overhead of around 37 us. In this setup, the main loop polls
> > 10~20 fds.
> >
> > With a slightly larger VM instance [t2] - attached a virtio-serial device
> > so that there are 80~90 fds in the main loop - the original QEMU has a
> > latency overhead of around 49 us. By adding more such devices [t3], we
> > can see the latency go even higher - 83 us with ~200 fds.
> >
> > Now modify QEMU to use epoll_pwait1 and test again; the latency numbers
> > are respectively 36 us, 37 us and 47 us for t1, t2 and t3.
> >
>
> Hi,
>
> So it sounds like you are comparing the original qemu code (which was using
> ppoll) vs. using epoll with these new syscalls. Curious if you have numbers
> comparing the existing epoll (with, say, the timerfd in your epoll set), so
> we can see the improvement relative to epoll.

I did compare them, but they are too close to see a difference.

The improvement in epoll_pwait1 doesn't really help the hot path of guest IO,
but it does affect the precision of the timers that are used in various device
emulations in QEMU. Although it's kind of subtle and hard to summarize here, I
can give an example from the IO throttling implementation in QEMU to show the
significance:

The throttling algorithm computes a duration for the next IO, which is used to
arm a timer in order to delay the request a bit. As timers are always rounded
*UP* to the effective granularity, the 1 ms timeout granularity of epoll_pwait
is just too coarse and leads to severe inaccuracy (see the sketch at the end
of this mail). With epoll_pwait1, we can avoid the rounding-up. I think this
finer granularity could be pretty generally desired by other applications,
too.

Regarding the epoll_ctl_batch improvement, again, it is not going to change
the numbers in the small workloads I managed to test. Of course, if you have a
specific application scenario in mind, I will try it. :)

Thanks,
Fam
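
To make the rounding-up point concrete, here is a minimal sketch (not taken
from the QEMU source; the helper name and the 300 us figure are illustrative)
of the conversion an event loop has to do when it drives nanosecond timer
deadlines through epoll_pwait, whose timeout argument is an int in
milliseconds:

    #include <stdint.h>

    /*
     * Hypothetical helper: convert a nanosecond timeout to epoll_pwait's
     * millisecond granularity.  The conversion must round *up* so the loop
     * never wakes before the timer deadline, which means e.g. a 300 us
     * throttling delay is stretched into a full 1 ms sleep.  ppoll() (and
     * the proposed epoll_pwait1) take a struct timespec instead, so no
     * such rounding is needed.
     */
    static int ns_to_epoll_timeout_ms(int64_t timeout_ns)
    {
        if (timeout_ns < 0) {
            return -1;                      /* block indefinitely */
        }
        /* round up to the next millisecond */
        return (int)((timeout_ns + 999999) / 1000000);
    }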