On 10/01/2009 06:50 PM, Anthony Liguori wrote:
Avi Kivity wrote:
On 10/01/2009 04:23 PM, Anthony Liguori wrote:
Juan Quintela wrote:
Discussed with Anthony about it. signalfd is complicated for qemu
upstream (too difficult to use properly),
It's not an issue of being difficult.
To emulate signalfd, we need to create a thread that writes to a
pipe from a signal handler. The problem is that a write() can
return a partial result and following the partial result, we can end
up getting an EAGAIN. We have no way to queue signals beyond that
point and we have no sane way to deal with partial writes.
pipe buffers are multiples of the signalfd size. As long as we
read and write signalfd-sized blocks, we won't get partial writes.
It's true that depending on an implementation detail is bad practice,
but this is emulation code, and if it helps simplify everything else,
I think it's fine to use it.
That's a pretty hairy detail to rely upon..
Well, it's a posix detail, as I quoted below. I'm not in love with it
but it should work.
Instead, how we do this in upstream QEMU is that we install a signal
handler and write one byte to the fd. If we get EAGAIN, that's fine
because all we care about is that at least one byte exists in the
fd's buffer. This requires that we use an fd-per-signal which means
we end up with a different model than signalfd.
The reason to use signalfd over what we do in upstream QEMU is that
signalfd can allow us to mask the signals which means fewer EINTRs.
I don't think that's a huge advantage and the inability to do
backwards compatibility in a sane way means that emulated signalfd
is not workable.
signalfd is several microseconds faster than signals + pipes. Do we
have so much performance we can throw some of it away?
Do we have any indication that this difference is actually
observable? This seems like very premature optimization.
Multiply the signal rate by "a few microseconds", if you get more than
0.1% cpu it's worthwhile in my opinion. The code is localized, and
signalfd is a better interface than signals.
The same is generally true for eventfd.
eventfd emulation will also never get partial writes.
But you cannot emulate eventfd faithfully because eventfd is supposed
to be additive. If you write 1 to an eventfd 50 times, you should be able to
read a set of integers that add up to 50. If you hit EAGAIN in a
signal handler, you have no way of handling that.
We never rely on the count anyway. You can simply ignore EAGAIN.
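To make the additive behaviour under discussion concrete, here is a small
Linux-only sketch using the real eventfd(2) call. The helper names
counter_bump and counter_take are invented for this example. With a
non-semaphore eventfd, each write adds to a 64-bit counter and a read
returns the accumulated total and resets it to zero; a pipe-based emulation
that drops bytes on EAGAIN cannot reproduce that total.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Write the value 1 to the eventfd `times` times. Returns times or -1. */
int counter_bump(int efd, int times)
{
    uint64_t one = 1;
    int i;

    for (i = 0; i < times; i++) {
        if (write(efd, &one, sizeof one) != (ssize_t)sizeof one)
            return -1;
    }
    return times;
}

/*
 * Read and reset the counter. Returns the accumulated value, or 0 if the
 * (non-blocking) eventfd had nothing pending.
 */
uint64_t counter_take(int efd)
{
    uint64_t total = 0;
    ssize_t r = read(efd, &total, sizeof total);

    if (r == -1 && errno == EAGAIN)
        return 0;
    assert(r == (ssize_t)sizeof total); /* sketch: other errors not handled */
    return total;
}
```

A caller that only checks for a nonzero result from counter_take() is in
the "never rely on the count" situation, and for that use a lossy emulation
is indistinguishable from the real thing.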
As I said earlier, the better thing to do is have a higher level
interface that has a subset of the behavior of eventfd/signalfd that
we can emulate correctly.
Sure, but it's more work. Copying an existing interface is easier.
It's not like there's no other work in qemu left to be done.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.