Hey Michael!
On 27.01.21 13:47, Michael S. Tsirkin wrote:
On Thu, Jan 21, 2021 at 10:28:16AM +0000, Catangiu, Adrian Costin wrote:
On 12/01/2021, 14:49, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
On Tue, Jan 12, 2021 at 02:15:58PM +0200, Adrian Catangiu wrote:
> The first patch in the set implements a device driver which exposes a
> read-only device /dev/sysgenid to userspace, which contains a
> monotonically increasing u32 generation counter. Libraries and
> applications are expected to open() the device, and then call read()
> which blocks until the SysGenId changes. Following an update, read()
> calls no longer block until the application acknowledges the new
> SysGenId by write()ing it back to the device. Non-blocking read() calls
> return EAGAIN when there is no new SysGenId available. Alternatively,
> libraries can mmap() the device to get a single shared page which
> contains the latest SysGenId at offset 0.
Looking at some specifications, the gen ID might actually be located
at an arbitrary address. How about instead of hard-coding the offset,
we expose it e.g. in sysfs?
The functionality is split between SysGenID which exposes an internal u32
counter to userspace, and an (optional) VmGenID backend which drives
SysGenID generation changes based on hw vmgenid updates.
The hw UUID you're referring to (vmgenid) is not mmap-ed to userspace or
otherwise exposed to userspace. It is only used internally by the vmgenid
driver to find out about VM generation changes and drive the more generic
SysGenID.
The SysGenID u32 monotonically increasing counter is the one that is mmap-ed to
userspace, but it is a software counter. I don't see any value in using a dynamic
offset in the mmap-ed page. Offset 0 is fast and easy and, most importantly, it is
static, so there is no need to calculate or look it up at runtime.
Well, you are burning a whole page on it; with an offset, the page
could be shared with other functionality.
Currently, the SysGenID lives in one page owned by Linux that we share
out to multiple user space clients. So yes, we burn a single page of the
system here.
If we put more data in that same page, what data would you put there?
Random other bits from other subsystems? At that point, we'd be
reinventing vdso all over again, no? Probably with the same problems.
Which gets me to the second alternative: Reuse VDSO. The problem there
is that the VDSO is an extremely architecture-specific mechanism. Any
new architecture we'd want to support would need changes across
multiple layers of both the kernel and libc. I'd like to avoid
that if we can :).
So that leaves us with either wasting a page per system or not having an
mmap() interface in the first place.
The reason we have the mmap() interface is that it is easier to
consume for libraries that are not hooked into the main event loop.
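
To illustrate, here is a minimal sketch of how such a library might
consume the mapping: map the single shared page read-only, read the u32
at offset 0, and compare it against the last value seen before touching
any generation-sensitive state. The device path and the offset come from
the patch description; the function and class names are hypothetical,
and native byte order for the counter is an assumption.

```python
import mmap
import os
import struct

SYSGENID_PATH = "/dev/sysgenid"  # device name from the patch description


def map_generation_page(path=SYSGENID_PATH):
    """Map the single shared page read-only (hypothetical helper)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return mmap.mmap(fd, mmap.PAGESIZE,
                         flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
    finally:
        os.close(fd)  # the mapping remains valid after the fd is closed


def current_generation(page):
    """Read the u32 counter at offset 0 (native byte order is assumed)."""
    return struct.unpack_from("=I", page, 0)[0]


class GenerationGuard:
    """Remembers the last generation seen and reports when it changes."""

    def __init__(self, page):
        self._page = page
        self._seen = current_generation(page)

    def changed(self):
        now = current_generation(self._page)
        if now != self._seen:
            self._seen = now
            return True
        return False
```

A library would call guard.changed() on its own hot paths, for example
before handing out cached random bytes, without needing a file
descriptor in anybody's poll loop. This only polls the counter; it does
not cover the read()/write() acknowledgement flow.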
So, uh, what are you suggesting? :)
Alex