Re: [RFC PATCH 00/17] virtual-bus

Gregory Haskins <ghaskins@xxxxxxxxxx> · Wed, 01 Apr 2009 18:10:17 -0400

Chris Wright wrote:
> * Gregory Haskins (ghaskins@xxxxxxxxxx) wrote:
>   
>> Note that the design of vbus should prevent any weakening
>>     
>
> Could you elaborate?
>   

Absolutely.

So you said that something in the kernel could weaken the
protection/isolation.  And I fully agree that whatever we do here has to
be done carefully: more carefully than a userspace-derived counterpart,
naturally.

So to address this, I put in various mechanisms to (hopefully? :) ensure
we can still maintain proper isolation, as well as protect the host,
other guests, and other applications from corruption.  Here are some of
the highlights:

*) As I mentioned, a "vbus" is a form of kernel-resource-container.
It is designed so that the view of a vbus is a unique namespace of
device-ids.  Each bus has its own individual namespace that consists
solely of the devices that have been placed on that bus.  The only way
to create a bus, and/or create a device on a bus, is via the
administrative interface on the host.

*) A task can only associate with, at most, one vbus at a time.  This
means that a task can only see the device-id namespace of the devices on
its associated bus, and that's it.  This is enforced by the host kernel by
placing a reference to the associated vbus on the task_struct itself.
Again, the only way to modify this association is via a host-based
administrative operation.  Note that multiple tasks can associate with the
same vbus, which would commonly be used by all threads in an app, or all
vcpus in a guest, etc.

*) The asynchronous nature of the shm/ring interfaces implies we have
the potential for asynchronous faults.  E.g. "crap" in the ring might
not be discovered at the EIP of the guest vcpu when it actually inserts
the crap, but rather later when the host side tries to update the ring.
A naive implementation would have the host do a BUG_ON() when it
discovers the discrepancy (note that I still have a few of these to fix
in the venet-tap code).  Instead, what should happen is that we utilize
an asynchronous fault mechanism that ensures the guest is always the
one punished (via something like a machine-check for guests, or SIGABRT
for userspace, etc.).

*) "south-to-north path signaling robustness".  Because vbus supports a
variety of different environments, I use generic terms: guest/userspace
is "north", and host/kernel is "south".  When the north wants to
communicate with the south, it's perfectly OK to stall the north
indefinitely if the south is not ready.  However, it is not really OK to
stall the south when communicating with the north, because this is an
attack vector.  E.g. a malicious/broken guest could just stop servicing
its ring to cause threads in the host to jam up.  This is bad. :)  So we
design all south-to-north signaling paths to be robust against
stalling.  Instead of simply blocking, as they might in the guest, they
manage backpressure a little more intelligently.  For instance, in
venet-tap, a "transmit" from netif that has to be injected into the
south-to-north ring when that ring is full results in a
netif_stop_queue() rather than a blocking wait, and so on.
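To make the device-id namespace point above concrete, here is a minimal C sketch built on a hypothetical, heavily simplified model. `struct vbus`, `struct vbus_device`, `vbus_admin_add_device()` and `vbus_find_device()` are illustrative names only, not the structures or functions in the patch series. The sketch shows that a device id is resolved only against the bus it is looked up on. The same id on two buses therefore names two different devices, and nothing outside a task's own bus is reachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the actual vbus data structures. */
#define VBUS_MAX_DEVICES 8

struct vbus_device {
    unsigned int id;          /* only meaningful within the bus it sits on */
    const char *type;         /* e.g. "venet" */
};

struct vbus {
    const char *name;
    struct vbus_device *devices[VBUS_MAX_DEVICES];
    size_t count;
};

/* Stands in for the host administrative interface: the only way a
 * device is placed on a bus. Ids must be unique within one bus. */
static bool vbus_admin_add_device(struct vbus *bus, struct vbus_device *dev)
{
    assert(bus != NULL && dev != NULL);
    if (bus->count == VBUS_MAX_DEVICES)
        return false;
    for (size_t i = 0; i < bus->count; i++)
        if (bus->devices[i]->id == dev->id)
            return false;
    bus->devices[bus->count++] = dev;
    return true;
}

/* Lookup searches only this bus, so an id can never name a device
 * that lives on some other bus. */
static struct vbus_device *vbus_find_device(const struct vbus *bus,
                                            unsigned int id)
{
    for (size_t i = 0; i < bus->count; i++)
        if (bus->devices[i]->id == id)
            return bus->devices[i];
    return NULL;
}
```

In the real design the "add" path would only be reachable through the host's administrative interface. Here that restriction is just a naming convention.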
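The one-vbus-per-task rule above can be sketched the same way. This sketch uses hypothetical stand-ins (`struct task`, `vbus_get()`/`vbus_put()`, `vbus_admin_associate()`) in place of the kernel's `task_struct` and reference-counting helpers. It uses a plain `int` refcount because it is single-threaded; kernel code would use something like `kref` or an atomic instead. The point is that the task holds exactly one bus pointer, so re-association replaces the old bus rather than adding another. Several tasks (for example all vcpus of a guest) can still share one bus.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not real kernel APIs. */
struct vbus {
    int refcount;             /* tasks currently associated with this bus */
};

struct task {
    struct vbus *vbus;        /* at most one bus per task; NULL if none */
};

static void vbus_get(struct vbus *bus) { bus->refcount++; }

static void vbus_put(struct vbus *bus)
{
    assert(bus->refcount > 0);
    bus->refcount--;
}

/* Host-side administrative operation: replaces any previous association,
 * taking the new reference before dropping the old one. */
static void vbus_admin_associate(struct task *t, struct vbus *bus)
{
    if (bus)
        vbus_get(bus);
    if (t->vbus)
        vbus_put(t->vbus);
    t->vbus = bus;
}

/* The only bus a task can ever consult is the one on its own struct. */
static struct vbus *task_vbus(const struct task *t)
{
    return t->vbus;
}
```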
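For the asynchronous-fault point above, a minimal sketch of the host side might validate each guest-written descriptor when it is consumed. On a bad descriptor it marks only that guest as faulted instead of calling BUG_ON(). The descriptor layout, `struct north_endpoint` and the `raise_fault` callback are assumptions for illustration. `log_fault()` merely records the reason, where a real host would deliver something like a machine-check to a guest or SIGABRT to a process. The bounds check is written as `len <= size - offset` so that a hostile `offset` cannot cause an integer overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor written by the (untrusted) north side. */
struct ring_desc {
    uint64_t offset;          /* offset into the shared-memory region */
    uint32_t len;             /* buffer length in bytes */
};

struct shm_region {
    uint64_t size;            /* size of the shared-memory region in bytes */
};

/* Hypothetical per-guest state. */
struct north_endpoint {
    bool faulted;             /* once set, the host ignores this guest */
    const char *last_fault;   /* reason recorded by log_fault() */
    void (*raise_fault)(struct north_endpoint *ep, const char *why);
};

/* Example handler: only records the reason. A real host would inject
 * something like a machine-check (guest) or send SIGABRT (userspace). */
static void log_fault(struct north_endpoint *ep, const char *why)
{
    ep->last_fault = why;
}

/* Bounds check that cannot overflow, whatever values the guest wrote. */
static bool desc_valid(const struct ring_desc *d, const struct shm_region *shm)
{
    if (d->len == 0 || d->offset > shm->size)
        return false;
    return d->len <= shm->size - d->offset;
}

/* Host-side consumer: 0 on success, -1 if the guest is (now) faulted.
 * A naive version would BUG_ON(!desc_valid(d, shm)) and take the host down. */
static int host_consume(struct north_endpoint *ep, const struct ring_desc *d,
                        const struct shm_region *shm)
{
    if (ep->faulted)
        return -1;
    if (!desc_valid(d, shm)) {
        ep->faulted = true;
        ep->raise_fault(ep, "invalid ring descriptor");
        return -1;
    }
    /* ... copy or map d->len bytes at d->offset here ... */
    return 0;
}
```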
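The south-to-north robustness point above can be illustrated with a minimal sketch of a ring whose south (host) side never blocks. `struct s2n_ring`, `south_transmit()` and `north_receive()` are hypothetical names. The `queue_stopped` flag only models what netif_stop_queue()/netif_wake_queue() do in a real network driver, where the stack stops calling the transmit hook until the queue is woken. Here, a full ring makes the south refuse the packet and stop the queue, and the north draining an entry reopens it. A guest that stops servicing its ring therefore only starves itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size; a power of two so free-running indices wrap cleanly. */
#define RING_SIZE 4

struct s2n_ring {
    int slots[RING_SIZE];
    size_t head;              /* next slot the south (host) writes */
    size_t tail;              /* next slot the north (guest) reads */
    bool queue_stopped;       /* models netif_stop_queue() state */
};

static bool ring_full(const struct s2n_ring *r)
{
    return r->head - r->tail == RING_SIZE;
}

static bool ring_empty(const struct s2n_ring *r)
{
    return r->head == r->tail;
}

/* South side: never waits on the north. If the ring is full, stop the
 * queue and report backpressure to the caller instead of blocking. */
static bool south_transmit(struct s2n_ring *r, int pkt)
{
    if (ring_full(r)) {
        r->queue_stopped = true;
        return false;
    }
    r->slots[r->head % RING_SIZE] = pkt;
    r->head++;
    return true;
}

/* North side: consuming an entry frees a slot, so the south may resume. */
static bool north_receive(struct s2n_ring *r, int *pkt)
{
    if (ring_empty(r))
        return false;
    *pkt = r->slots[r->tail % RING_SIZE];
    r->tail++;
    r->queue_stopped = false;  /* models netif_wake_queue() */
    return true;
}
```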

I can't think of more examples right now, but I will update this list
if/when I come up with more.  I hope that satisfactorily answered your
question, though!

Regards,
-Greg
