Chris Wright wrote:
> * Gregory Haskins (ghaskins@xxxxxxxxxx) wrote:
>
>> Note that the design of vbus should prevent any weakening
>>
>
> Could you elaborate?
>

Absolutely.

So you said that something in the kernel could weaken the protection/isolation. And I fully agree that whatever we do here has to be done carefully...more carefully than a userspace derived counterpart, naturally. So to address this, I put in various mechanisms to (hopefully? :) ensure we can still maintain proper isolation, as well as protect the host, other guests, and other applications from corruption. Here are some of the highlights:

*) As I mentioned, a "vbus" is a form of kernel-resource-container. It is designed so that the view of a vbus is a unique namespace of device-ids. Each bus has its own individual namespace that consists solely of the devices that have been placed on that bus. The only way to create a bus, and/or create a device on a bus, is via the administrative interface on the host.

*) A task can associate with, at most, one vbus at a time. This means that a task can only see the device-id namespace of the devices on its associated bus, and that's it. This is enforced by the host kernel by placing a reference to the associated vbus on the task-struct itself. Again, the only way to modify this association is via a host-based administrative operation. Note that multiple tasks can associate with the same vbus, which would commonly be used by all threads in an app, or all vcpus in a guest, etc.

*) The asynchronous nature of the shm/ring interfaces implies we have the potential for asynchronous faults. E.g. "crap" in the ring might not be discovered at the EIP of the guest vcpu when it actually inserts the crap, but rather later when the host side tries to update the ring. A naive implementation would have the host do a BUG_ON() when it discovers the discrepancy (note that I still have a few of these to fix in the venet-tap code). Instead, what should happen is that we utilize an asynchronous fault mechanism that allows the guest to always be the one punished (via something like a machine-check for guests, or SIGABRT for userspace, etc).

*) "South-to-north path signaling robustness". Because vbus supports a variety of different environments, I call guest/userspace "north", and the host/kernel "south". When the north wants to communicate with the kernel, it's perfectly ok to stall the north indefinitely if the south is not ready. However, it is not really ok to stall the south when communicating with the north, because this is an attack vector. E.g. a malicious/broken guest could just stop servicing its ring to cause threads in the host to jam up. This is bad. :) So what we do is design all south-to-north signaling paths to be robust against stalling. Instead of simply blocking, like they might in the guest, they manage backpressure a little more intelligently. For instance, in venet-tap, a "transmit" from netif that has to be injected into the south-to-north ring when it is full will result in a netif_stop_queue(), etc.

I can't think of more examples right now, but I will update this list if/when I come up with more. I hope that satisfactorily answered your question, though!

Regards,
-Greg
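
For illustration, the south-to-north backpressure rule above can be sketched in plain userspace C. This is not the venet-tap code: the ring layout, RING_SIZE, and the names s2n_ring, host_xmit(), and guest_consume() are all hypothetical, and the queue_stopped flag merely stands in for the state that netif_stop_queue() would set in a real network driver. The only point is that the host-side producer never blocks on a full ring; it records that it is stopped and returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4 /* hypothetical capacity, chosen only for illustration */

/* Hypothetical south-to-north ring: the host produces, the guest consumes. */
struct s2n_ring {
    int    slots[RING_SIZE];
    size_t head;          /* next slot the host writes */
    size_t tail;          /* next slot the guest reads */
    size_t count;         /* number of occupied slots */
    bool   queue_stopped; /* stand-in for the netif_stop_queue() state */
};

/*
 * Host-side transmit. It never waits for the guest: if the ring is full,
 * it marks the queue as stopped and reports failure to the caller.
 */
static bool host_xmit(struct s2n_ring *r, int pkt)
{
    assert(r->count <= RING_SIZE);

    if (r->count == RING_SIZE) {
        r->queue_stopped = true; /* apply backpressure instead of blocking */
        return false;
    }

    r->slots[r->head] = pkt;
    r->head = (r->head + 1) % RING_SIZE;
    r->count++;
    return true;
}

/*
 * Guest-side consume. Draining a slot clears the stopped flag, which is
 * where a real driver would restart its transmit queue.
 */
static bool guest_consume(struct s2n_ring *r, int *pkt)
{
    if (r->count == 0)
        return false;

    *pkt = r->slots[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    r->count--;
    r->queue_stopped = false;
    return true;
}
```

In this sketch a guest that stops calling guest_consume() only causes host_xmit() to keep returning false; no host thread ever waits on the guest, which is the property the bullet above is after.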