On Thu, May 28, 2020 at 03:01:36PM +0200, Alexander Graf wrote:
> On 27.05.20 00:24, Greg KH wrote:
> > On Tue, May 26, 2020 at 03:44:30PM +0200, Alexander Graf wrote:
> > > On 26.05.20 15:17, Greg KH wrote:
> > > > On Tue, May 26, 2020 at 02:44:18PM +0200, Alexander Graf wrote:
> > > > > On 26.05.20 14:33, Greg KH wrote:
> > > > > > On Tue, May 26, 2020 at 01:42:41PM +0200, Alexander Graf wrote:
> > > > > > > On 26.05.20 08:51, Greg KH wrote:
> > > > > > > > On Tue, May 26, 2020 at 01:13:23AM +0300, Andra Paraschiv wrote:
> > > > > > > > > +#define NE "nitro_enclaves: "
> > > > > > > >
> > > > > > > > Again, no need for this.
> > > > > > > >
> > > > > > > > > +#define NE_DEV_NAME "nitro_enclaves"
> > > > > > > >
> > > > > > > > KBUILD_MODNAME?
> > > > > > > >
> > > > > > > > > +#define NE_IMAGE_LOAD_OFFSET (8 * 1024UL * 1024UL)
> > > > > > > > > +
> > > > > > > > > +static char *ne_cpus;
> > > > > > > > > +module_param(ne_cpus, charp, 0644);
> > > > > > > > > +MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");
> > > > > > > >
> > > > > > > > Again, please do not do this.
> > > > > > >
> > > > > > > I actually asked her to put this one in specifically.
> > > > > > >
> > > > > > > The concept of this parameter is very similar to isolcpus= and maxcpus= in
> > > > > > > that it takes CPUs away from Linux and instead donates them to the
> > > > > > > underlying hypervisor, so that it can spawn enclaves using them.
> > > > > > >
> > > > > > > From an admin's point of view, this is a setting I would like to keep
> > > > > > > persisted across reboots. How would this work with sysfs?
> > > > > >
> > > > > > How about just as the "initial" ioctl command to set things up? Don't
> > > > > > grab any cpu pools until asked to. Otherwise, what happens when you
> > > > > > load this module on a system that can't support it?
> > > > >
> > > > > That would give any user with access to the enclave device the ability to
> > > > > remove CPUs from the system. That's clearly a CAP_ADMIN task in my book.
> > > >
> > > > Ok, what's wrong with that?
> > >
> > > Would you want random users to get the ability to hot unplug CPUs from your
> > > system? At unlimited quantity? I don't :).
> >
> > A random user, no, but one with admin rights, why not? They can do that
> > already today on your system, this isn't new.
> >
> > > > > Hence this whole split: The admin defines the CPU Pool, users can safely
> > > > > consume this pool to spawn enclaves from it.
> > > >
> > > > But having the admin define that at module load / boot time, is a major
> > > > pain. What tools do they have that allow them to do that easily?
> > >
> > > The normal toolbox: editing /etc/default/grub, adding an /etc/modprobe.d/
> > > file.
> >
> > Editing grub files is horrid, come on...
>
> It's not editing grub files, it's editing template config files that then
> are used as input for grub config file generation :).
>
> > > When but at module load / boot time would you define it? I really don't want
> > > to have a device node that in theory "the world" can use which then allows
> > > any user on the system to hot unplug every CPU but 0 from my system.
> >
> > But you have that already when the PCI device is found, right? What is
> > the initial interface to the driver? What's wrong with using that?
> >
> > Or am I really missing something as to how this all fits together with
> > the different pieces? Seeing the patches as-is doesn't really provide a
> > good overview, sorry.
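For concreteness, the admin-only ioctl alternative being debated here could
look roughly like the sketch below: the pool is only grabbed when explicitly
asked for, and only by CAP_SYS_ADMIN. NE_SET_CPU_POOL, ne_dev_ioctl() and
ne_setup_cpu_pool() are hypothetical names for illustration only, not taken
from the posted series.

	#include <linux/capability.h>
	#include <linux/cpumask.h>
	#include <linux/errno.h>
	#include <linux/fs.h>
	#include <linux/gfp.h>
	#include <linux/ioctl.h>
	#include <linux/uaccess.h>

	#define NE_SET_CPU_POOL	_IOW('N', 0xff, char[64])	/* hypothetical cmd */

	/* Hypothetical helper: would reserve and offline the listed CPUs. */
	static int ne_setup_cpu_pool(const char *cpu_list)
	{
		cpumask_var_t pool;
		int rc;

		if (!zalloc_cpumask_var(&pool, GFP_KERNEL))
			return -ENOMEM;

		rc = cpulist_parse(cpu_list, pool);
		/* A real implementation would offline each CPU in "pool" here. */

		free_cpumask_var(pool);
		return rc;
	}

	static long ne_dev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
	{
		switch (cmd) {
		case NE_SET_CPU_POOL: {
			char cpu_list[64];

			/* Donating parent VM CPUs to enclaves stays an admin decision. */
			if (!capable(CAP_SYS_ADMIN))
				return -EPERM;

			if (strncpy_from_user(cpu_list, (const char __user *)arg,
					      sizeof(cpu_list)) < 0)
				return -EFAULT;
			cpu_list[sizeof(cpu_list) - 1] = '\0';

			return ne_setup_cpu_pool(cpu_list);
		}
		default:
			return -ENOTTY;
		}
	}

Persisting that setting across reboots would then still need a small boot-time
service issuing the ioctl, which is the systemd ordering concern raised
further down.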
> Ok, let me walk you through the core donation process.
>
> Imagine you have a parent VM with 8 cores. Every one of those virtual cores
> is 1:1 mapped to a physical core.
>
> You enumerate the PCI device, you start working with it. None of that
> changes your topology.
>
> You now create an enclave spanning 2 cores. The hypervisor will remove the
> 1:1 map for those 2 cores and instead mark them as "free floating" on the
> remaining 6 cores. It then uses the 2 freed up cores and creates a 1:1 map
> for the enclave's 2 cores.
>
> To ensure that we still see a realistic mapping of core topology, we need to
> remove those 2 cores from the parent VM's scope of execution. The way this
> is done today is by going through CPU offlining.
>
> The first and obvious option would be to offline all respective CPUs when an
> enclave gets created. But unprivileged users should be able to spawn
> enclaves. So how do I ensure that my unprivileged user doesn't just offline
> all of my parent VM's CPUs?
>
> The option implemented here is that we fold this into a two-stage approach.
> The admin reserves a "pool" of cores for enclaves to use. Unprivileged users
> can then consume cores from that pool, but not more than those.
>
> That way, unprivileged users have no influence over whether a core is
> enabled or not. They can only consume the pool of cores that was dedicated
> for enclave use.
>
> It also has the big advantage that you don't have dynamically changing CPU
> topology in your system. Long living processes that adjust their environment
> to the topology can still do so, without cores getting pulled out under
> their feet.

Ok, that makes more sense, but:

> > > > > So I really don't think an ioctl would be a great user experience. Same for
> > > > > a sysfs file - although that's probably slightly better than the ioctl.
> > > >
> > > > You already are using ioctls to control this thing, right? What's wrong
> > > > with "one more"? :)
> > >
> > > So what we *could* do is add an ioctl to set the pool size which then does a
> > > CAP_ADMIN check. That however means you now are in priority hell:
> > >
> > > A user that wants to spawn an enclave as part of an nginx service would need
> > > to create another service to set the pool size and indicate the dependency
> > > in systemd control files.
> > >
> > > Is that really better than a module parameter?
> >
> > module parameters are hard to change, and manage control over who really
> > is changing them.
>
> What is hard about
>
> $ echo 1-5 > /sys/module/nitro_enclaves/parameters/ne_cpus

So at runtime, after all is booted and up and going, you just ripped
cores out from under someone's feet? :)

And the code really handles writing to that value while the module is
already loaded and up and running? At a quick glance, it didn't seem
like it would handle that very well as it only is checked at ne_init()
time. Or am I missing something?

Anyway, yes, if you can dynamically do this at runtime, that's great,
but it feels awkward to me to rely on one configuration thing as a
module parameter, and everything else through the ioctl interface.
Unification would seem to be a good thing, right?

thanks,

greg k-h
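As an aside on the ne_init()-only concern above: a writable module parameter
can act on runtime writes if it is registered through module_param_cb() with
a custom set callback instead of plain module_param(). A minimal sketch, not
the posted code, with locking and the actual CPU offlining left out and the
cpulist handling purely illustrative:

	#include <linux/capability.h>
	#include <linux/cpumask.h>
	#include <linux/errno.h>
	#include <linux/gfp.h>
	#include <linux/module.h>
	#include <linux/moduleparam.h>

	static char ne_cpus[128];
	static struct kparam_string ne_cpus_arg = {
		.maxlen	= sizeof(ne_cpus),
		.string	= ne_cpus,
	};

	static int ne_set_kernel_param(const char *val, const struct kernel_param *kp)
	{
		cpumask_var_t pool;
		int rc;

		/* Growing or shrinking the enclave CPU pool is an admin operation. */
		if (!capable(CAP_SYS_ADMIN))
			return -EPERM;

		if (!zalloc_cpumask_var(&pool, GFP_KERNEL))
			return -ENOMEM;

		rc = cpulist_parse(val, pool);
		if (rc < 0)
			goto out;

		/*
		 * A real implementation would release the previously reserved CPUs
		 * and offline the newly requested ones here, under a mutex, before
		 * accepting the new value.
		 */

		rc = param_set_copystring(val, kp);
	out:
		free_cpumask_var(pool);
		return rc;
	}

	static const struct kernel_param_ops ne_cpu_pool_ops = {
		.get	= param_get_string,
		.set	= ne_set_kernel_param,
	};

	module_param_cb(ne_cpus, &ne_cpu_pool_ops, &ne_cpus_arg, 0644);
	MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");

With something along those lines, a write such as
"echo 1-5 > /sys/module/nitro_enclaves/parameters/ne_cpus" would be validated
and acted on when it happens, rather than only being parsed once at module
init time.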