On Mon, 2021-12-20 at 22:11 +0100, Borislav Petkov wrote: > Bah, that thread is not on lkml. Please always Cc lkml in the future. > > On Mon, Dec 20, 2021 at 12:39:19PM -0800, Kristen Carlson Accardi > wrote: > > If a malicious or just extra large enclave is loaded, or even just > > a > > lot of enclaves, it can eat up all the normal RAM on the system. > > Normal > > methods of finding out where all the memory on the system is being > > used, wouldn't be able to find this usage since it is shared > > memory. In > > addition, the OOM killer wouldn't be able to kill any enclaves. > > So you need some sort of limiting against malicious enclaves anyways, > regardless of this knob. IOW, you can set a percentage limit of > per-enclave memory which cannot exceed a certain amount which won't > prevent the system from its normal operation. For example. > > > I completely agree - so I'm trying to make sure I understand this > > comment, as the value is currently set to default that would > > automatically apply that is based on EPC memory present and not a > > fixed > > value. So I'd like to understand what you'd like to see done > > differently. are you saying you'd like to eliminated the ability to > > override the automatic default? Or just that you'd rather calculate > > the > > percentage based on total normal system RAM rather than EPC memory? > > So you say that there are cases where swapping to normal RAM can eat > up all RAM. > > So the first heuristic should be: do not allow for *all* enclaves > together to use up more than X percent of normal RAM during EPC > reclaim. So, in your proposal, you would first change the calculated number of maximum available backing pages to be based on total system RAM rather than EPC memory, got it. > > X percent being, say, 90% of all RAM. For example. I guess 10% should > be enough for normal operation but someone who's more knowledgeable > in > system limits could chime in here. > > Then, define a per-enclave limit which says, an enclave can use Y % > of > memory for swapping when trying to reclaim EPC memory. And that can > be > something like: > > 90 % RAM > -------- > total amount of enclaves currently on the system > This would require recalculating the max number of allowed backing pages per enclave at run time whenever a new enclave is loaded - but all the existing enclaves may have already used more than the new max number of per-enclave allowable pages. How would you handle that scenario? This would add a lot of complexity for sure - and it does make me wonder whether any additional benefit of limiting per enclave would be worth it. > And you can obviously create scenarios where creating too many > enclaves > can bring the system into a situation where it doesn't do any forward > progress. > > But you probably can cause the same with overcommitting with VMs so > perhaps it would be a good idea to look how overcommitting VMs and > limits there are handled. > > Bottom line is: the logic should be for the most common cases to > function properly, out-of-the-box, without knobs. And then to keep > the > system operational by preventing enclaves from bringing it down to a > halt just by doing EPC reclaim. > > Does that make more sense? > Thanks for your more detailed explanation - I will take a look at the VM overcommit limits. Since obviously the original implementation did have a default value set, I had still a remaining specific question about your comments. Are you suggesting that there should not be a way to override any overcommit limit at all? So drop the parameter all together?