On Tue 09-10-18 10:34:55, Dan Williams wrote: > On Tue, Oct 9, 2018 at 4:28 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote: > > > > On Thu 04-10-18 09:44:35, Dan Williams wrote: > > > Hi Michal, > > > > > > On Thu, Oct 4, 2018 at 12:53 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote: > > > > > > > > On Wed 03-10-18 19:15:18, Dan Williams wrote: > > > > > Changes since v1: > > > > > * Add support for shuffling hot-added memory (Andrew) > > > > > * Update cover letter and commit message to clarify the performance impact > > > > > and relevance to future platforms > > > > > > > > I believe this hasn't addressed my questions in > > > > http://lkml.kernel.org/r/20181002143015.GX18290@xxxxxxxxxxxxxx. Namely > > > > " > > > > It is the more general idea that I am not really sure about. First of > > > > all. Does it make _any_ sense to randomize 4MB blocks by default? Why > > > > cannot we simply have it disabled? > > > > > > I'm not aware of any CVE that this would directly preclude, but that > > > said the entropy injected at 4MB boundaries raises the bar on heap > > > attacks. Environments that want more can adjust that with the boot > > > parameter. Given the potential benefits I think it would only make > > > sense to default disable it if there was a significant runtime impact, > > > from what I have seen there isn't. > > > > > > > Then and more concerning question is, > > > > does it even make sense to have this randomization applied to higher > > > > orders than 0? Attacker might fragment the memory and keep recycling the > > > > lowest order and get the predictable behavior that we have right now. > > > > > > Certainly I expect there are attacks that can operate within a 4MB > > > window, as I expect there are attacks that could operate within a 4K > > > window that would need sub-page randomization to deter. In fact I > > > believe that is the motivation for CONFIG_SLAB_FREELIST_RANDOM. > > > Combining that with page allocator randomization makes the kernel less > > > predictable. > > > > I am sorry but this hasn't explained anything (at least to me). I can > > still see a way to bypass this randomization by fragmenting the memory. > > With that possibility in place this doesn't really provide the promissed > > additional security. So either I am missing something or the per-order > > threshold is simply a wrong interface to a broken security misfeature. > > I think a similar argument can be made against > CONFIG_SLAB_FREELIST_RANDOM the randomization benefits can be defeated > with more effort, and more effort is the entire point. If there is relatively simple way to achieve that (which I dunno about the slab free list randomization because I am not familiar with the implementation) then the feature is indeed questionable. I would understand an argument about feasibility if bypassing was extremely hard but fragmenting the memory is relatively a simple task. > > > Is that enough justification for this patch on its own? > > > > I do not think so from what I have heard so far. > > I'm missing what bar you are judging the criteria for these patches, > my bar is increased protection against allocation ordering attacks as > seconded by Kees, and the memory side caching effects. As said above, if it is quite easy to bypass the randomization then calling and advertizing this as a security feature is a dubious. Not enough to ouright nak it of course but also not something I would put my stamp on. And arguments would be much more solid if they were backed by some numbers (not only for the security aspect but also the side caching effects). > That said I > don't have a known CVE in my mind that would be mitigated by 4MB page > shuffling. > > > > It's > > > debatable. Combine that though with the wider availability of > > > platforms with memory-side-cache and I think it's a reasonable default > > > behavior for the kernel to deploy. > > > > OK, this sounds a bit more interesting. I am going to speculate because > > memory-side-cache is way too generic of a term for me to imagine > > anything specific. > > No need to imagine, a memory side cache shipped on a previous product > as Robert linked in his comments. Could you make this a part of the changelog? I would really appreciate to see justification based on actual numbers rather than quite hand wavy "it helps". > > Many years back while at a university I was playing > > with page coloring as a method to reach a more stable performance > > results due to reduced cache conflicts. It was not always a performance > > gain but it definitely allowed for more stable run-to-run comparable > > results. I can imagine that a randomization might lead to a similar effect > > although I am not sure how much and it would be more interesting to hear > > about that effect. > > Cache coloring is effective up until your workload no longer fits in > that color. Yes, that was my observation back then more or less. But even when you do not fit into the cache a color aware strategy (I was playing with bin hoping as well) produced a more deterministic/stable results. But that is just a side note as it doesn't directly relate to your change. > Randomization helps to attenuate the cache conflict rate > when that happens. I can imagine that. Do we have any numbers to actually back that claim though? > For workloads that may fit in the cache, and/or > environments that need more explicit cache control we have the recent > changes to numa_emulation [1] to arrange for cache sized numa nodes. Could you point me to some more documentation. My google-fu is failing me and "5.2.27.5 Memory Side Cache Information Structure" doesn't point to anything official (except for your patch referencing it). > > If this is really the case then I would assume on/off > > knob to control the randomization without something as specific as > > order. > > Are we only debating the enabling knob at this point? I'm not opposed > to changing that, but I do think we want to keep the rest of the > infrastructure to allow for shuffling on a variable page size boundary > in case there is enhanced security benefits at smaller buddy-page > sizes. I am still trying to understand the benefit of this change. If the caching effects are actually the most important part and there is a reasonable cut in allocation order to keep the randomization effective during the runtime then I would like to understand the thinking behind that. In other words does the randomization at smaller orders than biggest order still visible in actual benchmarks? If not then on/off knob should be sufficient with potential auto tuning based on actual HW rather than to expect poor admin to google for $RANDOM_ORDER to use on a specific HW and all the potential cargo cult that will grow around it. As I've said before, I am not convinced about the security argument but even if I am wrong here then I am still quite sure that you do not want to expose the security aspect as "chose an order to randomize from" because admins will have no real way to know what is the $RANDOM_ORDER to set. So even then it should be on/off thing. You are going to pay some of the performance because you would lose some page allocator optimizations (e.g. pcp lists) but that is unavoidable AFAICS. With all that being said, I think the overal idea makes sense but you should try much harder to explain _why_ we need it and back your justification by actual _data_ before I would consider my ack. > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc9aec03e58f -- Michal Hocko SUSE Labs