On Fri, Jul 04, 2014 at 09:06:41AM +0100, Paul Burton wrote:

> Yes, I think it would. The reason I went with the per-mm approach though
> was to try to avoid so much overhead. I suppose we could possibly
> allocate the page on demand so that threads which don't use FP don't pay
> for it, and maybe use the shrinker interface to free the page if we run
> low on memory and aren't currently executing from it. Though it would
> mean that the FP branch delay "emulation" could fail if memory is tight,
> but I suppose that's no worse than now where it could blow the (user)
> stack.
>
> I'll try to get a v3 out at some point soon.

The actual piece of code that needs to be installed is tiny, so the page
could be shared between many threads.  In fact a single page would
suffice for most processes; only multithreaded processes would need more
slots than a single page provides, in which case more pages could be
allocated or the process could sleep until a slot becomes available.
Assuming the smallest supported page size of 4k and slots of 128 bytes
(the largest S-cache line size in common use), that's 32 slots per page.

I'm also wondering how insane emulation would be.  We already have the
capability to emulate a fair fraction of the instruction set.

  Ralf
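
To make the slot arithmetic above concrete, here is a rough userspace
sketch, not kernel code.  All names (emu_page, emu_slot_alloc,
emu_slot_free, emu_slot_offset) are hypothetical and only illustrate the
idea; a real implementation would need locking and some way to sleep
until a slot is released.

```c
#include <assert.h>
#include <stdint.h>

#define EMU_PAGE_SIZE 4096u  /* smallest supported page size (4k) */
#define EMU_SLOT_SIZE 128u   /* largest S-cache line size in common use */
#define EMU_NUM_SLOTS (EMU_PAGE_SIZE / EMU_SLOT_SIZE)  /* 4096 / 128 = 32 */

/* Hypothetical per-page bookkeeping: one bit per slot, set means in use. */
struct emu_page {
	uint32_t used;
};

/*
 * Claim the lowest free slot.  Returns the slot index, or -1 if every
 * slot is busy; the caller would then allocate another page or sleep.
 */
static int emu_slot_alloc(struct emu_page *pg)
{
	unsigned int i;

	for (i = 0; i < EMU_NUM_SLOTS; i++) {
		if (!(pg->used & (UINT32_C(1) << i))) {
			pg->used |= UINT32_C(1) << i;
			return (int)i;
		}
	}
	return -1;
}

/* Release a slot previously returned by emu_slot_alloc(). */
static void emu_slot_free(struct emu_page *pg, int slot)
{
	assert(slot >= 0 && (unsigned int)slot < EMU_NUM_SLOTS);
	pg->used &= ~(UINT32_C(1) << slot);
}

/* Byte offset of a slot within the shared page. */
static unsigned int emu_slot_offset(int slot)
{
	return (unsigned int)slot * EMU_SLOT_SIZE;
}
```

With these assumptions a single 32-bit word tracks the whole page, and
the 33rd concurrent request is the point where a second page or a sleep
would be needed.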