On Tue, Jun 21, 2016 at 10:16 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Jun 21, 2016 at 9:45 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>
>> So I'm leaning toward fewer cache entries per cpu, maybe just one.
>> I'm all for making it a bit faster, but I think we should weigh that
>> against increasing memory usage too much and thus scaring away the
>> embedded folks.
>
> I don't think the embedded folks will be scared by a per-cpu cache, if
> it's just one or two entries. And I really do think that even just
> one or two entries will indeed catch a lot of the cases.
>
> And yes, fork+execve() is too damn expensive in page table build-up
> and tear-down. I'm not sure why bash doesn't do vfork+exec for when it
> has to wait for the process anyway, but it doesn't seem to do that.
>

I don't know about bash, but glibc very recently fixed a long-standing
bug in posix_spawn and started using clone() in a sensible manner for
this.

FWIW, it may be a while before this can be enabled in distro kernels.
There are some code paths (*cough* crypto users *cough*) that think
that calling sg_init_one with a stack address is a reasonable thing to
do, and it doesn't work with a vmalloced stack. grsecurity works
around this by using a real lowmem higher-order stack, aliasing it
into vmalloc space, and arranging for virt_to_phys to backtrack the
alias, but eww. I think I'd rather find and fix the bugs, assuming
they're straightforward.

--Andy
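
For anyone following along, this is roughly what the spawn-and-wait
case looks like from userspace when it goes through posix_spawn
instead of fork+execve(). It is only a minimal sketch: the helper name
run_and_wait is made up for illustration, it does not retry on EINTR,
and it is not a claim about what bash or glibc do internally.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Hypothetical helper: start argv[0] (looked up via PATH) with the
 * caller's environment, then block until the child exits.
 * Returns the child's exit status, or -1 if spawning or waiting fails
 * or the child did not exit normally.
 */
int run_and_wait(char *const argv[])
{
    pid_t pid;
    int status;
    int err;

    assert(argv != NULL && argv[0] != NULL);

    /* posix_spawnp() returns an error number; it does not set errno. */
    err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawnp: %s\n", strerror(err));
        return -1;
    }

    /* The parent has nothing else to do, so just wait for the child. */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_and_wait() with { "sh", "-c", "exit 3", NULL } should hand
back 3. Whether the child is created with a vfork-style clone() or a
plain fork underneath depends on the libc version in use.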