Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)

On Fri, Jul 17, 2015 at 11:55 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
> I doubt I'll succeed, too.  But I don't want anything resembling full
> per-cpu page tables -- per-cpu pgds would be plenty.  Still kinda
> nasty to implement.

Per-cpu pgd's would have been trivial in the old 32-bit PAE
environment. There's only four entries at the top level, and they have
to be allocated at process startup anyway - and we wouldn't even have
to do a per-cpu-and-VM allocation, we'd just have done one single
per-cpu entry, and when switching tasks we'd *copy* the VM entries to
the per-cpu one and re-load %cr3 with the same address. I thought
about it.

But I'm really happy we never went down that road. It's non-portable,
even on x86-32 (because it requires PAE). And even there it would be
limited to "the top 1GB of virtual address space ends up being
per-cpu", and then you have to get the vmalloc space right etc, so you
have that one PGD entry for the kernel mapping that you can make be
percpu and play tricks in. So you'd basically allocate one page per
CPU for the magic upper PGD entry that maps the top 1GB, and edit that
on-the-fly as you do task-switching. Very specialized, and the upside
was very dubious.
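
For illustration only, the 32-bit PAE scheme described above can be
modelled in user space roughly as follows. This is a sketch, not kernel
code: struct mm_model, percpu_pgd, percpu_top, switch_mm_model and
MODEL_NR_CPUS are hypothetical names, and the assumption that the last
of the four entries is the one covering the top 1GB is mine. The %cr3
reload is only indicated in a comment.

```c
#include <stdint.h>
#include <string.h>

#define PAE_PGD_ENTRIES 4   /* PAE top level: four 8-byte entries, 1GB each */
#define MODEL_NR_CPUS   2   /* hypothetical CPU count for this model */

typedef uint64_t pgd_entry_t;

/* Hypothetical stand-in for a process address space. */
struct mm_model {
    pgd_entry_t pgd[PAE_PGD_ENTRIES];
};

/* One top-level table per CPU, allocated once and never per-VM. */
static pgd_entry_t percpu_pgd[MODEL_NR_CPUS][PAE_PGD_ENTRIES];

/* Each CPU's private entry for the top 1GB (assumed to be slot 3). */
static pgd_entry_t percpu_top[MODEL_NR_CPUS];

/* Model of a task switch: copy the incoming VM's entries into this
 * CPU's table, then put the CPU-private top entry back in place. */
static void switch_mm_model(int cpu, const struct mm_model *next)
{
    /* 4 entries * 8 bytes = 32 bytes copied per switch */
    memcpy(percpu_pgd[cpu], next->pgd, sizeof(percpu_pgd[cpu]));
    percpu_pgd[cpu][PAE_PGD_ENTRIES - 1] = percpu_top[cpu];
    /* A real kernel would now reload %cr3 with the (unchanged)
     * address of percpu_pgd[cpu]; this model stops at the copy. */
}
```

Two CPUs switching to the same struct mm_model end up with identical
lower entries but different top entries, which is the whole point of
the trick.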

And that "simple" trick is not really doable with the x86-64 model any
more (you can't copy 4kB efficiently the way you could copy 32 _bytes_
efficiently). And you really don't want to pre-allocate the whole
top-level PGD either. So all the things that made it "easy" for 32-bit
PAE basically went away with x86-64.
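
The size difference behind that argument is simple arithmetic, sketched
here with hypothetical names (PAE_TOP_ENTRIES, X86_64_TOP_ENTRIES and
top_level_bytes are not kernel identifiers). Both formats use 8-byte
entries; PAE has 4 of them at the top level, while the x86-64 top level
has 512.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    PAE_TOP_ENTRIES    = 4,    /* 32-bit PAE top level */
    X86_64_TOP_ENTRIES = 512,  /* x86-64 top level (one 4kB page) */
};

/* Bytes that would have to be copied on every task switch if the
 * whole top level were copied into a per-cpu table. */
static size_t top_level_bytes(size_t entries)
{
    return entries * sizeof(uint64_t);
}
```

With these numbers, top_level_bytes(PAE_TOP_ENTRIES) is 32 and
top_level_bytes(X86_64_TOP_ENTRIES) is 4096, i.e. a 128x larger copy
on each switch.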

No, I think the only thing that would make it possible is if there is
some architecture extension that replaces part of the page table
mappings with a percpu MSR describing a magic mapping or two. It would
be trivial to do such an addition in hardware (it's not even in the
critical path, it would be just a new magic special case for the TLB
fill code), but without hardware support it's just not a good idea.

(And I'm not claiming that the hw extension for per-cpu mappings would
be a good idea either, although I think it would be an _interesting_
toy to play with ;)

                     Linus