Re: [PATCH 0/3] TLB flush multiple pages per IPI v5

On Jun 25, 2015 04:48, "Ingo Molnar" <mingo@xxxxxxxxxx> wrote:
>
>  - 1x, 2x, 3x, 4x means up to 4 adjacent 4K vmalloc()-ed pages are accessed, the
>    first byte in each

So that test is a bit unfair. From previous timing of Intel TLB fills, I can tell you that Intel is particularly good at doing adjacent entries.

That's independent of the fact that page tables have very good locality (if they are the radix tree type - the hashed page tables that ppc uses are shit). So when filling adjacent entries, you take the cache misses for the page tables only once. But even aside from that, Intel tends to do particularly well at the "next page" TLB fill case.

Now, I think that's a reasonably common case, and I'm not saying that it's unfair to compare for that reason, but it does highlight the good case for TLB walking.

So I would suggest you highlight the bad case too: use invlpg to invalidate *one* TLB entry, and then walk four non-adjacent entries. And compare *that* to the full TLB flush.

Now, I happen to still believe in the full flush, but let's not pick benchmarks that might not show the advantages of the finer granularity.

        Linus
