On Wed, Feb 08, 2017 at 12:51:24PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 7, 2017 at 10:56 AM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> > Quite a few people have expressed interest in enabling PCID on (x86)
> > Linux. Here's the code:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/pcid
> >
> > The main hold-up is that the code needs to be reviewed very carefully.
> > It's quite subtle. In particular, "x86/mm: Try to preserve old TLB
> > entries using PCID" ought to be looked at carefully to make sure the
> > locking is right, but there are plenty of other ways this could all
> > break.
> >
> > Anyone want to take a look or maybe scare up some other reviewers?
> > (Kees, you seemed *really* excited about getting this in.)
>
> Nadav pointed out that this doesn't work right with
> ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH. Mel, here's the issue:
>

I confess I didn't read the thread or patch; I'm only looking at this
question.

> I want to add ASID (Intel calls it PCID) support to x86. This means
> that "flush the TLB on a given CPU" will no longer be a particularly
> well defined operation because it's not clear which ASID tag to flush.
> Instead there's "flush the TLB for a given mm on a given CPU".
>

Understood.

> If I'm understanding the batched flush code, all it's trying to do is
> to flush more than one mm at a time.

More or less. It would be more accurate to say it's flushing any CPU TLB
that potentially accessed a list of pages recently. It's not flushing
"multiple mm's at a time" as such, it's flushing "only the CPUs that
recently accessed pages mapped by an mm". The distinction may not matter.

The requirements do matter though. mm/vmscan.c is where TTU_BATCH_FLUSH
is set, and that is processing a list of pages that can be mapped by
multiple mm's. The TLBs must be flushed before either IO starts
(try_to_unmap_flush_dirty) or the pages are freed to the page allocator
(try_to_unmap_flush). To do this, all it has to track is a simple mask
of CPUs, whether a flush is necessary and whether any of the PTEs were
dirty. This is trivial to assemble during an rmap walk as it's a PTE
check and a cpumask_or. try_to_unmap_flush then flushes the entire TLB
because the cost of targeting a specific page to flush was so high (both
maintaining the PFNs and the individual flush operations).

> Would it make sense to add a new
> arch API to flush more than one mm? Presumably it would take a linked
> list, and the batched flush code would fall back to flushing in pieces
> if it can't allocate a new linked list node when needed.
>

Conceptually it's ok, but the details are a headache. The deferred flush
code would need to maintain a list of mm's (or ASIDs) that is unbounded
in size to match the IPIs the current code sends, as opposed to a simple
cpumask. There are SWAP_CLUSTER_MAX pages to consider, with each page
potentially mapped by an arbitrary number of mm's. The same mm's could
exist on multiple lists, one for each active kswapd instance and direct
reclaimer. As multiple reclaimers are interested in the same mm, that
pretty much rules out linking the entries off mm_struct unless the
locking serialised the parallel reclaimers and prevented an mm existing
on more than one list at a time. You could try allowing multiple tasks
to share the one list (I'm not sure how to find that list quickly), but
each entry would have to be locked, and as each user can flush at any
time, multiple reclaimers would potentially have to block while an IPI
is being sent.
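
For comparison, the bookkeeping the existing batching relies on is
roughly the following. This is a simplified sketch approximating the
current mm/rmap.c code, not the exact source; the local-CPU flush path
and vmstat/trace accounting are left out.

/*
 * Simplified sketch of the existing batching; assumes
 * CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH and the usual mm/tlbflush
 * headers. See mm/rmap.c for the real thing.
 */
struct tlbflush_unmap_batch {
	/* CPUs that may hold a stale TLB entry for one of the unmapped pages */
	struct cpumask cpumask;
	/* true if any bit in cpumask is set */
	bool flush_required;
	/* a dirty PTE was unmapped, so a flush is needed before IO starts */
	bool writable;
};

/* Called during the rmap walk for each mm that maps the page. */
static void set_tlb_ubc_flush_pending(struct mm_struct *mm, bool writable)
{
	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;

	/* Accumulate the CPUs that may have TLB entries for this mm. */
	cpumask_or(&tlb_ubc->cpumask, &tlb_ubc->cpumask, mm_cpumask(mm));
	tlb_ubc->flush_required = true;
	if (writable)
		tlb_ubc->writable = true;
}

/*
 * Flush everything pending. The whole TLB is flushed on every CPU in the
 * mask rather than tracking and flushing individual PFNs.
 */
void try_to_unmap_flush(void)
{
	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;

	if (!tlb_ubc->flush_required)
		return;

	flush_tlb_others(&tlb_ubc->cpumask, NULL, 0, TLB_FLUSH_ALL);
	cpumask_clear(&tlb_ubc->cpumask);
	tlb_ubc->flush_required = false;
	tlb_ubc->writable = false;
}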
It's hard to see how this could be scaled to match the existing code. It
would be easier to track via an array stored in task_struct, but the
number of mm's is unknown in advance so all you can do is guess a
reasonable size. It would have to flush if the array fills, resulting in
more IPIs than the current code, depending on how many mm's map the list
of pages.

-- 
Mel Gorman
SUSE Labs
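
P.S. Purely to illustrate that last point, a hypothetical fixed-size
array variant might look like the sketch below. tlbflush_unmap_batch_mm,
TLB_UBC_MAX_MMS, tlb_ubc_mm_add and try_to_unmap_flush_mms are invented
names, not existing kernel API, and the guessed bound is exactly where
the extra flushes come from.

/* Hypothetical illustration only; none of these names exist in the kernel. */
#define TLB_UBC_MAX_MMS	32	/* guessed bound; the number of mm's is unknown in advance */

struct tlbflush_unmap_batch_mm {
	/* mm's (i.e. ASIDs) with flushes pending; a real version would
	 * also need to hold a reference on each mm */
	struct mm_struct *mms[TLB_UBC_MAX_MMS];
	unsigned int nr_mms;
	bool writable;
};

/* Hypothetical: flush every pending mm and empty the array. */
static void try_to_unmap_flush_mms(struct tlbflush_unmap_batch_mm *batch)
{
	unsigned int i;

	for (i = 0; i < batch->nr_mms; i++)
		flush_tlb_mm(batch->mms[i]);	/* one flush (and IPI round) per mm */

	batch->nr_mms = 0;
	batch->writable = false;
}

/*
 * Record that @mm needs a flush. If the guessed array size fills, flush
 * early; that early flush is the source of the extra IPIs compared with
 * the existing cpumask scheme.
 */
static void tlb_ubc_mm_add(struct tlbflush_unmap_batch_mm *batch,
			   struct mm_struct *mm, bool writable)
{
	unsigned int i;

	if (writable)
		batch->writable = true;

	for (i = 0; i < batch->nr_mms; i++)
		if (batch->mms[i] == mm)
			return;		/* already pending */

	if (batch->nr_mms == TLB_UBC_MAX_MMS)
		try_to_unmap_flush_mms(batch);

	batch->mms[batch->nr_mms++] = mm;
}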