On Sun, 2010-03-07 at 08:03 +1100, Benjamin Herrenschmidt wrote:
> On Sat, 2010-03-06 at 16:17 +0530, James Bottomley wrote:
> > On a fault in of exec data, we first try to get the page out of the page
> > cache. If it's not present, we put the faulting process to sleep and
> > fetch it in from storage. When we do the read, on the PIO path, the
> > kernel alias for the page becomes dirty. Some time later, we place the
> > page into the user space (updating the pte entry that caused a fault).
> > At this point, we'll call both flush_icache_page() and
> > update_mmu_cache() ... this is where the I/D resolution should be done.
> > Since it's after any I/O has occurred, it doesn't matter whether the CPU
> > speculatively moved anything in or not. As long as you flush the kernel
> > alias and invalidate the user I and D aliases, we're good to go. Using
> > the page arch flags is really only to optimise this process (defer
> > kernel D alias flushing).
>
> Ok, so while flush_icache_page() looks like something we could use
> instead of set_pte_at() for the icache flushing, it doesn't answer all
> the questions. Off the top of my mind:

OK, so what I was actually trying to get across is the point that we
don't handle I cache problems in the I/O or page cache code ... we
handle them in the mm code, so the mm piece of the above was
deliberately a bit vague.

> - I see the calls to flush_icache_page() in mm/memory.c but I don't see
> them next to all set_pte_at() that insert a valid PTE. For example, we
> don't flush the icache for anonymous pages. While that might seem like a
> good idea, we have been under pressure to "fix" that on powerpc to make
> sure there is no stale icache content from another process leaking into
> userspace.

I'm not entirely sure what flush_icache_page() is supposed to do. On
parisc it flushes the *kernel* icache ... which has got to be wrong.
According to cachetlb.txt it's an obsolete interface.
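To make the ordering in the quoted fault path concrete, here is a toy userspace model (not kernel code). It assumes a single deferred-flush arch flag per page, and every name in it (model_page, pio_read_into_page, install_user_pte) is hypothetical. The PIO read dirties the kernel D alias and only marks the flush as pending. The pending flush, plus invalidation of the user I and D aliases, is resolved once all I/O is done and before the PTE becomes valid.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the exec fault-in path. All names are hypothetical; the
 * booleans stand in for cache state that real hardware tracks per line.
 */
struct model_page {
    bool arch_dcache_dirty;   /* hypothetical deferred-flush arch page flag */
    bool kernel_dalias_dirty; /* kernel D alias holds data not yet in memory */
    bool user_ialias_stale;   /* user I alias may hold old instructions */
    bool user_dalias_stale;   /* user D alias may hold old data */
    bool pte_valid;           /* user PTE installed */
    int  kernel_dflushes;     /* kernel D alias flushes actually performed */
};

/* PIO read: the CPU copies data in through the kernel mapping, dirtying the
 * kernel D alias. Rather than flushing now, record that a flush is owed. */
void pio_read_into_page(struct model_page *p)
{
    p->kernel_dalias_dirty = true;
    /* Any user aliases left from a previous use of this page are now stale. */
    p->user_ialias_stale = true;
    p->user_dalias_stale = true;
    p->arch_dcache_dirty = true;
}

/* Called after all I/O has completed, before the mapping is exposed. */
void install_user_pte(struct model_page *p)
{
    if (p->arch_dcache_dirty) {
        p->kernel_dalias_dirty = false; /* write back the kernel D alias */
        p->kernel_dflushes++;
        p->arch_dcache_dirty = false;
    }
    p->user_ialias_stale = false; /* invalidate the user I alias */
    p->user_dalias_stale = false; /* invalidate the user D alias */
    p->pte_valid = true;          /* only now make the mapping visible */
}
```

Because the deferred flag is cleared on first use, installing a second PTE for the same unchanged page does not repeat the kernel D alias flush.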
> - It needs to be done -before- set_pte_at() but I think the code does it
> right, only your explanation above makes it unclear :-)

Sorry, like I said, I only sketched the mm piece. However, at least on
parisc, there's a technical problem with flushing before we have the
pte: On VIPT systems, we need a mapping before the flush will work. I
was experimenting with a mechanism whereby we set aside an aligned
kernel region of our congruence size and simply flushed through that
region using the correct mappings, but we haven't got around to
implementing it yet.

> - It doesn't take the PTE pointer as an argument, so here goes our trick
> on powerpc of filtering out exec permission rather than flushing when a
> page is accessed by a read fault
>
> - We -still- have the problem of tracking whether the icache has been
> flushed or not yet for a given physical page on archs with PIPT (or non
> aliasing VIPT) like powerpc. Without that tracking, we flush a lot more
> than necessary since we'll end up flushing things like glibc text pages
> for every process they are mapped into which is totally wasteful. Thus
> the idea of using a new PG bit to separate D$ from I$ tracking still
> makes sense.

So, assuming full congruence of user space, can't you use the VMA as an
indicator? That is, if the page has no user space mappings, we have to
flush the icache ... if it has one or more, the icache has already been
flushed, and placing the same page congruently in a different address
space benefits from that prior flush, so there's no need to flush again?

I also think we've established the relevant facts for the I/O thread
(that on PIO reads we only need to either flush the kernel D cache or
mark it for a later flush). We're now into deep technicalities of how
the mm system operates at the architecture level, so perhaps we should
move this to linux-arch?
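For comparison, a toy userspace model (not kernel code) of the two ways of avoiding the repeated glibc-text flush discussed above: the mapping-count indicator, which assumes all user mappings are congruent, and a per-page "I-cache already flushed" bit along the lines of the proposed new PG bit. All names (phys_page, map_exec_by_mapcount, map_exec_by_flag, unmap_user, kernel_write) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; names and fields are hypothetical. */
struct phys_page {
    int  user_mapcount;  /* number of user mappings of this page */
    bool icache_clean;   /* hypothetical "I-cache already flushed" page bit */
    int  icache_flushes; /* I-cache flushes actually performed */
};

/* Mapping-count indicator: with fully congruent user mappings, only the
 * first user mapping needs a flush; later ones reuse its effect. */
void map_exec_by_mapcount(struct phys_page *p)
{
    if (p->user_mapcount == 0)
        p->icache_flushes++;
    p->user_mapcount++;
}

/* Page-bit tracking: flush once, then remember it until the contents change. */
void map_exec_by_flag(struct phys_page *p)
{
    if (!p->icache_clean) {
        p->icache_flushes++;
        p->icache_clean = true;
    }
    p->user_mapcount++;
}

void unmap_user(struct phys_page *p)
{
    if (p->user_mapcount > 0)
        p->user_mapcount--;
}

/* New data written through the kernel (e.g. a PIO read) clears the bit. */
void kernel_write(struct phys_page *p)
{
    p->icache_clean = false;
}
```

In this model both schemes flush a page mapped into many processes only once. They differ after the last user mapping goes away: the mapping-count check flushes again on the next mapping even though the page contents are unchanged, while the page bit only forces a flush after new data has been written through the kernel.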
James