On Thu, May 22, 2008 at 7:25 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
> Thanks for the reply. I would appreciate if someone can help to
> clear just a few more doubts....
>

Hi, no problem :-)

> On Thu, May 22, 2008 at 7:31 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
>> On 5/22/08, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>> d. any problem with multi-CPU, PAE scenario?
>>>
>>
>> We will disable all but one CPU at run-time if the kernel was compiled
>> with CONFIG_SMP=y. This is because there is a race between CPUs if one
>> of them is modifying the page tables and the page table change "leaks"
>> into other TLBs.
>>
>
> sorry i don't understand this.
>
> just to confirm this: In linux kernel, there is only one kernel
> pagetable, shared by all the different processes, and all the
> different CPUs right?

Correct.

>
> so current kernel is definitely able to handle concurrent modification
> of the pagetable, right? (either via locks or lockless algorithm).
> I mean, for example, supposed the PT has multiple locks - for
> different regions of memory (either different GFP or node level) and
> if one CPU is modifying the PT, then another CPU will blocked if the
> same region of memory is attempted to lock, but otherwise it can just
> go ahead to read/write the other region of memory - owned by a
> different set of locks... I may not be right.....so in the context of
> kmemcheck - how does the race arises?
>

Okay, so the main problem is -- we can lock before changing the page
table itself, but we cannot lock the memory location before it is
modified -- because it can be modified from anywhere on any cpu!

So imagine this scenario: We have two tasks A and B on different CPUs.
Task A accesses some memory location which is being tracked by
kmemcheck. This access triggers a page fault and in the page fault
handler, we lock the page (where the lock is doesn't really matter).
Then we mark the PTE present.
Now task B comes along and accesses the very same memory location.
Since task B's CPU doesn't have this page cached in its TLB, it looks
the PTE up in the page tables in RAM. Ah, the PTE is present; the CPU
can happily access this memory location, no page fault is generated,
and so B never even tries to take the lock.

(Now task A restarts the faulting instruction, marks the PTE
non-present again and releases the page lock.)

Do you see a way around this? The race window is admittedly incredibly
small. But it's a race :-)

This is why we need to duplicate the page tables. Then one CPU can
change the PTE to present without affecting any of the other CPUs in
the system. If you can think of another way to do this... :-)

(Note: It may not be necessary to duplicate the _whole_ page-table
structure. I haven't pursued this thought yet.)


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036