On 5/22/08, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
> On Wed, May 21, 2008 at 6:04 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
> > In the kmemcheck code I take a lot of page faults from any kernel
> > context (with interrupts enabled or disabled). This means that there
> > are a lot of things I can't do. Taking locks is dangerous while
> > handling a page fault occurring in interrupt context. In addition to
> > this, I must _not_ access any memory allocated by kmalloc(), as this
> > may generate a new (recursive) page fault.
> >
> > Currently, I am deferring work to be done later by using a timer that
> > triggers every HZ. This allows me to do what I want in the right
> > context, e.g. interrupts enabled and no locks taken.
> >
> > However, the timer triggers even when I don't need it, and once a
> > second is usually too slow when I actually do need it. So I am looking
> > for a way to schedule my deferred work as soon as interrupts are
> > disabled in the context that caused a page fault.
> >
> > I was reading Matthew Wilcox's paper on softirqs, tasklets, bottom
> > halves, task queues, work queues, and timers. But I am still a little
> > unsure of the best way to proceed. My requirement of not accessing
> > dynamically allocated memory seems unprecedented in the kernel. E.g.,
> > one of my earliest attempts included using a kernel thread and waking
> > it up from the page fault handler, but this did not work because
> > adding the kthread to a runqueue would access dynamically allocated
> > memory.
> >
> I have not read the patch yet, but this concept interests me very much:
>
> a. If u tracked every read before it is written - how do u know if
> it is written or not? I.e., for each write, u have to set a bit to
> indicate that the byte of memory is written? or is it done at the
> word/page level?

Yep, we catch all accesses, both reads and writes. So on write, we set
a bit, and on read, we check that the bit is set.
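
To make that concrete, here is a minimal userspace sketch of the
set-on-write, check-on-read bookkeeping. It is only an illustration and
not the kmemcheck code: kmemcheck catches accesses through page faults,
whereas this sketch relies on explicit helper calls. The names pool,
shadow_initialized, tracked_write() and tracked_read() are made up for
the example, and it spends a whole bool per data byte instead of a
packed bit to keep things short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 64

/* Hypothetical tracked region, with one shadow flag per data byte. */
static unsigned char pool[POOL_SIZE];
static bool shadow_initialized[POOL_SIZE];

/* A write stores the data and marks every byte it touches as initialized. */
void tracked_write(size_t off, const void *src, size_t len)
{
	assert(off <= POOL_SIZE && len <= POOL_SIZE - off);

	memcpy(pool + off, src, len);
	for (size_t i = 0; i < len; i++)
		shadow_initialized[off + i] = true;
}

/*
 * A read copies the data out and returns false if any byte it touched
 * was never written, i.e. the caller read uninitialized memory.
 */
bool tracked_read(size_t off, void *dst, size_t len)
{
	bool ok = true;

	assert(off <= POOL_SIZE && len <= POOL_SIZE - off);

	for (size_t i = 0; i < len; i++) {
		if (!shadow_initialized[off + i])
			ok = false;
	}
	memcpy(dst, pool + off, len);
	return ok;
}
```

With this, writing bytes 0-1 and then reading bytes 0-3 makes
tracked_read() return false, because bytes 2-3 were never written;
reading only bytes 0-1 returns true.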
(We actually have a few more states, but that's the basic idea, yeah.)

The granularity of initialized/uninitialized is on the byte level. It
would be too hard to do this for bit level granularity since we are
not emulating the code (like valgrind does).

> b. it is only for kernel memory - right? process memory may be
> swapped out, a huge performance tradeoff to make to do that.

Yes, only for kernel memory allocated using kmalloc() or
kmem_cache_alloc().

> c. how about DMA memory? (hardware devices will write to
> it....which will not trigger the normal pagetable mechanism, so it is
> not possible capture writing to these memory?)

Yep, this is entirely correct. We do have this exact problem; the
solution is to annotate these memory areas by allocating them using
the __GFP_NOTRACK flag. This item is discussed in the
Documentation/kmemcheck.txt file of the patch.

> d. any problem with multi-CPU, PAE scenario?

We will disable all but one CPU at run-time if the kernel was compiled
with CONFIG_SMP=y. This is because there is a race between CPUs if one
of them is modifying the page tables and the page table change "leaks"
into other TLBs. A proposed solution here is to make a copy of all the
page tables for each CPU in the system. This is a rather heavy and
difficult change to make, so I am not doing it for now :-) This item
is also discussed in the Documentation/kmemcheck.txt file.

PAE/PSE is fine; when a page is being tracked, we split it to 4k
physical pages. This used to be a big problem but now I think we are
finally there :-)

The current tree can be found at:
http://git.kernel.org/?p=linux/kernel/git/vegard/kmemcheck.git;a=shortlog;h=current

I won't get angry if you decide to try it out ;-)

Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W.
Dijkstra, EWD1036