On Sun, 4 Feb 2007 16:10:51 +0100 Nick Piggin <npiggin@xxxxxxx> wrote:

> On Sun, Feb 04, 2007 at 03:15:49AM -0800, Andrew Morton wrote:
> > On Sun, 4 Feb 2007 12:03:17 +0100 Nick Piggin <npiggin@xxxxxxx> wrote:
> >
> > > On Sun, Feb 04, 2007 at 02:56:02AM -0800, Andrew Morton wrote:
> > > > On Sun, 4 Feb 2007 11:46:09 +0100 Nick Piggin <npiggin@xxxxxxx> wrote:
> > > >
> > > > If that recollection is right, I think we could afford to reintroduce that
> > > > problem, frankly. Especially as it only happens in the incredibly rare
> > > > case of that get_user()ed page getting unmapped under our feet.
> > >
> > > Dang. I was hoping to fix it without introducing data corruption.
> >
> > Well. It's a compromise. Being practical about it, I reeeealy doubt that
> > anyone will hit this combination of circumstances.
>
> They're not likely to hit the deadlocks, either. The probability goes up
> after my patch to lock the page in the fault path. But practically, we
> could live without that too, because the data corruption it fixes is very
> rare as well. Which is exactly what we've been doing quite happily for
> most of 2.6, including all distro kernels (I think).

Thing is, an application which is relying on the contents of that page is
already unreliable (or really peculiar), because it can get indeterminate
results anyway.

> ...
>
> On a P4 Xeon, SMP kernel, on a tmpfs filesystem, a 1GB dd if=/dev/zero write
> had the following performance (higher is worse):
>
>                                Orig kernel    New kernel
> new file (no pagecache)
>   4K  blocks                   1.280s         1.287s (+0.5%)
>   64K blocks                   1.090s         1.105s (+1.4%)
> notrunc (uptodate pagecache)
>   4K  blocks                   0.976s         1.001s (+0.5%)
>   64K blocks                   0.780s         0.792s (+1.5%)
>
> [numbers are better than +/- 0.005]
>
> So we lose somewhere between half and one and a half of one percent
> performance in a pagecache write intensive workload.

That's not too bad - caches are fast. Did you look at optimising the
handling of that temp page, to ensure that we always use the same page? I
guess the page allocator per-cpu-pages thing is being good here.

I'm not sure how, though. Park a copy in the task_struct, just as an
experiment. But that'd de-optimise multiple-tasks-writing-on-the-same-cpu.
Maybe a per-cpu thing? Largely duplicates the page allocator's
per-cpu-pages.

Of course, we're also increasing cache footprint, which this test won't
show.
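
To make the per-cpu suggestion above concrete, here is a minimal sketch of
one way a cached temp page could be tried. Everything in it (the
write_tmp_page variable, get_write_tmp_page(), put_write_tmp_page(), the
module wrapper) is invented for illustration; it is an assumption about how
the experiment might look, not what the actual patches do.

/*
 * Hypothetical sketch only: a per-cpu cached temporary page for the write
 * path.  All names here are made up for illustration.
 */
#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/preempt.h>

static DEFINE_PER_CPU(struct page *, write_tmp_page);

/*
 * Take the cached page out of this CPU's slot, or allocate a fresh one.
 * Preemption is disabled only around the slot access: the caller will
 * copy_from_user() into the page, which may sleep, so we cannot stay
 * pinned to the CPU while the page is in use.
 */
static struct page *get_write_tmp_page(void)
{
	struct page *page;

	preempt_disable();
	page = __this_cpu_read(write_tmp_page);
	__this_cpu_write(write_tmp_page, NULL);
	preempt_enable();

	if (!page)
		page = alloc_page(GFP_KERNEL);
	return page;
}

/*
 * Try to park the page for the next writer on this CPU; if another task
 * refilled the slot in the meantime, just free ours.
 */
static void put_write_tmp_page(struct page *page)
{
	bool parked = false;

	preempt_disable();
	if (!__this_cpu_read(write_tmp_page)) {
		__this_cpu_write(write_tmp_page, page);
		parked = true;
	}
	preempt_enable();

	if (!parked)
		__free_page(page);
}

static int __init write_tmp_init(void)
{
	/* Tiny smoke test: take a page, then park it for the next user. */
	struct page *page = get_write_tmp_page();

	if (!page)
		return -ENOMEM;
	put_write_tmp_page(page);
	return 0;
}

static void __exit write_tmp_exit(void)
{
	int cpu;

	/* Drop whatever each CPU still has cached. */
	for_each_possible_cpu(cpu) {
		struct page *page = per_cpu(write_tmp_page, cpu);

		if (page)
			__free_page(page);
	}
}

module_init(write_tmp_init);
module_exit(write_tmp_exit);
MODULE_LICENSE("GPL");

The slot is only touched with preemption disabled for a few instructions;
the page itself is handed back with preemption enabled, because the copy
from userspace can fault and sleep. That is also why the scheme ends up
looking much like the page allocator's own per-cpu-pages, which is the
duplication concern raised above; parking the page in task_struct instead
would avoid the per-cpu bookkeeping, at the cost of one cached page per
writer and the de-optimisation noted for multiple tasks writing on the
same CPU.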