On Mon, Feb 08, 2010 at 11:18:52AM -0800, Andrew Morton wrote:
> On Mon, 8 Feb 2010 10:10:46 +0000
> Mel Gorman <mel@xxxxxxxxx> wrote:
>
> > On Sun, Feb 07, 2010 at 01:34:58PM -0500, Tony Lill wrote:
> > > On Friday 05 February 2010 06:20:00 Mel Gorman wrote:
> > > > On Wed, Feb 03, 2010 at 02:39:21PM -0800, Andrew Morton wrote:
> > > > > > gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
> > > >
> > > > This is a bit of a reach, but how confident are you that this version of
> > > > gcc is building kernels correctly?
> > > >
> > > > There are a few disconnected reports of kernel problems with this
> > > > particular version of gcc although none that I can connect with this
> > > > problem or on x86 for that matter. One example is
> > > >
> > > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=536354
> > > >
> > > > which reported problems building kernels on the s390 with that compiler.
> > > > Moving to 4.2 helped them and it *should* have been fixed according to
> > > > this bug
> > > >
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=13012
> > > >
> > > > It might be a red herring, but just to be sure, would you mind trying
> > > > gcc 4.2 or 4.3 just to be sure please?
> > >
> > > Well, it was producing working kernels up until 2.6.30, but I recompiled with
> > > gcc (Debian 4.3.2-1.1) 4.3.2
> > > and the box has been running nearly 48 hours without incident. My previous
> > > record was 2. So I guess we can put this down to a new compiler bug.
> > >
> >
> > Well, it's great the problem source is known but pinning down compiler bugs
> > is a bit of a pain. Andrew, I don't recall an easy-as-in-bisection-easy
> > means of identifying which part of the compile unit went wrong and why so
> > it can be marked with #error for known broken compilers. Is there one or is
> > it a case of asking for two objdumps of __rmqueue and making a stab at it?
>
> ugh. This is pretty rare.
>

Indeed.
It does appear to be the case here and it's not the first bug related to
gcc 4.1 and the kernel judging from search results on Google.

> Probably the best strategy is to generate the two page_alloc.s files,
> fish out the __rmqueue part and then try to compare them. The key
> part is to Cc Linus then thrash around stupidly for long enough for him
> to take pity and find the bug for you.
>

Ok, step 1 then before I do the Team America Super Secret Signal to Linus
for help. Tony, can you generate the .s files for me please? It should be
a case of

  make clean
  rm *.s
  make CC=gcc-$BAD_VERSION KCFLAGS=-save-temps mm
  tar -czf kernel-s-files-bad-compiler.tar.gz .config *.s mm/*.c mm/*.h mm/Makefile mm/Kconfig
  make clean
  rm *.s
  make CC=gcc-$GOOD_VERSION KCFLAGS=-save-temps mm
  tar -czf kernel-s-files-good-compiler.tar.gz .config *.s mm/*.c mm/*.h mm/Makefile mm/Kconfig

where $BAD_VERSION and $GOOD_VERSION are the two compiler versions, and
then post the two tarballs. They should contain what is needed.

Thanks

> > > I probably should have checked this before reporting a bug. Mea culpa.
> >
> > Not at all. Miscompiles like this are rare and usually caught a lot quicker
> > than this. If you hadn't reported the problem with two different machines,
> > I would have blamed hardware and asked for a memtest. The only reason I
> > spotted this might be a compiler was because the type of error you reported
> > "couldn't happen".
> >
> > Thanks for reporting and testing.
>
> Yup.
>

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx
For more info on Linux MM, see: http://www.linux-mm.org/ .
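Andrew's suggested first step, fishing the __rmqueue part out of the two page_alloc.s files and comparing them, could be scripted roughly as follows. This is a minimal sketch, not a tool from this thread: extract_func and compare_func are hypothetical helper names, and it assumes gcc emitted __rmqueue out of line, so the .s file contains a `__rmqueue:` label that ends at a `.size __rmqueue, ...` directive. If gcc inlined the static function into its callers, that label will not exist and the callers would have to be compared instead.

```shell
#!/bin/sh
# Hypothetical helper: print the assembly of one function from a
# gcc-generated .s file. Assumes the function was emitted out of line,
# starting at a "NAME:" label in column 0 and ending at a
# ".size NAME, .-NAME" directive.
extract_func() {
    func=$1
    file=$2
    awk -v f="$func" '
        $0 == f ":"                         { on = 1 }
        on                                  { print }
        on && $1 == ".size" && $2 == f ","  { exit }
    ' "$file"
}

# Hypothetical helper: extract the same function from the "bad" and
# "good" compiler output, renumber local .L labels (which differ
# between compilers and only add noise), then show a unified diff.
# Writes NAME-bad.s and NAME-good.s into the current directory and
# returns diff's exit status (0 means no differences were found).
compare_func() {
    extract_func "$1" "$2" | sed 's/\.L[0-9][0-9]*/.L/g' > "$1-bad.s"
    extract_func "$1" "$3" | sed 's/\.L[0-9][0-9]*/.L/g' > "$1-good.s"
    diff -u "$1-bad.s" "$1-good.s"
}
```

Assuming the two tarballs were unpacked into directories named bad/ and good/ (again an assumption), the comparison would be `compare_func __rmqueue bad/page_alloc.s good/page_alloc.s | less`. Renumbering the .L labels only hides label noise; gcc 4.1 and 4.3 schedule instructions and allocate registers differently, so expect a large diff that narrows where to look rather than pinpointing the miscompile.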