2009/8/17 Ingo Molnar <mingo@xxxxxxx>:
>
> * Ingo Molnar <mingo@xxxxxxx> wrote:
>
>>
>> * Martin-Éric Racine <q-funk@xxxxxx> wrote:
>>
>> > On Thu, Aug 13, 2009 at 9:34 PM, Rafael J. Wysocki<rjw@xxxxxxx> wrote:
>> > > On Thursday 13 August 2009, Martin-Éric Racine wrote:
>> > >> On Thu, Aug 13, 2009 at 5:54 PM, Rafael J. Wysocki<rjw@xxxxxxx> wrote:
>> > >> > On Thursday 13 August 2009, Martin-Éric Racine wrote:
>> > >> >> 2009/8/13 Martin-Éric Racine <q-funk@xxxxxx>:
>> > >> >> > On Thu, Aug 13, 2009 at 12:07 PM, Ingo Molnar<mingo@xxxxxxx> wrote:
>> > >> >> >> * Martin-Éric Racine <q-funk@xxxxxx> wrote:
>> > >> >> >>> Yes, this bug is still valid.
>> > >> >> >>>
>> > >> >> >>> Ubuntu kernel team member Leann Ogasawara and I are slowly
>> > >> >> >>> bisecting our way through the changes that took place since 2.6.30
>> > >> >> >>> to find the commit that introduced this regression. Please stay
>> > >> >> >>> tuned.
>> > >> >> >>
>> > >> >> >> hm, the only outright Geode related commit was:
>> > >> >> >>
>> > >> >> >> d6c585a: x86: geode: Mark mfgpt irq IRQF_TIMER to prevent resume failure
>> > >> >> >>
>> > >> >> >> the jpg at:
>> > >> >> >>
>> > >> >> >> http://launchpadlibrarian.net/28892781/00002.jpg
>> > >> >> >>
>> > >> >> >> is very out of focus - but what i could decypher suggests a
>> > >> >> >> pagefault crash in the VFS code, in generic_delete_inode().
>> > >> >>
>> > >> >> This one might be a bit better:
>> > >> >>
>> > >> >> http://launchpadlibrarian.net/30267494/2.6.31-5.24.jpg
>> > >
>> > > Hmm. This looks like a sysfs oops to my untrained eye.
>> >
>> > The bisect I did with Leann Ogasawara has narrowed the kernel panic
>> > down to the following:
>> >
>> > commit f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0
>> > Author: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>> > Date:   Mon Jun 8 19:50:45 2009 -0400
>> >
>> >     add caching of ACLs in struct inode
>> >
>> >     No helpers, no conversions yet.
>> >
>> >     Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>>
>> Weird. If the functions do what their name suggests, i.e. if
>> inode_init_always() is an always called constructor and if
>> destroy_inode() is an unconditional destructor then this patch
>> should have no functional effect on the VFS side.
>>
>> It increases the size of struct inode, so if you have some old
>> module (built to an older version of fs.h) still around it might
>> corrupt your inode data structure.
>>
>> Or the size change might trigger some dormant bug. It might move a
>> critical inode right into the path of a pre-existing (but not
>> visibly crash-triggering) data corruption.
>>
>> The possibilities on the 'weird bug' front are endless - the
>> crash/oops itself should be turned into text, posted here and
>> analyzed.
>
> Btw., before you invest any time into the 'weird crash' theory, i'd
> suggest to double check the bisection result:
>
>   f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0 crashes
>   f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0~1 boots fine
>
> You can save yourself from a lot of head scratching that way - the
> bisection result looks weird. (albeit plausible - a VFS crash points
> to a VFS commit.)
>
> _Maybe_ the bisection is just off a little bit (there was a
> bisection mistake in the last few steps), and the real buggy commit
> is one of the nearby ones:

We double checked again last week with fresh builds and validated that
the above result is correct. What puzzles us is the start of the crash:

BUG: unable to handle kernel paging request at ffffb4ff
IP: [<c01f716b>] __destroy_inode+0x4b/0x80
*pde = 00810067 *pte = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/power/resume

Any ideas?

Martin-Éric
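
For anyone who wants to work from the text of the oops rather than the photos, here is a minimal sketch (not part of the original report) that splits the header lines quoted above into their fields. The name parse_oops_header and both regular expressions are illustrative assumptions, and they only cover the 32-bit x86 layout shown in this message.

```python
import re

# Oops header lines copied verbatim from the report above.
OOPS_TEXT = """\
BUG: unable to handle kernel paging request at ffffb4ff
IP: [<c01f716b>] __destroy_inode+0x4b/0x80
*pde = 00810067 *pte = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/power/resume
"""

# "IP: [<address>] symbol+0xoffset/0xsize" (hypothetical pattern for this layout)
IP_RE = re.compile(r"IP: \[<([0-9a-f]+)>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
# Address whose access faulted (hypothetical pattern for this layout)
FAULT_RE = re.compile(r"paging request at ([0-9a-f]+)")


def parse_oops_header(text):
    """Return the fault address and instruction-pointer fields as integers."""
    ip = IP_RE.search(text)
    fault = FAULT_RE.search(text)
    if ip is None or fault is None:
        raise ValueError("text does not match the expected oops header layout")
    ip_addr = int(ip.group(1), 16)
    offset = int(ip.group(3), 16)
    return {
        "fault_address": int(fault.group(1), 16),
        "ip": ip_addr,
        "symbol": ip.group(2),
        "offset": offset,                 # bytes into the function
        "size": int(ip.group(4), 16),     # length of the function in bytes
        "symbol_start": ip_addr - offset, # where the function begins
    }


info = parse_oops_header(OOPS_TEXT)
print(f"fault at {info['fault_address']:08x}, "
      f"in {info['symbol']} at +{info['offset']:#x} of {info['size']:#x}, "
      f"function starts at {info['symbol_start']:08x}")
```

Read this way, the faulting instruction sits 0x4b bytes into __destroy_inode, a function 0x80 bytes long. With a vmlinux from the same build, that offset is what one would look up in a disassembly of the function.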