On Sat, 2 Aug 2008, Alan Jenkins wrote:
> Alan Jenkins wrote:
> > ...followed by several secondary BUGs; most happened as I tried to open
> > new Konsole instances. My computer soon became unusable - X restarted
> > and then froze, but it responded to SysRQs. It may just have been all
> > my processes dying, but there was more disk activity than I expected.
> >
> > Strictly speaking I was running v2.6.26-8042-gce6fce4, with a two-line
> > patch to fix a different problem (see
> > <http://bugzilla.kernel.org/show_bug.cgi?id=11178>).

(Yes, I owe you for that patch: saved me a bisect, thank you!)

> >
> > In case it matters, this happened some time after a series of maybe 3
> > suspend/resume cycles in quick succession. As you can see it happened
> > in the middle of running git; I forget exactly what I was doing.
>
> It happened again. I didn't get any BUG in ext3 this time; just a
> disabling stream of BUGs in copy_page_c. They started a few seconds
> after resume. So I'm now confident that this is triggered by suspend to
> ram.
>
> I first noticed it after running an ls command (ls /var/cache/polipo),
> which was Killed. I was running polipo at the time; it wouldn't have
> been the first access to this directory. However it was probably the
> first access to this directory after the computer was woken from suspend
> to ram.
>
> I had the same two-line PCI patch applied. This time it was atop a
> genuine descendant of v2.6.27-rc1, viz v2.6.27-rc1-156-g94ad374.
>
> I've put the full trace showing all the BUGs at
> <http://www-student.cs.york.ac.uk/~aj504/dmesg-suspend-BUG-copy_page_c.txt>.

Your first report had twenty oopses of this kind:

[  228.358397] BUG: unable to handle kernel paging request at ffff88004fcXXXXX
[  228.358423] PGD 202063 PUD 8067 PMD 800000004fc03000
whereas it should be               PMD 800000004fc001e3

Your second report had six oopses of this kind:

[19280.236437] BUG: unable to handle kernel paging request at ffff88004fbXXXXX
[19280.236645] PGD 202063 PUD 8067 PMD 803c85370cfc01e3
whereas it should be               PMD 800000004fa001e3

Those corrupted PMD entries are why it's crashing: not (or very
unlikely to be) a problem with ext3 or copy_page_c themselves.
But it does seem likely that it's connected with suspend/resume.

I think I'd try editing my drivers/base/power/main.c, inserting some
tests and printks in suspend_device, suspend_device_noirq,
resume_device, resume_device_noirq (hope they're sensible places:
Rafael may have better advice).

You want to check that
the unsigned long at 0xffff8800000083e8 is 0x800000004fa001e3 and
the unsigned long at 0xffff8800000083f0 is 0x800000004fc001e3
with printk of device name where it goes wrong.

Or you may find I'm wrong and those are different from the start
(changing a page attribute within a 0x200000 range would have to
break up the 0x1e3 entries: I do wonder whether a change of page
attribute might even be responsible).

Hugh
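
For anyone following the arithmetic, here is a minimal sketch of how the two addresses to watch, and the values expected at them, can be derived from the oops lines. It assumes x86-64 4-level paging with 2MB (0x200000) large pages and a direct mapping based at 0xffff880000000000, which is inferred from the addresses in this thread rather than taken from the kernel source. The names pmd_slot and expected_pmd are made up for illustration, and 0xffff88004fb12345 is an arbitrary stand-in for the elided ffff88004fbXXXXX.

```python
# Assumptions (not from the kernel source): x86-64, 4-level paging,
# 2MB large pages, direct mapping based at 0xffff880000000000.
DIRECT_MAP_BASE = 0xFFFF880000000000
PMD_SHIFT = 21                      # one PMD entry covers 0x200000 bytes
PTRS_PER_PMD = 512                  # entries per PMD page
ENTRY_SIZE = 8                      # each entry is one unsigned long
PHYS_ADDR_MASK = 0x000FFFFFFFFFF000 # physical-address bits of a table entry
LARGE_PAGE_FLAGS = 0x80000000000001E3  # NX bit plus the 0x1e3 flags seen here


def pmd_slot(fault_vaddr, pud_entry):
    """Direct-map virtual address of the PMD entry covering fault_vaddr."""
    pmd_page_phys = pud_entry & PHYS_ADDR_MASK
    index = (fault_vaddr >> PMD_SHIFT) & (PTRS_PER_PMD - 1)
    return DIRECT_MAP_BASE + pmd_page_phys + index * ENTRY_SIZE


def expected_pmd(fault_vaddr, flags=LARGE_PAGE_FLAGS):
    """PMD value that would map fault_vaddr's 2MB range in the direct map."""
    phys = fault_vaddr - DIRECT_MAP_BASE
    return (phys & ~((1 << PMD_SHIFT) - 1)) | flags


PUD = 0x8067  # "PUD 8067" from both oopses
for vaddr in (0xFFFF88004FB12345, 0xFFFF88004FC00000):
    print("check %#x for %#x" % (pmd_slot(vaddr, PUD), expected_pmd(vaddr)))
# check 0xffff8800000083e8 for 0x800000004fa001e3
# check 0xffff8800000083f0 for 0x800000004fc001e3
```

The PUD value 8067 places the PMD page at physical 0x8000, and the faulting addresses select entries 0x7d and 0x7e within it, which gives the offsets 0x3e8 and 0x3f0 quoted above.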