> -----Original Message-----
> From: linux-mips-bounce@xxxxxxxxxxxxxx
> [mailto:linux-mips-bounce@xxxxxxxxxxxxxx] On Behalf Of Atsushi Nemoto
> Sent: Wednesday, November 15, 2006 8:07 AM
> To: Trevor Hamm
> Cc: linux-mips@xxxxxxxxxxxxxx
> Subject: Re: Problems booting Linux 2.6.18.1 on MIPS34K core
>
> Then, I can imagine three (hardly possible) case:
>
> A. PG_dcache_dirty bit was cleared accidently.
>
> B. The page is accessed by user process without page_mapping()
>
> C. kernel forgot to call update_mmu_cache() at somewhere.
>
> If case A, removing "&& Page_dcache_dirty(page)" condition from
> __update_cache() will hide your problem.  If case B, calling
> flush_dcache_page() unconditionally in __update_cache() will hide your
> problem.
>
> Anyway for now I can not see why this can happen...
>

I've been doing more probing with our FS2 probe, and now have a much
better understanding of what is going on.  I used the probe to step
through the copy_user_highpage() function while it's copying in the page
which seems to be corrupted.  What I found seems to suggest a problem
with cache aliases.

In copy_user_highpage(), it calls copy_page() with "vfrom" computed from
the new kmap_coherent() function:

	if (cpu_has_dc_aliases) {
		vfrom = kmap_coherent(from, vaddr);
		copy_page(vto, vfrom);
		kunmap_coherent(from);
	}

In my case, for the page of interest:

	vaddr = 0x2aaecb5c
	vfrom = 0xfffdc000
	Phys. address of this page is 0x011ca000

When I examine the data in this page from both 0x011ca000 and
0xfffdc000, the contents are close to identical.  When I look at the
page through address 0x811ca000, I get completely different data, but
it's the data I expect to see.  So this tells me that data I want in the
page is still in the dcache, but aliased address 0xfffdc000 cannot get
at it.
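To make the alias arithmetic concrete, here is a minimal sketch of how a
page's cache "colour" can be computed for a virtually indexed cache.  It
uses the 64 KB, 4-way geometry reported in this message and assumes
4 KB pages, which the message does not state.  The names SKETCH_*,
page_colour() and same_colour() are hypothetical and are not kernel
APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry from this message: 64 KB, 4-way set-associative dcache. */
#define SKETCH_CACHE_SIZE (64u * 1024u)
#define SKETCH_CACHE_WAYS 4u
/* ASSUMPTION: 4 KB pages; the message does not state the page size. */
#define SKETCH_PAGE_SIZE  4096u

/* Bytes indexed by one way of the cache (16 KB with the values above). */
static uint32_t way_size(void)
{
    return SKETCH_CACHE_SIZE / SKETCH_CACHE_WAYS;
}

/* Number of distinct page colours: how many pages fit in one way. */
static uint32_t num_colours(void)
{
    uint32_t n = way_size() / SKETCH_PAGE_SIZE;
    assert(n != 0);
    return n;
}

/*
 * Colour of a virtual address: the page-number bits that also take
 * part in the cache index.  Two virtual mappings of the same physical
 * page can only hit the same cache lines if their colours match.
 */
static uint32_t page_colour(uint32_t vaddr)
{
    return (vaddr / SKETCH_PAGE_SIZE) % num_colours();
}

/* Non-zero when two virtual addresses index the same cache sets. */
static int same_colour(uint32_t a, uint32_t b)
{
    return page_colour(a) == page_colour(b);
}
```

Under these assumptions, vaddr (0x2aaecb5c) and vfrom (0xfffdc000) both
have colour 0, while the kseg0 address 0x811ca000 has colour 2.  That
would be consistent with the data being visible through 0x811ca000 but
not through 0xfffdc000, though it does not by itself show where the
missing flush belongs.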
It just so happens that this aliasing is still occurring on the re-boot
following a software reset, but some event between the lock-up and
re-boot caused the cache contents to be written back into main memory,
so the aliased page is getting the correct data by accident on the
re-boot.

If I flush the entire dcache with the FS2 probe just before entering
copy_page(), the board boots from power-up without any issues.

The log entry for the patch which introduced the kmap_coherent()
function explains that the patch was a fix for dcache aliasing, yet it
seems to be introducing a dcache alias here.  Any idea why?

> Just for confirm:
> 1. This can happen on latest lmo git tree or 2.6.19-rc5.
> 2. UP kernel.
> 3. No L2 cache.
> 4. icache and dcache are both virtually indexed and physically tagged.
> All correct?
>

Except for #1 which I haven't tried (we need to get this working with
2.6.18), all correct.  The caches are 64k, 4-way.

Thanks,
Trevor