Ok, I finally figured this one out and reproduced it on my ultra5. If, for example, we have a 4MB PTE and we do a write we'll only update one of the 8KB sub-PTEs of that mapping. This is fine until that new mapping gets displaced from the TLB and someone does a read that ends up hitting one of the sub-PTEs that didn't get it's write-enable bit set yet. At this point we have a problem, because if a write is made to the original address, the kernel says "the writable bit is set, nothing to do". So it won't flush the TLB, and therefore it won't kick out the TLB mapping brought in by the read. So we just get wedged here until something displaces that TLB entry. This is why X acts sluggish and since it can loop like this for quite a while the X server and the hardware can get plenty confused. The end result is that we have to make sure any PTE updates propagate to all sub-PTEs of a large mapping during any change. That's really expensive and we'd have to add some complex code to the set_pte_at() code path just to handle this. So the easiest way to fix this, without having to disable largepage PTE mappings of I/O devices, is the patch below. I will push this to Linus for 2.6.18 and -stable so that 2.6.17 gets it too. commit 6ad7d29d2edd8c3d632e71454f619f5c0c6c2703 Author: David S. Miller <davem@xxxxxxxxxxxxxxxxxxxx> Date: Mon Aug 28 00:33:03 2006 -0700 [SPARC64]: Fix X server hangs due to large pages. This problem was introduced by changeset 14778d9072e53d2171f66ffd9657daff41acfaed Unlike the hugetlb code paths, the normal fault code is not setup to propagate PTE changes for large page sizes correctly like the ones we make for I/O mappings in io_remap_pfn_range(). It is absolutely necessary to update all sub-ptes of a largepage mapping on a fault. Adding special handling for this would add considerably complexity to tlb_batch_add(). So let's just side-step the issue and forcefully dirty any writable PTEs created by io_remap_pfn_range(). The only other real option would be to disable to large PTE code of io_remap_pfn_range() and we really don't want to do that. Much thanks to Mikael Pettersson for tracking down this problem and testing debug patches. Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> diff --git a/arch/sparc64/mm/generic.c b/arch/sparc64/mm/generic.c index 8cb0620..af9d81d 100644 --- a/arch/sparc64/mm/generic.c +++ b/arch/sparc64/mm/generic.c @@ -69,6 +69,8 @@ static inline void io_remap_pte_range(st } else offset += PAGE_SIZE; + if (pte_write(entry)) + entry = pte_mkdirty(entry); do { BUG_ON(!pte_none(*pte)); set_pte_at(mm, address, pte, entry); - To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html