A patch for squashfs on s390

Pete Zaitcev <zaitcev@xxxxxxxxxx> · Sat, 2 Sep 2006 20:47:53 -0700

Installation of RHEL 5 Beta 1 crashes on s390 when the block layer attempts
to write out a dirty page into a mapping without a writeout method.

<1>Unable to handle kernel pointer dereference at virtual kernel address 0000000000000000 
......
<4>Call Trace: 
<4>([<00000000000d6924>] __mpage_writepage+0x2ac/0x720) 
<4> [<00000000000d7d08>] mpage_writepages+0x3a4/0x540 
<4> [<0000000000083f36>] do_writepages+0x5e/0x78 
<4> [<00000000000d58e8>] __writeback_single_inode+0x1d0/0x3ac 
<4> [<00000000000d5f72>] sync_sb_inodes+0x22a/0x310 
<4> [<00000000000d6126>] sync_inodes_sb+0xce/0xe8 
<4> [<00000000000d61cc>] __sync_inodes+0x8c/0xe8 
<4> [<00000000000d625e>] sync_inodes+0x36/0x60 
<4> [<00000000000aa2d4>] do_sync+0x3c/0xa4 
<4> [<00000000000aa366>] sys_sync+0x2a/0x3c 
<4> [<000000000001f6e8>] sysc_noemu+0x10/0x16 

This happens when get_block is NULL. The situation arises when a page
gets mistakenly marked as dirty.

Heiko @IBM debugged this and discovered that squashfs was the culprit.
The code in question is squashfs_readpage():

	for (i = start_index; i <= end_index && byte_offset < bytes;
					i++, byte_offset += PAGE_CACHE_SIZE) {
		struct page *push_page;
		int available_bytes = (bytes - byte_offset) > PAGE_CACHE_SIZE ?
					PAGE_CACHE_SIZE : bytes - byte_offset;
		if (i == page->index)  {
			/* ... do normal things ... */
		} else if ((push_page =
				grab_cache_page_nowait(page->mapping, i))) {
 			pageaddr = kmap_atomic(push_page, KM_USER0);

			memcpy(pageaddr, data_ptr + byte_offset,
					available_bytes);
			memset(pageaddr + available_bytes, 0,
					PAGE_CACHE_SIZE - available_bytes);
			kunmap_atomic(pageaddr, KM_USER0);
			flush_dcache_page(push_page);
			SetPageUptodate(push_page);
			unlock_page(push_page);
			page_cache_release(push_page);
		}
	}

In some situations, this code can run twice on the same page.
The first time, the page is filled normally, and SetPageUptodate
clears the dirty bit. Note, though, that SetPageUptodate on s390 differs
from all other architectures, because there the dirty bit belongs to the
page itself and not to a page table entry. So, on the second pass, the
memcpy dirties the page again, but SetPageUptodate sees the page already
up to date and does not clear the dirty bit.
This leaves a dirty page behind, which eventually oopses us.
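
To make the sequence above concrete, here is a small userspace model of it.
It is only a sketch, not kernel code: struct model_page,
model_set_page_uptodate() and model_push_page() are hypothetical names
invented for illustration, and the "dirty on any store" behaviour is
simulated by hand. The skip_if_uptodate flag models an early-exit check on
PageUptodate() of the kind the patch below adds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical userspace stand-in for struct page; not the kernel's type. */
struct model_page {
	bool uptodate;	/* models PG_uptodate */
	bool hw_dirty;	/* models the per-page dirty bit on s390 */
	char data[MODEL_PAGE_SIZE];
};

/*
 * Models SetPageUptodate() as described above for s390: the dirty bit
 * is cleared only when the page goes from not-uptodate to uptodate.
 */
static void model_set_page_uptodate(struct model_page *p)
{
	if (!p->uptodate) {
		p->uptodate = true;
		p->hw_dirty = false;
	}
}

/*
 * One pass of the push_page branch. Returns true if the page is left
 * dirty afterwards. len must not exceed MODEL_PAGE_SIZE.
 */
static bool model_push_page(struct model_page *p, const char *src,
			    size_t len, bool skip_if_uptodate)
{
	/* Assumed guard: leave an already up-to-date page untouched. */
	if (skip_if_uptodate && p->uptodate)
		return p->hw_dirty;

	memcpy(p->data, src, len);
	memset(p->data + len, 0, MODEL_PAGE_SIZE - len);
	p->hw_dirty = true;	/* any store dirties the page */

	model_set_page_uptodate(p);
	return p->hw_dirty;
}
```

In this model, calling model_push_page() twice on a zero-initialized page
with skip_if_uptodate == false leaves the page dirty after the second call;
with skip_if_uptodate == true the second call does not touch the page and it
stays clean.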

Here's a patch, which I adapted from Heiko's:

--- linux-2.6.17-1.2519.4.5.el5/fs/squashfs/inode.c	2006-08-25 01:44:10.000000000 -0400
+++ linux-2.6.17-1.2519.4.5.el5.z1/fs/squashfs/inode.c	2006-09-01 22:33:24.000000000 -0400
@@ -1554,8 +1554,15 @@ static int squashfs_readpage(struct file
 			flush_dcache_page(page);
 			SetPageUptodate(page);
 			unlock_page(page);
-		} else if ((push_page =
-				grab_cache_page_nowait(page->mapping, i))) {
+		} else {
+			push_page = grab_cache_page_nowait(page->mapping, i);
+			if (!push_page)
+				continue;
+			if (PageUptodate(push_page)) {
+				unlock_page(push_page);
+				page_cache_release(push_page);
+				continue;
+			}
  			pageaddr = kmap_atomic(push_page, KM_USER0);
 
 			memcpy(pageaddr, data_ptr + byte_offset,

Any objections?

-- Pete

P.S. When is squashfs going to be upstream?
