On Mon, 26 Dec 2005, Ryan Richter wrote: > On Thu, Dec 15, 2005 at 08:01:43PM -0800, James Bottomley wrote: > > On Thu, 2005-12-15 at 14:09 -0500, Ryan Richter wrote: > > > On Mon, Dec 12, 2005 at 12:24:42PM -0600, James Bottomley wrote: > > > > I'll find a fix for the real problem, but this patch isn't the cause. > > > > > > Is the patch set you posted yesterday supposed to fix this? If so, is > > > it available in patch form anywhere? > > > > No, I've been too busin integrating other people's patches to work on > > ones of my own. Try this. > > It was looking good, but... > > > Bad page state at free_hot_cold_page (in process 'taper', page ffff8100040e22e8) > flags:0x010000000000000c mapping:ffff810102bba238 mapcount:2 count:0 > Backtrace: > Looks familiar ;-( I don't have any new ideas but I will tell you what I have tried. In order to get more information about what is happening, I inserted the patch at the end of this message to my kernel. The purpose of the patch was to print something about the page mappings (prt_pages) and to catch possible double freeing from st earlier (clear page pointers). Running dump gave me the following output: st: page attributes before page_release 8 0: flags:0x010000000000006c mapping:ffff81001e0e7698 mapcount:2 count:4 pfn:173706 1: flags:0x010000000000006c mapping:ffff81001e0e7698 mapcount:2 count:4 pfn:67369 2: flags:0x010000000000006c mapping:ffff81001e0e7698 mapcount:2 count:4 pfn:208590 3: flags:0x010000000000006c mapping:ffff81001e0e7698 mapcount:2 count:4 pfn:201889 4: flags:0x010000000000006c mapping:ffff81001e0e7698 mapcount:2 count:4 pfn:172689 5: flags:0x010000000000006c mapping:ffff81001e0e7698 mapcount:2 count:4 pfn:172132 6: flags:0x010000000000006c mapping:ffff81001e0e7698 mapcount:2 count:4 pfn:189032 7: flags:0x010000000000006c mapping:ffff81001e0e7698 mapcount:2 count:4 pfn:188229 This was repeated 37 times during the 11.5 GB dump. This shows that the taper process is using the same buffer the whole time to write the 32 kB blocks to tape. mapcount is 2 as in your error condition but the use count is 4. If nobody has better ideas, you could maybe try this. Kai --- linux-2.6.15-rc7/drivers/scsi/st.c 2005-12-25 10:39:09.000000000 +0200 +++ linux-2.6.15-rc7-k0/drivers/scsi/st.c 2005-12-27 18:05:31.000000000 +0200 @@ -4498,22 +4498,44 @@ static int sgl_map_user_pages(struct sca return res; } +static void prt_pages(struct scatterlist *sgl, const unsigned int nr_pages, char *s) +{ + int i; + + printk("st: page attributes %s page_release %d\n", s, nr_pages); + for (i=0; i < nr_pages; i++) { + struct page *page = sgl[i].page; + printk("%2d: flags:0x%0*lx mapping:%p mapcount:%d count:%d pfn:%ld\n", + i, (int)(2*sizeof(unsigned long)), (unsigned long)page->flags, + page->mapping, page_mapcount(page), page_count(page), page_to_pfn(sgl[i].page)); + } +} + /* And unmap them... */ static int sgl_unmap_user_pages(struct scatterlist *sgl, const unsigned int nr_pages, int dirtied) { int i; + static int ctr; + + if ((ctr++ % 10000) == 0) + prt_pages(sgl, nr_pages, "before"); for (i=0; i < nr_pages; i++) { struct page *page = sgl[i].page; + if (!page) { + printk("st: trying double free: %d %d\n", i, nr_pages); + continue; + } if (dirtied) SetPageDirty(page); /* FIXME: cache flush missing for rw==READ * FIXME: call the correct reference counting function */ page_cache_release(page); + sgl[i].page = NULL; } return 0; - : send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html