Hi Dave, On Fri, Jan 14, 2011 at 11:29:00AM +1100, Dave Chinner wrote: > I've noticed a few suspicious things trying to reproduce the > allocate-in-the-middle-of-a-delalloc-extent, ... > Secondly, I think we have the same expose-the-entire-delalloc-extent > -to-stale-data-exposure problem in ->writepage. This onnne, however, > is due to using BMAPI_ENTIRE to allocate the entire delalloc extent > the first time any part of it is written to. Even if we are only > writing a single page (i.e. wbc->nr_to_write = 1) and the delalloc > extent covers gigabytes. So, same problem when we crash. > > Finally, I think the extsize based problem exposed by test 229 is a > also a result of allocating space we have no pages covering in the > page cache (triggered by BMAPI_ENTIRE allocation) so the allocated > space is never zeroed and hence exposes stale data. This is precisely the bug I was going after when I hit the allocate-in-the-middle-of-a-delalloc-extent bug. This is a race between block_prepare_write/__xfs_get_blocks and writepage/xfs_page_state convert. When xfs_page_state_convert allocates a real extent for a page toward the beginning of a delalloc extent, XFS_BMAPI converts the entire delalloc extent. Any subsequent writes into the page cache toward the end of this freshly allocated extent will see a written extent instead of delalloc and read the block from disk into the page before writing over it. If the write does not cover the entire page garbage from disk will be exposed into the page cache. <snip> > I'm sure there are other ways to solve these problems, but these two > are the ones that come to mind for me. I'm open to other solutions > or ways to improve on these ones, especially if they are simpler. ;) > Anyone got any ideas or improvements? The direction I've been taking is to use XFS_BMAPI_EXACT in *xfs_iomap_write_allocate to ensure that an extent covering exactly the pages we're prepared to write out immediately is allocated and the rest of the delalloc extent is left as is. This exercises some of the btree code more heavily and led to the discovery of the allocate-in-the-middle-of-a-delalloc-extent bug. It also presents a performance issue which I've tried to resolve by extending xfs_probe_cluster to probe delalloc extents-- lock up all of the pages to be converted before performing the allocation and hold those locks until they are submitted for writeback. It's not very pretty but it resolves the corruption. There is still the issue of crashes... This could be solved by converting from delalloc to unwritten in xfs_page_state_convert in this very exact way and then to written in the io completion handler. Never go delalloc->written directly. I have not had luck reproducing this on TOT xfs and have come to realize that this is because it doesn't do speculative preallocation of larger delalloc extents unless you are using extsize... which I haven't tried. Regards, Ben _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs