On Fri, 20 Oct 2006 14:32:37 -0400 Chris Mason <chris.mason@xxxxxxxxxx> wrote: > Hello everyone, > > O_DIRECT locking currently uses a few different per-inode locks to > prevent races between buffered io and direct io. This is awkward, and > sometimes grows races where we expose old data on disk. > > For example, I can't quite see how we protect from an mmap triggered > writepage from filling a hole in the middle of an O_DIRECT read. > > This patch set changes O_DIRECT to use page locks instead of > mutex/semaphores. It looks in the radix tree for pages affected by this > O_DIRECT read/wrte and locks any pages it finds. > > For any pages not present, a stub page struct is inserted into the > radix tree. The page cache routines are changed to either wait on this > place holder page or ignore it as appropriate. Place holders are not > valid pages at all, you can't trust page->index or any other field. > > The first patch introduces these place holder pages. The second patch > changes fs/direct-io.c to use them. Patch #2 needs work, > direct-io.c:lock_page_range can be made much faster, and it needs to be > changed to work in chunks instead of pinning down the whole range at > once. > > But, this is enough for people to comment on the basic idea. Testing > has been very light. I'm not sure I've covered all of the buffered vs > direct races yet. The main goal of posting now is to talk about the > place holder pages and possible optimizations. > > For the XFS guys, you probably want to avoid the page locking steps as > well, a later version will honor that. > Boy, it doesn't do much to simplify the code, does it? An opportunity for Linus to again share with us his opinions on direct-io. Conceptually, sticking locked pages into the mapping is a good thing to do, because it meshes in with all our existing synchronisation design. I think the fake placeholder page can be a kernel-wide thing rather than per-dio? That would be most desirable because then we have #define PagePlaceHolder(page) (page == global_placeholder_page) which saves a precious page flag. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html