On Thu, 10 Mar 2011 08:32:09 -0500 Chris Mason <chris.mason@xxxxxxxxxx> wrote: > Excerpts from Jeff Layton's message of 2011-03-10 08:16:38 -0500: > > On Thu, 10 Mar 2011 04:26:31 -0800 (PST) > > Chris Mason <chris.mason@xxxxxxxxxx> wrote: > > > > > Excerpts from Steve French's message of 2011-03-09 17:13:06 -0500: > > > > On Wed, Mar 9, 2011 at 3:58 PM, Chris Mason <chris.mason@xxxxxxxxxx> wrote: > > > > > Excerpts from Dave Chinner's message of 2011-03-09 16:51:48 -0500: > > > > >> On Wed, Mar 09, 2011 at 01:44:24PM -0600, Steve French wrote: > > > > >> > Have alternative approaches, other than using wait_on_page_writeback, > > > > >> > been considered for solving the stable page write problem in similar > > > > >> > cases (since only about 1 out of 5 linux file systems uses this call > > > > >> > today). > > > > >> > > > > >> I think that is incorrect. write_cache_pages() does: > > > > >> > > > > >> 929 lock_page(page); > > > > >> ..... > > > > >> 950 if (PageWriteback(page)) { > > > > >> 951 if (wbc->sync_mode != WB_SYNC_NONE) > > > > >> 952 wait_on_page_writeback(page); > > > > >> 953 else > > > > >> 954 goto continue_unlock; > > > > >> 955 } > > > > >> 956 > > > > >> 957 BUG_ON(PageWriteback(page)); > > > > >> 958 if (!clear_page_dirty_for_io(page)) > > > > >> 959 goto continue_unlock; > > > > >> 960 > > > > >> 961 trace_wbc_writepage(wbc, mapping->backing_dev_info); > > > > >> 962 ret = (*writepage)(page, wbc, data); > > > > >> > > > > >> so every filesystem using the generic_writepages code already does > > > > >> this check and wait before .writepage is called. Hence only the > > > > >> filesystems that do not use generic_writepages() or > > > > >> mpage_writepages() need a specific check, and that means most > > > > >> filesystems are actually waiting on writeback pages correctly. > > > > > > > > > > But checking here just means we don't start writeback on a page that is > > > > > writeback, which is a good idea but not really related to stable pages? > > > > > > > > > > stable pages means we don't let mmap'd pages or file_write muck around > > > > > with the pages while they are in writeback, so we need to wait in > > > > > file_write and page_mkwrite. > > > > > > > > Isn't the file_write case covered by the i_mutex as > > > > Documentation/filesystems/Locking implies (for write_begin/write_end). > > > > > > > > > > Does cifs take i_mutex before writepage? The disk based filesystems > > > don't. So, i_mutex protects file_write from other procs jumping into > > > file_write, but it doesn't protect writeback from file_write jumping in > > > and changing the pages while they are being sent to storage (or over the > > > wire). > > > > > > Basically the model needs to be: > > > > > > file_write: > > > lock the page > > > wait on page writeback > > > > > > < new writeback cannot start because of the page lock > > > > copy_from_user > > > unlock the page > > > > > > We also use page_mkwrite to get notified when userland wants to change > > > some page it has given to mmap. That needs to wait on page writeback as > > > well. > > > > > > > No, cifs doesn't take the i_mutex in writepage, but the page is locked. > > cifs_write_begin calls grab_cache_page_write_begin, which returns a > > locked page and it's not unlocked until cifs_write_end. > > Ah ok, so you've got the page locked the whole time it is being sent > over the wire? The disk based filesystems split it and drop the page > lock once the page is set writeback, which is why we need the extra > waits. > > So in your case you should just need a page_mkwrite that locks the page. > Right, or do a wait_on_page_writeback. I think that may have a little less overhead since we won't need to unlock it and it may mean less serialization if there are other contenders for the page lock. -- Jeff Layton <jlayton@xxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html