On Mon, Aug 24, 2015 at 05:45:35PM -0400, Tejun Heo wrote: > Hello, > > On Mon, Aug 24, 2015 at 11:09:27PM +0200, Jan Kara wrote: > > It is inefficient, yes. But note that 'writeback' and 'dirty' states are > > completely independent. Page can be in any of the !dirty & !writeback, > > That isn't true for pages being dirtied through set_page_dirty(). > It's guaranteed that a dirty inode remains on one of the b_* lists > till there's no dirty page and writeback is complete. IO submission calls clear_page_dirty_for_io(), so by the time that all the pages have been submitted for IO, there are no dirty pages. IO submission also calls set_page_writeback() once the filesystem has decided to do IO on the page, and then IO completion calls end_page_writeback() to clear that state. IOWs, the page transitions from (dirty && !writeback) before submission to (!dirty && writeback) after submission, and to (!dirty && !writeback) once IO completion occurs. And you'll note that filemap_fdatawait() blocks on pages with the PAGECACHE_TAG_WRITEBACK tag set in the mapping tree, not dirty pages. Hence sync has to wait for all pages to transition out of writeback before we can consider the inode to be clean. > > dirty & !writeback, !dirty & writeback, dirty & writeback states. So mixing > > tracking of writeback and dirty state of an inode just makes the code even > > messier. > > I'm curious where and why they would deviate. Can you give me some > examples? AFAICS, anything which uses the usual set_page_dirty() path > shouldn't do that. mmaped files. page_mkwrite dirties page (dirty && !writeback) writepage clear_page_dirty_for_io (!dirty && !writeback) writepage starts writeback (!dirty && writeback) page_mkwrite dirties page (dirty && writeback) io completes (dirty && !writeback) This is done so we don't lose dirty state from page faults whilst the page is under IO and hence have sync miss the page next time through.... Of course, this behaviour is different if you have a filesystem or block device that requires stable pages (e.g. btrfs for data CRC validity). In this case, the page fault will block until the writeback state goes away... > > > > a list to track inodes with pages under writeback but they clashed with > > > > your patch series and they didn't get rebased yet AFAIR. > > > > > > Wouldn't it make more sense to simply put them on one of the existing > > > b_* lists? > > > > Logically it just doesn't make sense because as I wrote above dirty and > > writeback states are completely independent. Also you'd have to detect & > > skip inodes that don't really have any dirty pages to write and all the > > detection of "is there any data to write" would get more complicated. A > > separate list for inodes under writeback as Josef did is IMO the cleanest > > solution. > > Given that the usual code path tracks dirty and writeback together, I > don't think it's nonsensical; however, I'm more curious how common > writeback w/o dirtying case is. I suspect you've misunderstood the progression here. You can't get writeback without first going through dirty. But the transition to writeback clears the dirty page state so that we can capture page state changes while writeback is in progress. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html