On Mon, Jun 20, 2011 at 03:56:19PM -0500, Kevan Rehm wrote:
> Greetings,
>
> I've run into a case where the fsync() system call seems to have
> returned before all file data was actually on disk. (A SLES11SP1 system
> crash occurred shortly after an fsync which had returned zero. After
> restarting the machine, the last I/O before the fsync is not in the
> file.) In attempting to find the problem, I've come across code I don't
> understand, and am hoping someone can enlighten me as to how things are
> supposed to work.
>
> Routine xfs_vm_writepage has various situations under which it will
> decide it can't currently initiate writeback on a page, and in that case
> calls redirty_page_for_writepage, unlocks the page, and returns zero.
> That seems to me to be incompatible with fsync(), so I'm obviously
> missing some key piece of logic.
>
> The calling sequence of routines involved in fsync is:
>
> do_fsync->vfs_fsync->vfs_fsync_range->
> filemap_write_and_wait_range->
> __filemap_fdatawrite_range->
> do_writepages->generic_writepages->
> write_cache_pages
>
> Routine write_cache_pages walks the radix tree and calls
> clear_page_dirty_for_io and then __writepage on each dirty page to
> initiate writeback. __writepage calls xfs_vm_writepage. That routine
> is occasionally unable to immediately start writeback of the page, and
> so it calls redirty_page_for_writepage without setting the writeback flag.

Hi Kevan,

The current xfs_vm_writepage mainline code will only enter the redirty
path if:

- it is called from direct memory reclaim
- it is called within a transaction context and we need to do an
  allocation transaction
- it is WB_SYNC_NONE writeback and we can't get the inode lock without
  blocking during block mapping (EAGAIN case).

None of these cases are triggered by fsync() driven (WB_SYNC_ALL)
writeback, so AFAICT fsync() based writeback should not be skipping
writeback of dirty pages in the given fsync range.

So for a mainline kernel I don't think there are any problems w.r.t.
fsync() and redirtying pages causing dirty pages to be skipped during
writeback. However, the mainline writeback path has had significant
changes (especially to WB_SYNC_ALL writeback) since sles11sp1 was
snapshotted (2.6.32, right?). Hence it is possible that one (or
several) of the changes fixed this bug without us even realising it
was a problem.

That said, having dirty pages after an fsync is not necessarily an
fsync bug - something could have dirtied them while the fsync was in
progress. I don't know any details of how this occurred, so I'm simply
speculating that there could be other causes of the dirty pages you
are seeing...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs