At 19:21 08/08/07, Chris Mason wrote:
>On Thu, 2008-08-07 at 12:15 +0900, Hisashi Hifumi wrote:
>> >/*
>> > * This is like invalidate_complete_page(), except it ignores the page's
>> > * refcount.  We do this because invalidate_inode_pages2() needs stronger
>> > * invalidation guarantees, and cannot afford to leave pages behind because
>> > * shrink_page_list() has a temp ref on them, or because they're transiently
>> > * sitting in the lru_cache_add() pagevecs.
>> > */
>> >
>> >I am wondering why we need stronger invalidation guarantees for DIO->
>> >invalidate_inode_pages_range(), which forces the page to be removed from
>> >the page cache? In case the bh is busy due to ext3 writeout,
>> >journal_try_to_free_buffers() could return a different error number (EBUSY)
>> >to try_to_release_page() (instead of EIO). In that case, could we just
>> >leave the page in the cache, clear PageUptodate (to force a later buffered
>> >read to read from disk) and then have invalidate_complete_page2() return
>> >successfully? Any issue with this way?
>>
>> My idea is that journal_try_to_free_buffers returns EBUSY if it fails due to
>> bh busy, and dio write falls back to buffered write. This is easy to fix.
>>
>
>What about the invalidates done after the DIO has already run
>non-buffered?

Dio write falls back to buffered IO when writing to a hole on ext3, I think.
I want to apply this mechanism to fix this issue. When try_to_release_page
fails on a page because the bh is busy, dio write does a buffered write,
sync_page_range, and wait_on_page_writeback, and then invalidates the page
cache to preserve dio semantics. Even if the page invalidation carried out
after wait_on_page_writeback fails, there is no inconsistency between the
HDD and the page cache, because the buffered data has already been written
out by that point.
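
To make the proposed ordering concrete, here is a rough user-space model of
it. This is not kernel code and not a patch: struct model_page and every
model_* function are hypothetical stand-ins I made up for illustration, and
the only thing taken from the discussion above is the sequence (release
fails with EBUSY, buffered write, sync and wait, best-effort invalidate).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one page-cache page; not a kernel structure. */
struct model_page {
    bool bh_busy;    /* buffer head is held by journal writeout */
    bool dirty;      /* cache holds data that is not yet on disk */
    bool in_cache;   /* page is present in the page cache */
    int cache_data;  /* contents seen through the page cache */
    int disk_data;   /* contents on disk */
};

enum write_path { PATH_DIRECT, PATH_BUFFERED_FALLBACK };

/* Stand-in for try_to_release_page(): -EBUSY while the bh is busy. */
static int model_try_to_release_page(const struct model_page *p)
{
    return p->bh_busy ? -EBUSY : 0;
}

/* Stand-in for page invalidation: drop the page only if it can be released. */
static int model_invalidate(struct model_page *p)
{
    int err = model_try_to_release_page(p);

    if (err)
        return err;
    p->in_cache = false;
    return 0;
}

/* Stand-in for buffered write + sync_page_range + wait_on_page_writeback. */
static void model_buffered_write_and_sync(struct model_page *p, int data)
{
    p->in_cache = true;
    p->cache_data = data;
    p->dirty = true;

    /* Writeback completes: disk now matches the cached copy. */
    p->disk_data = p->cache_data;
    p->dirty = false;
    assert(!p->dirty);
}

/* Model of the proposed dio write path with the EBUSY fallback. */
static enum write_path model_dio_write(struct model_page *p, int data)
{
    if (p->in_cache && model_invalidate(p) == -EBUSY) {
        model_buffered_write_and_sync(p, data);
        (void)model_invalidate(p);  /* best effort; may fail again */
        return PATH_BUFFERED_FALLBACK;
    }
    p->disk_data = data;  /* direct path: data goes straight to disk */
    return PATH_DIRECT;
}

/* Cache and disk agree whenever a cached copy exists. */
static bool model_consistent(const struct model_page *p)
{
    return !p->in_cache || p->cache_data == p->disk_data;
}
```

In this model a page whose bh stays busy remains in the cache after the
fallback, but its cached contents equal what is on disk, which is the
property I am relying on.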