(my previous email was rejected by vger.kernel.org because google web sent it as html. will resend the same one in plain text mode) > What do REQ_SYNC and REQ_NOIDLE actually *do*? Yep, this REQ_SYNC is very confusing to me. First of all according to the sources of old school block buffer filesystems (e.g. ext2) we can get this stack in case of fsync call: __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode) do_writepages(mapping, &wbc) mapping->a_ops->writepages(page, wbc) (ext2_writepages) mpage_writepages(mapping, wbc, fat_get_block); write_cache_pages(mapping, wbc, __mpage_writepage, &mpd) __mpage_writepage(page, wbc, data) >>>>> mpage_bio_submit(WRITE, bio) >>>> why WRITE? not WRITE_SYNC in case of WB_SYNC_ALL? <or in case of not contiguous buffers> mapping->a_ops->writepage(page, wbc) (ext2_writepage) block_write_full_page(page, fat_get_block, wbc) block_write_full_page_endio(page, get_block, wbc, end_buffer_async_write) __block_write_full_page(inode, page, get_block, wbc, handler); submit_bh(WRITE_SYNC) So, it turns out to be that some bios for the same dirty range can be submitted with REQ_WRITE|REQ_SYNC|REQ_NOIDLE and some of the bios only with REQ_WRITE. (according to the comment of __mpage_writepage: * If all blocks are found to be contiguous then the page can go into the * BIO. Otherwise fall back to the mapping's writepage(). ) Also, it seems to me that all over the kernel WRITE_SYNC has meaning of: 1. try to get the block on-disk faster 2. if I have to do flush - mark my bio with WRITE_SYNC and wait for result My patch is an attempt to make some unification in case of fsync call. -- Roman On Wed, Feb 19, 2014 at 8:59 AM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > On Sun, 16 Feb 2014 11:54:28 +0900 Roman Pen <r.peniaev@xxxxxxxxx> wrote: > >> In case of wbc->sync_mode == WB_SYNC_ALL we need to do data integrity write, >> thus mark request as WRITE_SYNC. > > gargh, the documentation for this stuff is useless. > > What do REQ_SYNC and REQ_NOIDLE actually *do*? > > For mpage writes, REQ_NOIDLE appears to be incorrect - we very much > expect that there will be more writes and that they will be contiguous > with this one. But we won't be waiting on this write before submitting > more writes, so perhaps REQ_NOIDLE is at least harmless. > > I dunno about REQ_SYNC - it requires delving into the bowels of CFQ > and we shouldn't need to do that. > > Jens. Help. How is a poor kernel reader supposed to work this out? > >> --- a/fs/mpage.c >> +++ b/fs/mpage.c >> @@ -462,6 +462,7 @@ static int __mpage_writepage(struct page *page, struct writeback_control *wbc, >> struct buffer_head map_bh; >> loff_t i_size = i_size_read(inode); >> int ret = 0; >> + int wr = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE); >> >> if (page_has_buffers(page)) { >> struct buffer_head *head = page_buffers(page); >> @@ -570,7 +571,7 @@ page_is_mapped: >> * This page will go to BIO. Do we need to send this BIO off first? >> */ >> if (bio && mpd->last_block_in_bio != blocks[0] - 1) >> - bio = mpage_bio_submit(WRITE, bio); >> + bio = mpage_bio_submit(wr, bio); >> >> alloc_new: >> if (bio == NULL) { >> @@ -587,7 +588,7 @@ alloc_new: >> */ >> length = first_unmapped << blkbits; >> if (bio_add_page(bio, page, length, 0) < length) { >> - bio = mpage_bio_submit(WRITE, bio); >> + bio = mpage_bio_submit(wr, bio); >> goto alloc_new; >> } >> >> @@ -620,7 +621,7 @@ alloc_new: >> set_page_writeback(page); >> unlock_page(page); >> if (boundary || (first_unmapped != blocks_per_page)) { >> - bio = mpage_bio_submit(WRITE, bio); >> + bio = mpage_bio_submit(wr, bio); >> if (boundary_block) { >> write_boundary_block(boundary_bdev, >> boundary_block, 1 << blkbits); >> @@ -632,7 +633,7 @@ alloc_new: >> >> confused: >> if (bio) >> - bio = mpage_bio_submit(WRITE, bio); >> + bio = mpage_bio_submit(wr, bio); >> >> if (mpd->use_writepage) { >> ret = mapping->a_ops->writepage(page, wbc); >> @@ -688,8 +689,11 @@ mpage_writepages(struct address_space *mapping, >> }; >> >> ret = write_cache_pages(mapping, wbc, __mpage_writepage, &mpd); >> - if (mpd.bio) >> - mpage_bio_submit(WRITE, mpd.bio); >> + if (mpd.bio) { >> + int wr = (wbc->sync_mode == WB_SYNC_ALL ? >> + WRITE_SYNC : WRITE); >> + mpage_bio_submit(wr, mpd.bio); >> + } >> } >> blk_finish_plug(&plug); >> return ret; >> @@ -706,8 +710,11 @@ int mpage_writepage(struct page *page, get_block_t get_block, >> .use_writepage = 0, >> }; >> int ret = __mpage_writepage(page, wbc, &mpd); >> - if (mpd.bio) >> - mpage_bio_submit(WRITE, mpd.bio); >> + if (mpd.bio) { >> + int wr = (wbc->sync_mode == WB_SYNC_ALL ? >> + WRITE_SYNC : WRITE); >> + mpage_bio_submit(wr, mpd.bio); >> + } >> return ret; >> } >> EXPORT_SYMBOL(mpage_writepage); > -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html