Jens, could you please explain the real purpose of WAIT_SYNC? In case of wbc->sync_mode == WB_SYNC_ALL. Because my current understanding is if writeback control has WB_SYNC_ALL everything should be submitted with WAIT_SYNC. -- Roman On Wed, Feb 19, 2014 at 10:38 AM, Roman Peniaev <r.peniaev@xxxxxxxxx> wrote: > (my previous email was rejected by vger.kernel.org because google web > sent it as html. > will resend the same one in plain text mode) > >> What do REQ_SYNC and REQ_NOIDLE actually *do*? > > Yep, this REQ_SYNC is very confusing to me. > First of all according to the sources of old school block buffer filesystems > (e.g. ext2) we can get this stack in case of fsync call: > > __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode) > do_writepages(mapping, &wbc) > mapping->a_ops->writepages(page, wbc) > (ext2_writepages) > mpage_writepages(mapping, wbc, fat_get_block); > write_cache_pages(mapping, wbc, __mpage_writepage, &mpd) > __mpage_writepage(page, wbc, data) >>>>>> mpage_bio_submit(WRITE, bio) >>>> why WRITE? not WRITE_SYNC in case of WB_SYNC_ALL? > <or in case of not contiguous buffers> > mapping->a_ops->writepage(page, wbc) > (ext2_writepage) > block_write_full_page(page, fat_get_block, wbc) > block_write_full_page_endio(page, get_block, wbc, > end_buffer_async_write) > __block_write_full_page(inode, page, get_block, wbc, > handler); > submit_bh(WRITE_SYNC) > > So, it turns out to be that some bios for the same dirty range > can be submitted with REQ_WRITE|REQ_SYNC|REQ_NOIDLE and some of > the bios only with REQ_WRITE. > (according to the comment of __mpage_writepage: > * If all blocks are found to be contiguous then the page can go into the > * BIO. Otherwise fall back to the mapping's writepage(). > ) > > Also, it seems to me that all over the kernel WRITE_SYNC has meaning of: > 1. try to get the block on-disk faster > 2. if I have to do flush - mark my bio with WRITE_SYNC and wait for result > > My patch is an attempt to make some unification in case of fsync call. > > -- > Roman > > > On Wed, Feb 19, 2014 at 8:59 AM, Andrew Morton > <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: >> On Sun, 16 Feb 2014 11:54:28 +0900 Roman Pen <r.peniaev@xxxxxxxxx> wrote: >> >>> In case of wbc->sync_mode == WB_SYNC_ALL we need to do data integrity write, >>> thus mark request as WRITE_SYNC. >> >> gargh, the documentation for this stuff is useless. >> >> What do REQ_SYNC and REQ_NOIDLE actually *do*? >> >> For mpage writes, REQ_NOIDLE appears to be incorrect - we very much >> expect that there will be more writes and that they will be contiguous >> with this one. But we won't be waiting on this write before submitting >> more writes, so perhaps REQ_NOIDLE is at least harmless. >> >> I dunno about REQ_SYNC - it requires delving into the bowels of CFQ >> and we shouldn't need to do that. >> >> Jens. Help. How is a poor kernel reader supposed to work this out? >> >>> --- a/fs/mpage.c >>> +++ b/fs/mpage.c >>> @@ -462,6 +462,7 @@ static int __mpage_writepage(struct page *page, struct writeback_control *wbc, >>> struct buffer_head map_bh; >>> loff_t i_size = i_size_read(inode); >>> int ret = 0; >>> + int wr = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE); >>> >>> if (page_has_buffers(page)) { >>> struct buffer_head *head = page_buffers(page); >>> @@ -570,7 +571,7 @@ page_is_mapped: >>> * This page will go to BIO. Do we need to send this BIO off first? >>> */ >>> if (bio && mpd->last_block_in_bio != blocks[0] - 1) >>> - bio = mpage_bio_submit(WRITE, bio); >>> + bio = mpage_bio_submit(wr, bio); >>> >>> alloc_new: >>> if (bio == NULL) { >>> @@ -587,7 +588,7 @@ alloc_new: >>> */ >>> length = first_unmapped << blkbits; >>> if (bio_add_page(bio, page, length, 0) < length) { >>> - bio = mpage_bio_submit(WRITE, bio); >>> + bio = mpage_bio_submit(wr, bio); >>> goto alloc_new; >>> } >>> >>> @@ -620,7 +621,7 @@ alloc_new: >>> set_page_writeback(page); >>> unlock_page(page); >>> if (boundary || (first_unmapped != blocks_per_page)) { >>> - bio = mpage_bio_submit(WRITE, bio); >>> + bio = mpage_bio_submit(wr, bio); >>> if (boundary_block) { >>> write_boundary_block(boundary_bdev, >>> boundary_block, 1 << blkbits); >>> @@ -632,7 +633,7 @@ alloc_new: >>> >>> confused: >>> if (bio) >>> - bio = mpage_bio_submit(WRITE, bio); >>> + bio = mpage_bio_submit(wr, bio); >>> >>> if (mpd->use_writepage) { >>> ret = mapping->a_ops->writepage(page, wbc); >>> @@ -688,8 +689,11 @@ mpage_writepages(struct address_space *mapping, >>> }; >>> >>> ret = write_cache_pages(mapping, wbc, __mpage_writepage, &mpd); >>> - if (mpd.bio) >>> - mpage_bio_submit(WRITE, mpd.bio); >>> + if (mpd.bio) { >>> + int wr = (wbc->sync_mode == WB_SYNC_ALL ? >>> + WRITE_SYNC : WRITE); >>> + mpage_bio_submit(wr, mpd.bio); >>> + } >>> } >>> blk_finish_plug(&plug); >>> return ret; >>> @@ -706,8 +710,11 @@ int mpage_writepage(struct page *page, get_block_t get_block, >>> .use_writepage = 0, >>> }; >>> int ret = __mpage_writepage(page, wbc, &mpd); >>> - if (mpd.bio) >>> - mpage_bio_submit(WRITE, mpd.bio); >>> + if (mpd.bio) { >>> + int wr = (wbc->sync_mode == WB_SYNC_ALL ? >>> + WRITE_SYNC : WRITE); >>> + mpage_bio_submit(wr, mpd.bio); >>> + } >>> return ret; >>> } >>> EXPORT_SYMBOL(mpage_writepage); >> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html