Re: Triggering non-integrity writeback from userspace

Hi,

Thanks for looking into this.

On 2015-10-25 08:39:12 +1100, Dave Chinner wrote:
> WB_SYNC_ALL is simply a method of saying "writeback all dirty pages
> and don't skip any". That's part of a data integrity operation, but
> it's not what results in data integrity being provided. It may cause
> some latencies caused by blocking on locks or in the request queues,
> so that's what I'd be looking for.

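It also means we'll wait on more: with WB_SYNC_ALL, write_cache_pages()
walks PAGECACHE_TAG_TOWRITE and waits on pages that are already under
writeback, instead of skipping them as WB_SYNC_NONE does: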
int write_cache_pages(struct address_space *mapping,
		      struct writeback_control *wbc, writepage_t writepage,
		      void *data)
{
...
	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
		tag = PAGECACHE_TAG_TOWRITE;
	else
		tag = PAGECACHE_TAG_DIRTY;
...
			if (PageWriteback(page)) {
				if (wbc->sync_mode != WB_SYNC_NONE)
					wait_on_page_writeback(page);
				else
					goto continue_unlock;
			}

> i.e. if the request queues are full, SYNC_FILE_RANGE_WRITE will
> block until all the IO it has been requested to write has been
> submitted to the request queues. Put simply: the IO is asynchronous
> in that we don't wait for completion, but the IO submission is still
> synchronous.

That's desirable in our case, because it bounds the amount of
outstanding IO.
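
A toy simulation of asynchronous completion with synchronous submission:
completions are never awaited, but the submitter stalls whenever the queue
is full, so requests in flight never exceed the queue depth. The fixed
depth and in-order completion are simplifying assumptions of this sketch;
real request queues and IO schedulers behave in more complicated ways.

```c
#include <assert.h>

/* Result of the toy simulation below. */
struct toy_queue_stats {
    int max_in_flight; /* peak number of submitted-but-uncompleted requests */
    int stalls;        /* times the submitter had to wait for a completion */
};

/* Submits nreq requests into a queue holding at most depth in-flight
 * requests. When the queue is full, submission blocks until the oldest
 * request completes; otherwise it returns without waiting. */
static struct toy_queue_stats toy_submit_all(int nreq, int depth)
{
    assert(nreq >= 0 && depth > 0);
    struct toy_queue_stats st = {0, 0};
    int in_flight = 0;
    for (int i = 0; i < nreq; i++) {
        if (in_flight == depth) {
            in_flight--; /* queue full: block until one request completes */
            st.stalls++;
        }
        in_flight++;
        if (in_flight > st.max_in_flight)
            st.max_in_flight = in_flight;
    }
    return st;
}
```

For example, 10 requests against a depth of 4 peak at 4 in flight and
stall 6 times, which is the bound on outstanding IO described above.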

> Data integrity operations require related file metadata (e.g. block
> allocation transactions) to be forced to the journal/disk, and a
> device cache flush issued to ensure the data is on stable storage.
> SYNC_FILE_RANGE_WRITE does neither of these things, and hence while
> the IO might be the same pattern as a data integrity operation, it
> does not provide such guarantees.

That is what we want here: actual integrity is still going to be provided
via fsync(). The idea of using SYNC_FILE_RANGE_WRITE beforehand is that
fsync() will then have very little work left to do. The language in
sync_file_range(2) doesn't inspire enough confidence for using it as an
actual integrity operation :/
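
A minimal sketch of the pattern described above, assuming Linux with glibc
(sync_file_range() is Linux-specific and needs _GNU_SOURCE). The helper
names are hypothetical. SYNC_FILE_RANGE_WRITE only starts writeback of the
range without waiting for completion; the later fsync() is what provides
integrity, so the sketch treats a failed hint as non-fatal.

```c
#define _GNU_SOURCE /* for sync_file_range(), which is Linux-specific */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Writes all of buf at offset off, retrying on short writes and EINTR. */
static int write_all_at(int fd, const char *buf, size_t len, off_t off)
{
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = pwrite(fd, buf, len, off);
        if (n < 0 && errno == EINTR)
            continue;        /* interrupted before writing: retry */
        if (n <= 0) {
            if (n == 0)
                errno = EIO; /* no progress: treat as an I/O error */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/* Hint: start writeback of [off, off + len) and return without waiting
 * for completion (len == 0 means "to end of file"). No integrity
 * guarantee, so the return value is deliberately ignored. */
static void start_writeback(int fd, off_t off, off_t len)
{
    (void)sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE);
}

/* Write, kick off writeback early, then make the data durable. */
static int write_durably(int fd, const char *buf, size_t len, off_t off)
{
    if (write_all_at(fd, buf, len, off) != 0)
        return -1;
    start_writeback(fd, off, (off_t)len);
    /* ... other work could happen here while the IO proceeds ... */
    return fsync(fd); /* the actual integrity operation */
}
```

In a real program the gap between start_writeback() and fsync() would be
much longer, so that by the time fsync() runs most of the dirty data has
already been submitted.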

> > If I followed the code correctly - not a sure thing at all - that means
> > bios are submitted with WRITE_SYNC specified. Not really what's needed
> > in this case.
>
> That just allows the IO scheduler to classify them differently to
> bulk background writeback.

It also influences which writes are merged and which are not, at least
if I understand elv_rq_merge_ok() and the callbacks it calls correctly.

> You don't want to do writeback from the syscall, right? i.e. you'd
> like to expire the inode behind the fd, and schedule background
> writeback to run on it immediately?

Yes, that's exactly what we want. Blocking if a process has issued too
many writes is fine, though.

Greetings,

Andres Freund
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


