Re: [rfc] fsync_range?

On Tue, Jan 20, 2009 at 06:31:21PM +0000, Jamie Lokier wrote:
> Nick Piggin wrote:
> > Just wondering if we should add an fsync_range syscall like AIX and
> > some BSDs have? It's pretty simple for the pagecache since it
> > already implements the full sync with range syncs anyway. For
> > filesystems and user programs, I imagine it is a bit easier to
> > convert to fsync_range from fsync rather than use the sync_file_range
> > syscall.
> > 
> > Having a flags argument is nice, but AIX seems to use O_SYNC as a
> > flag, I wonder if we should follow?
> 
> I like the idea.  It's much easier to understand than sync_file_range,
> whose man page doesn't really explain how to use it correctly.
> 
> But how is fsync_range different from the sync_file_range syscall with
> all its flags set?

To get the same effect, sync_file_range would have to wait, then write,
then wait. It also does not call into the filesystem's ->fsync function.
I don't know what the wider consequences of that are for all filesystems,
but for some it means that the metadata required to read back the data
is not synced properly, and in general it means that no metadata is
synced at all.
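For comparison, the "all flags set" case looks roughly like this. It is a sketch against the existing Linux sync_file_range() interface; the helper names sync_range_all_flags() and demo_sync_range() are made up for illustration. As the sync_file_range man page warns, none of these operations writes out file metadata or flushes the disk write cache, so a durable commit still needs fdatasync() afterwards.

```c
#define _GNU_SOURCE   /* exposes sync_file_range() in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * sync_file_range() "with all its flags set": wait for writeback already in
 * flight, start writeback of the dirty pages in [offset, offset + nbytes),
 * then wait for that writeback to finish (nbytes == 0 means "to end of file").
 * It does not call the filesystem's ->fsync, so metadata needed to read the
 * data back is not written, and the disk write cache is not flushed.
 */
static int sync_range_all_flags(int fd, off_t offset, off_t nbytes)
{
    assert(offset >= 0 && nbytes >= 0);
    return sync_file_range(fd, offset, nbytes,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}

/* Hypothetical usage: dirty one region of a scratch file, push it out,
 * then fdatasync() so the metadata is durable too. */
static int demo_sync_range(void)
{
    char path[] = "/tmp/sfr-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* file disappears on close */

    static const char rec[] = "record";
    const off_t off = 8192;
    int rc = -1;
    if (pwrite(fd, rec, sizeof rec - 1, off) == (ssize_t)(sizeof rec - 1) &&
        sync_range_all_flags(fd, off, sizeof rec - 1) == 0 &&
        fdatasync(fd) == 0)              /* still needed for metadata */
        rc = 0;
    close(fd);
    return rc;
}
```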

Filesystems could also be converted to a ->fsync_range function much
more easily, if that would benefit any of them.


> For database writes, you typically write a bunch of stuff in various
> regions of a big file (or multiple files), then ideally fdatasync
> some/all of the written ranges - with writes committed to disk in the
> best order determined by the OS and I/O scheduler.
 
Do you know which databases do this? It will be nice to ask their
input and see whether it helps them (I presume it is an OSS database
because the "big" ones just use direct IO and manage their own
buffers, right?)

Today, they just have to fsync the whole file. To use a range-based
call instead, they would first have to identify which parts of the file
need syncing, and then gather those parts into a vector.
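Roughly, today's whole-file approach looks like the sketch below. The struct dirty_region type and the commit_regions()/demo_commit() helpers are hypothetical, and short writes are simply treated as errors to keep it small. The single fdatasync() flushes every dirty page of the file, not just the regions being committed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for one region the application must commit. */
struct dirty_region {
    off_t offset;
    const void *data;
    size_t len;
};

/*
 * Write every region with pwrite(), then make them durable with one
 * fdatasync(). Without a range-based sync call, this flushes all of the
 * file's dirty data, including regions the caller did not ask to commit.
 */
static int commit_regions(int fd, const struct dirty_region *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(r[i].offset >= 0);
        if (pwrite(fd, r[i].data, r[i].len, r[i].offset) != (ssize_t)r[i].len)
            return -1;                   /* short write treated as failure */
    }
    return fdatasync(fd);
}

/* Hypothetical usage: three scattered writes into a scratch file. */
static int demo_commit(void)
{
    char path[] = "/tmp/commit-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    const struct dirty_region regions[] = {
        { 0,       "header", 6 },
        { 1 << 20, "row-17", 6 },
        { 4096,    "index",  5 },
    };
    int rc = commit_regions(fd, regions, sizeof regions / sizeof regions[0]);
    close(fd);
    return rc;
}
```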


> For this, taking a vector of multiple ranges would be nice.
> Alternatively, issuing parallel fsync_range calls from multiple
> threads would approximate the same thing - if (big if) they aren't
> serialised by the kernel.

I was thinking about doing something like that, but I just wanted to
get a basic fsync_range in first. OTOH, we could add an fsyncv syscall,
and glibc could implement fsync_range on top of that?
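To make the fsyncv idea a bit more concrete, here is a userspace approximation of what such a call might do. Everything in it is hypothetical: fsyncv does not exist, and struct sync_range, fsyncv_emul() and fsync_range_emul() are invented names. The how argument follows the AIX-style idea of passing O_SYNC or O_DSYNC. Because sync_file_range() never reaches ->fsync, the emulation has to finish with a whole-file fsync() or fdatasync(), which is the cost a kernel implementation might be able to avoid.

```c
#define _GNU_SOURCE   /* sync_file_range(), O_SYNC/O_DSYNC */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical range descriptor for an fsyncv()-style call. */
struct sync_range {
    off_t start;
    off_t len;      /* 0 means "to end of file", as with sync_file_range() */
};

/*
 * Hypothetical userspace emulation of fsyncv(fd, how, ranges, n).
 * Pass 1 starts writeback on every range without waiting, so the I/O
 * scheduler sees all of the I/O at once and can order it.
 * Pass 2 waits for each range, rewriting anything redirtied meanwhile.
 * Finally fsync()/fdatasync() covers metadata, since sync_file_range()
 * never calls the filesystem's ->fsync.
 */
static int fsyncv_emul(int fd, int how, const struct sync_range *r, size_t n)
{
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (r[i].start < 0 || r[i].len < 0) {
            errno = EINVAL;
            return -1;
        }
    }
    for (size_t i = 0; i < n; i++)
        if (sync_file_range(fd, r[i].start, r[i].len, SYNC_FILE_RANGE_WRITE) != 0)
            return -1;
    for (size_t i = 0; i < n; i++)
        if (sync_file_range(fd, r[i].start, r[i].len,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) != 0)
            return -1;
    /* On Linux, O_SYNC includes the O_DSYNC bit, so test for all of O_SYNC. */
    return ((how & O_SYNC) == O_SYNC) ? fsync(fd) : fdatasync(fd);
}

/* fsync_range() as a library wrapper: a one-element fsyncv(). */
static int fsync_range_emul(int fd, int how, off_t start, off_t len)
{
    const struct sync_range one = { start, len };
    return fsyncv_emul(fd, how, &one, 1);
}
```

The two-pass structure is the point of taking a vector: all ranges are queued before any waiting happens, which is roughly what issuing parallel fsync_range calls from several threads would approximate.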


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html