Re: Proposal for "proper" durable fsync() and fdatasync()

Jeff Garzik wrote:
> >It's not optimal even then.
> >
> >  Devices: On a software RAID, you ideally don't want to issue flushes
> >  to all drives if your database did a 1 block commit entry.  (But they
> >  probably use O_DIRECT anyway, changing the rules again).  But all that
> >  can be optimised in generic VFS code eventually.  It doesn't need
> >  filesystem assistance in most cases.
> 
> My own idea is that we create a FLUSH command for blkdev request queues,
> to exist alongside READ, WRITE, and the current barrier implementation.
> Then FLUSH could be passed down through MD or DM.

I like your thought, and it has the benefit of being simple.

My thought is very similar, but with (hopefully not premature...)
optimisations:

  - I would merge FLUSH with a preceding write in some cases,
    converting to an FUA-write command.  Probably the generic request
    queue is the best place to detect and merge.  This is so that
    userspace filesystems (including guest VMs) and databases can do
    journal commits with the same I/O sequence as in kernel
    filesystems.

  - I would create BARRIER too, so that a userspace API can ask for
    this weaker, ordering-only form of fsync, which may improve
    throughput of userspace journalling.

  - I would include a sector range in FLUSH and BARRIER, for MD and DM
    to flush _only_ relevant sub-devices.  This may improve performance
    for journalling both kernel and userspace filesystems, as journal
    commits are often very small and hit one or two sub-devices in RAID.

  - I would ask the nice MD and DM people to accept tag-barriers rather
    than flush-barriers on the input queue, and to convert them into
    tag-barriers, flush-barriers and independent FLUSH commands on the
    sub-device queues, according to the sector ranges and the writes
    that follow.  It's not obvious, but the barrier proposal which
    started this thread is designed to support an efficient
    inter-sub-device flush-barrier when necessary, and a
    single-sub-device tag-barrier when possible.
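
To make the FLUSH/FUA merging idea a bit more concrete, here is a rough
toy model in Python.  It is not kernel code.  The names Req and
merge_flush_into_fua and the op strings are made up for illustration.
It assumes one possible reading of "in some cases": a FLUSH is folded
into the write just before it only when that write is the sole plain
write issued since the previous FLUSH, so that an FUA write gives the
same durability as write-then-flush.

```python
from dataclasses import dataclass


@dataclass
class Req:
    # Hypothetical request record; op is "WRITE", "FLUSH" or "FUA_WRITE".
    op: str
    sector: int = 0
    nsect: int = 0


def merge_flush_into_fua(queue):
    """Return a new request list with WRITE+FLUSH pairs turned into FUA writes.

    Assumption (not from the original post): the merge is only safe when
    the preceding WRITE is the only plain write since the last FLUSH.
    """
    out = []
    dirty = 0  # plain WRITEs seen since the last FLUSH
    for req in queue:
        if req.op == "FLUSH":
            if dirty == 1 and out and out[-1].op == "WRITE":
                # Exactly one unflushed write, and it is the request just
                # before this FLUSH: replace the pair with one FUA write.
                prev = out.pop()
                out.append(Req("FUA_WRITE", prev.sector, prev.nsect))
            else:
                # Several writes are pending (or none): keep the FLUSH.
                out.append(req)
            dirty = 0
        else:
            if req.op == "WRITE":
                dirty += 1
            out.append(req)
    return out


# A journal commit as a userspace database might issue it:
# two journal block writes, flush, one commit block write, flush.
commit = [
    Req("WRITE", 100, 8),
    Req("WRITE", 200, 8),
    Req("FLUSH"),
    Req("WRITE", 108, 1),
    Req("FLUSH"),
]
merged = merge_flush_into_fua(commit)
```

In this model the first FLUSH stays, because two writes depend on it,
while the commit block write and its FLUSH collapse into a single FUA
write.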
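
As an illustration of the sector-range point, here is a small sketch of
how a ranged FLUSH could be narrowed to the sub-devices it touches.  It
assumes a plain striped (RAID-0 style) layout where chunk number c
lives on disk c % ndisks.  The function name subdevs_for_range and its
parameters are hypothetical, and real MD and DM mappings (parity RAID,
mirrors, linear) would need their own logic.

```python
def subdevs_for_range(sector, nsect, chunk_sectors, ndisks):
    """Return the set of sub-device indexes covered by a sector range.

    Assumes simple striping: chunk c is stored on disk c % ndisks.
    """
    if nsect <= 0:
        return set()
    first_chunk = sector // chunk_sectors
    last_chunk = (sector + nsect - 1) // chunk_sectors
    if last_chunk - first_chunk + 1 >= ndisks:
        # The range spans at least one full stripe: every disk is hit.
        return set(range(ndisks))
    return {c % ndisks for c in range(first_chunk, last_chunk + 1)}


# 4 disks, 128-sector chunks (illustrative numbers only).
small_commit = subdevs_for_range(1000, 8, 128, 4)     # inside one chunk
straddling = subdevs_for_range(1020, 8, 128, 4)       # crosses a chunk edge
big_write = subdevs_for_range(0, 4096, 128, 4)        # many stripes
```

With these numbers an 8-sector journal commit lands on one disk, or on
two if it straddles a chunk boundary, so only those sub-devices would
need to see the FLUSH.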

-- Jamie