Re: fsync_mode mount option for ext4

Hi Ted,

On Wed, May 29, 2019 at 01:23:32AM -0400, Theodore Ts'o wrote:
> On Wed, May 29, 2019 at 09:37:58AM +0530, Sahitya Tummala wrote:
> > 
> > Here is what I think on these mount options. Please correct me if my
> > understanding is wrong.
> > 
> > The nobarrier mount option poses risk even if there is a battery
> > protection against sudden power down, as it doesn't guarantee the ordering
> > of important data such as journal writes on the disk. On the storage
> > devices with internal cache, if the cache flush policy is out-of-order,
> > then the places where FS is trying to enforce barriers will be at risk,
> > causing FS to be inconsistent.
> 
> If you have protection against sudden shutdown, then nobarrier is
> perfectly safe --- which is to say, if it is guaranteed that any
> writes sent to device will be persisted after a crash, then nobarrier
> is perfectly safe.  So for example, if you are using ext4 connected to
> a million dollar EMC Storage Array, which has battery backup, using
> nobarrier is perfectly safe.
> 
> That's because we still send writes to the device in an appropriate
> order in nobarrier mode --- in particular, we send the journal updates
> to the device in order.  The cache flush policy on the HDD is
> out-of-order, but so long as they all make it out to persistent store
> in the end, it'll be fine.
> 
Got it.

> > But whereas with fsync_mode=nobarrier, FS is not trying to enforce
> > any ordering of data on the disk except to ensure the data is flushed
> > from the internal cache to non-volatile memory. Thus, I see this
> > fsync_mode=nobarrier is much better than a general nobarrier. And it
> > provides better performance too as with nobarrier but without
> > compromising much on FS consistency.
> 
> "without compromising much on FS consistency" doesn't have any meaning.
> If you care about FS consistency, and you don't have power fail
> protection, then at least for ext4, you *must* send a CACHE FLUSH
> after any time that you modify any file system metadata block --- and
> that's true for 99% of all fsync(2)'s.
> 
> I suppose you could do something where if there are times when no
> metadata updates are necessary, but just data block writes, the CACHE
> FLUSH could be suppressed.  But (a) this won't actually provide much
> performance improvements for the vast majority of workloads,
> especially on an Android system, and (b) you're making a value
> judgement that FS consistency is more important than application data
> consistency.
> 
> 
> You didn't answer my question directly --- exactly what is your goal
> that you are trying to achieve, and what assumptions you are willing
> to make?  If you have power fail protection (this might require making
> some adjustments to the EC), then you can use nobarrier and just not
> worry about it.
> 
> If you don't have power fail protection, and you care about FS
> consistency, then you pretty much have to leave the CACHE FLUSH
> commands in.
> 
> If the problem is that some applications are fsync-happy, then I'd
> suggest fixing the applications.  Or if you really don't care about
> the applications working correctly or users suffering application data
> loss after a crash, you could hack in a mode, so that for non-root
> users, or maybe certain specific users, fsync is turned into a no-op,
> or a background, asynchronous (non-integrity) writeback.
> 
> Are you trying to hit some benchmark target?  I'm really confused why
> you would want to be so cavalier with application data safety.
> 
Yes, benchmarks for random write/fsync show a huge improvement.
For example, without issuing a cache flush in the ext4 fsync() path,
the random write score in Androbench improves from 13MB/s to 62MB/s
on eMMC.
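
For anyone who wants to reproduce the general shape of this workload, here
is a rough sketch of a random write + fsync loop. It is not Androbench's
code; the function name random_write_bench, the 0xA5 fill pattern and the
fixed rand() seed are only illustrative assumptions. It also cannot tell
whether fsync() actually issued a CACHE FLUSH, since that depends on the
filesystem and its mount options; it only times the syscall path.

```c
/* Sketch only: random block-sized overwrites, each optionally followed
 * by fsync(2). Names and parameters here are hypothetical. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Returns the number of bytes written, or -1 on error.
 * *elapsed_sec receives the wall-clock time spent in the write loop. */
long random_write_bench(const char *path, size_t file_size,
                        size_t block_size, int iterations,
                        int do_fsync, double *elapsed_sec)
{
    assert(block_size > 0 && file_size >= block_size);

    size_t nblocks = file_size / block_size;
    long total = 0;
    struct timespec t0, t1;

    char *buf = malloc(block_size);
    if (!buf)
        return -1;
    memset(buf, 0xA5, block_size);

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    /* Pre-size the file so the writes land at random offsets inside it. */
    if (ftruncate(fd, (off_t)file_size) != 0)
        goto fail;

    srand(1); /* fixed seed so runs are repeatable */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        off_t off = (off_t)((size_t)rand() % nblocks) * (off_t)block_size;

        if (pwrite(fd, buf, block_size, off) != (ssize_t)block_size)
            goto fail;
        /* This is the call whose cost the benchmark is sensitive to. */
        if (do_fsync && fsync(fd) != 0)
            goto fail;
        total += (long)block_size;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *elapsed_sec = (double)(t1.tv_sec - t0.tv_sec) +
                   (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    close(fd);
    free(buf);
    return total;

fail:
    close(fd);
    free(buf);
    return -1;
}
```

Dividing the returned byte count by the elapsed time gives a throughput
figure; running it once with do_fsync=1 and once with do_fsync=0 gives a
crude upper bound on what skipping the sync could buy on a given device.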

Also, fsync_mode=nobarrier is enabled by default on Pixel phones,
where f2fs is used:

https://android.googlesource.com/device/google/crosshatch/+/e02e4813256e51bacdecb93ffd8340f6efbe68e0

We have been getting requests to evaluate the same for ext4, and
hence I was checking with the community on its feasibility.

Thanks,
Sahitya.
>     	       	     		      - Ted



