Hi Ted,

On Wed, May 29, 2019 at 01:23:32AM -0400, Theodore Ts'o wrote:
> On Wed, May 29, 2019 at 09:37:58AM +0530, Sahitya Tummala wrote:
> >
> > Here is what I think on these mount options. Please correct me if my
> > understanding is wrong.
> >
> > The nobarrier mount option poses risk even if there is a battery
> > protection against sudden power down, as it doesn't guarantee the ordering
> > of important data such as journal writes on the disk. On the storage
> > devices with internal cache, if the cache flush policy is out-of-order,
> > then the places where FS is trying to enforce barriers will be at risk,
> > causing FS to be inconsistent.
>
> If you have protection against sudden shutdown, then nobarrier is
> perfectly safe --- which is to say, if it is guaranteed that any
> writes sent to device will be persisted after a crash, then nobarrier
> is perfectly safe. So for example, if you are using ext4 connected to
> a million dollar EMC Storage Array, which has battery backup, using
> nobarrier is perfectly safe.
>
> That's because we still send writes to the device in an appropriate
> order in nobarrier mode --- in particular, we send the journal updates
> to the device in order. The cache flush policy on the HDD is
> out-of-order, but so long as they all make it out to persistent store
> in the end, it'll be fine.
>

Got it.

> > But whereas with fsync_mode=nobarrier, FS is not trying to enforce
> > any ordering of data on the disk except to ensure the data is flushed
> > from the internal cache to non-volatile memory. Thus, I see this
> > fsync_mode=nobarrier is much better than a general nobarrier. And it
> > provides better performance too as with nobarrier but without
> > compromising much on FS consistency.
>
> "without compromising much on FS consistency" doesn't have any meaning.
> If you care about FS consistency, and you don't have power fail
> protection, then at least for ext4, you *must* send a CACHE FLUSH
> after any time that you modify any file system metadata block --- and
> that's true for 99% of all fsync(2)'s.
>
> I suppose you could do something where if there are times when no
> metadata updates are necessary, but just data block writes, the CACHE
> FLUSH could be suppressed. But (a) this won't actually provide much
> performance improvement for the vast majority of workloads,
> especially on an Android system, and (b) you're making a value
> judgement that FS consistency is more important than application data
> consistency.
>
>
> You didn't answer my question directly --- exactly what is your goal
> that you are trying to achieve, and what assumptions you are willing
> to make? If you have power fail protection (this might require making
> some adjustments to the EC), then you can use nobarrier and just not
> worry about it.
>
> If you don't have power fail protection, and you care about FS
> consistency, then you pretty much have to leave the CACHE FLUSH
> commands in.
>
> If the problem is that some applications are fsync-happy, then I'd
> suggest fixing the applications. Or if you really don't care about
> the applications working correctly or users suffering application data
> loss after a crash, you could hack in a mode, so that for non-root
> users, or maybe certain specific users, fsync is turned into a no-op,
> or a background, asynchronous (non-integrity) writeback.
>
> Are you trying to hit some benchmark target? I'm really confused why
> you would want to be so cavalier with application data safety.
>

Yes, benchmarks for random write/fsync show a huge improvement. For example,
without issuing a flush in the ext4 fsync(), the random write score improves
from 13MB/s to 62MB/s on eMMC, using Androbench.

And fsync_mode=nobarrier is enabled by default on Pixel phones where f2fs is used.
https://android.googlesource.com/device/google/crosshatch/+/e02e4813256e51bacdecb93ffd8340f6efbe68e0

We have been getting requests to evaluate the same for EXT4 and hence, I was
checking with the community on its feasibility.

Thanks,
Sahitya.

> - Ted
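
P.S. For anyone who wants a rough feel for how much of a random-write score is spent in fsync, here is a minimal user-space sketch in Python. It is not Androbench and it does not reproduce the kernel change discussed above: the 4 KiB block size, the file size, the write count and the function name random_write_mbps are all assumptions made for illustration. Skipping fsync() entirely in user space is also weaker than suppressing only the CACHE FLUSH, so the no-fsync number is at best an upper bound on what such a mode could gain.

```python
import os
import random
import tempfile
import time

BLOCK = 4096  # assumed write size; Androbench's real parameters may differ


def random_write_mbps(path, file_blocks=256, writes=200, do_fsync=True, seed=0):
    """Return MB/s for random BLOCK-sized overwrites of a preallocated file.

    With do_fsync=True every write is followed by fsync(); with
    do_fsync=False the writes are left in the page cache.
    """
    rng = random.Random(seed)
    buf = os.urandom(BLOCK)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_blocks * BLOCK)  # fix the file size up front
        start = time.perf_counter()
        for _ in range(writes):
            offset = rng.randrange(file_blocks) * BLOCK
            os.pwrite(fd, buf, offset)
            if do_fsync:
                os.fsync(fd)
        # Guard against a zero interval on very coarse clocks.
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
    return writes * BLOCK / elapsed / 1e6


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "bench.dat")
        with_sync = random_write_mbps(target, do_fsync=True)
        without_sync = random_write_mbps(target, do_fsync=False)
        print(f"fsync per write: {with_sync:.1f} MB/s")
        print(f"no fsync:        {without_sync:.1f} MB/s")
```

The temporary directory should sit on the file system and device under test; on a tmpfs mount the two numbers will be close to meaningless. For reference, the Androbench figures quoted above (13MB/s to 62MB/s) are a speed-up of roughly 4.8x.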