wish list for Santa (was: Re: XFS reflink overhead, ioctl(FICLONE))

Hi Dave,

To answer your question below:

When we recently sent our observations about ioctl(FICLONE) performance, starting this e-mail thread, we were hoping for one of two outcomes. Perhaps we were misusing the feature, in which case guidance on how to obtain better performance would be helpful. Or, if we weren't doing anything wrong, we hoped for an explanation of why ioctl(FICLONE) isn't as fast as we expected based on our experience with the clone-based crash-tolerance mechanism in AdvFS. In recent days we've been getting the latter, for which we are grateful. We may try to pass along your explanations in a paper we're writing; if so, we'll offer y'all the opportunity to review the paper and ask whether you'd like to be acknowledged.

In the longer term, we're very interested in any developments related to crash tolerance. The details of interfaces are less important, as long as user-level applications can obtain, with reasonable convenience and performance, a simple guarantee: following a power failure or other crash, a file can always be restored to a state that the application deemed consistent (one in which application-level invariants and correctness criteria hold). Ideally, the application would have a synchronous function call whose successful return provides this consistent-recoverability guarantee for the current state of the file. That's the guarantee that the original failure-atomic msync() of EuroSys 2013 provided.

Obtaining this guarantee with ioctl(FICLONE) is quite convenient: when the application knows that the file is in a consistent state, it makes a clone and stashes the clone in a safe place. Loosely speaking, the performance desired is that the work of cloning should be "O(delta) not O(data)", i.e., the time and effort required to make and stash a clone should be proportional to the amount of data in the file that changed between consecutive clones, not to the logical size of the entire file. I gather from our recent correspondence that XFS cloning today requires O(data) time and effort, not O(delta). Learning that is progress in itself: we now have a much better understanding of what's going on under the hood.
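
For concreteness, the clone-and-stash step described above might look roughly like the following Python sketch. It is only an illustration under stated assumptions, not the code we measured: the function name stash_clone, the ".tmp" naming scheme, and the fsync/rename sequence used to publish the clone are all hypothetical choices, and the numeric fallback for FICLONE is the value on x86 and ARM Linux. The source and the stash must live on the same reflink-capable file system.

```python
import fcntl
import os

# FICLONE is _IOW(0x94, 9, int). Newer Pythons expose fcntl.FICLONE; the
# literal fallback is an assumption that holds on x86 and ARM Linux.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def stash_clone(src_path, stash_path):
    """Clone src_path and atomically publish the clone as stash_path.

    Call this only when the application deems src_path consistent.
    Raises OSError if the file system cannot clone (e.g. EOPNOTSUPP).
    """
    stash_dir = os.path.dirname(os.path.abspath(stash_path))
    tmp_path = stash_path + ".tmp"  # hypothetical naming scheme
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        tmp_fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            # Make the temporary file share the source's extents.
            fcntl.ioctl(tmp_fd, FICLONE, src_fd)
            # Conservative assumption: flush the clone before publishing it.
            os.fsync(tmp_fd)
        except OSError:
            os.close(tmp_fd)
            os.unlink(tmp_path)  # leave no partial clone behind
            raise
        os.close(tmp_fd)
    finally:
        os.close(src_fd)
    # Atomically replace the previous stashed clone, if any.
    os.replace(tmp_path, stash_path)
    dir_fd = os.open(stash_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the rename itself
    finally:
        os.close(dir_fd)
```

After a crash, recovery under this scheme would amount to copying or cloning the stashed file back over the working file. The cost we care about is that of the ioctl call itself.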

We understand that you're volunteers and that you're busy with many important matters. We're not asking for any further work, though we'll surely applaud from the sidelines any improvements toward crash tolerance.

I've been thinking about alternative approaches to crash tolerance for over a decade. In practice today, people use things like relational databases and transactional key-value stores to protect application data integrity from crashes. I'm interested in other approaches, including but not limited to failure-atomic msync() and its moral equivalents implemented with help from file systems. I've worked on a half-dozen variants of this theme, and I'd be happy to explain to anyone willing to listen why I think this area is exciting. In a nutshell, I look forward to the day when file systems render relational databases and transactional key-value stores obsolete for some (not all) use cases.

Thanks again for your extraordinary help clarifying matters, which goes above & beyond the call of duty, and happy holidays!

-- Terence



On Tue, 20 Dec 2022, Dave Chinner wrote:

>> I mainly want to emphasize that nobody is asking for the behavior of AdvFS in that FAST 2015 paper.
>
> OK, so what are you asking us to do, then?


