Hi Dave,
To answer your question below:
When we sent our observations about ioctl(FICLONE) performance recently,
starting this e-mail thread, we were hoping for one of several outcomes:
Perhaps we were misusing the feature, in which case guidance on how to
obtain better performance would be helpful. Or, if we're not doing
anything wrong, we hoped for an explanation of why ioctl(FICLONE) isn't as
fast as we expected based on experience with the clone-based
crash-tolerance mechanism in AdvFS. In recent days we've been getting the
latter, for
which we are grateful. We may try to pass along your explanations in a
paper we're writing; if so we'll offer y'all the opportunity to review
this paper and ask if you'd like to be acknowledged.
In the longer term, we're very interested in any developments related to
crash tolerance. The details of interfaces are less important as long as
user-level applications can with reasonable convenience and performance
obtain a simple guarantee: Following a power failure or other crash, a
file can always be restored to a state that the application deemed
consistent (application-level invariants & correctness criteria hold).
Ideally the application would like a synchronous function call whose
successful return provides the consistent-recoverability guarantee for the
current state of the file. That's the guarantee that the original
failure-atomic msync() of EuroSys 2013 provided.
Obtaining this guarantee with ioctl(FICLONE) is quite convenient: When
the application knows that the file is in a consistent state, the
application makes a clone and stashes the clone in a safe place. Loosely
speaking, the performance desired is that the work of cloning should be
"O(delta) not O(data)", i.e., the time and effort required to make & stash
a clone should be proportional to the amount of data in the file changed
between consecutive clones, not to the logical size of the entire file.
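For concreteness, here is a minimal Python sketch of the clone-and-stash
step described above. It is an illustration under stated assumptions, not
code from our implementation: the helper name stash_clone() and the ".tmp"
suffix are hypothetical, and the FICLONE request number is hard-coded from
<linux/fs.h> as it appears on x86 and ARM. The ioctl fails with OSError on
file systems that don't support reflink.

```python
import fcntl
import os

# FICLONE from <linux/fs.h>: _IOW(0x94, 9, int). The numeric value below is
# an assumption that holds on x86 and ARM; other architectures may differ.
FICLONE = 0x40049409


def stash_clone(path, stash_path):
    """Clone `path` and atomically install the clone at `stash_path`.

    Hypothetical helper: call it only when the application considers the
    contents of `path` to be consistent.
    """
    tmp_path = stash_path + ".tmp"  # hypothetical temporary name
    src = os.open(path, os.O_RDONLY)
    try:
        dst = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            fcntl.ioctl(dst, FICLONE, src)  # share extents instead of copying
            os.fsync(dst)                   # make the clone durable
        except OSError:
            # Cloning unsupported or failed: leave no partial stash behind.
            os.close(dst)
            os.unlink(tmp_path)
            raise
        os.close(dst)
    finally:
        os.close(src)

    # Replace the previous stash in one step, then persist the directory
    # entry so the rename itself survives a crash.
    os.rename(tmp_path, stash_path)
    dirfd = os.open(os.path.dirname(os.path.abspath(stash_path)), os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

The rename is there so that a crash in the middle of stashing leaves the
previous clone intact; after a crash the application would recover by
copying or cloning the stash back over the working file.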
I gather from our recent correspondence that XFS cloning today requires
O(data) time and effort, not O(delta). Learning this is progress; we now
have a much better understanding of what's going on under the hood.
We understand that you're volunteers and that you're busy with many
important matters. We're not asking for any further work, though we'll
surely applaud from the sidelines any improvements toward crash tolerance.
I've been thinking about alternative approaches to crash tolerance for
over a decade. In practice today people use things like relational
databases and transactional key-value stores to protect application data
integrity from crashes. I'm interested in other approaches, including but
not limited to failure-atomic msync() and the moral equivalents thereof
implemented with help from file systems. I've worked on a half-dozen
variants of this theme and I'd be happy to explain why I think this area
is exciting to anyone willing to listen. In a nutshell I look forward to
the day when file systems render relational databases and transactional
key-value stores obsolete for some (not all) use cases.
Thanks again for your extraordinary help clarifying matters, which goes
above & beyond the call of duty, and happy holidays!
-- Terence
On Tue, 20 Dec 2022, Dave Chinner wrote:
> > I mainly want to emphasize that nobody is asking for the behavior of
> > AdvFS in that FAST 2015 paper.
>
> OK, so what are you asking us to do, then?