On 10.10.2024 11:20, Christoph Hellwig wrote:
On Thu, Oct 10, 2024 at 09:13:27AM +0200, Javier Gonzalez wrote:
Is this because RocksDB already does segregation per file itself? Are
you doing something specific on XFS, or using your knowledge of RocksDB to
map files with an "unwritten" protocol in the middle?
XFS doesn't really do anything smart at all except for grouping files
with similar temperatures, but Hans can probably explain it in more
detail. So yes, this relies on the application doing the data separation
and using the most logical vehicle for it: files.
This makes sense. Agree.
In this context, we have collected data both using FDP natively in
RocksDB and using the temperature hints. Both look very good, because both
are initiated by RocksDB, and the FS just passes the hints directly
to the driver.
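For reference, the temperature path on our side is just the existing
per-file lifetime hints, roughly like this (a minimal sketch; the file
name is made up, and recent glibc wants _GNU_SOURCE for the RWH_*
constants):

#define _GNU_SOURCE
#include <fcntl.h>	/* F_SET_RW_HINT, RWH_WRITE_LIFE_* */
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical SST file; RocksDB would pick the hint per level. */
	int fd = open("000123.sst", O_CREAT | O_WRONLY, 0644);
	if (fd < 0)
		return 1;

	/*
	 * Short-lived file, so mark it accordingly.  The file system
	 * just forwards this per-inode hint down with the writes.
	 */
	uint64_t hint = RWH_WRITE_LIFE_SHORT;
	if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)
		perror("F_SET_RW_HINT");

	/* ... normal buffered or direct writes follow ... */
	close(fd);
	return 0;
}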
I ask this to understand whether this is the FS's responsibility or the
application's. Our work points more to letting applications use the
hints (as the use-cases are power users, like RocksDB). I agree with you
that a FS could potentially make an improvement for legacy applications
- we have not focused much on these, though, so I trust your insights on
it.
As mentioned multiple times before in this thread this absolutely
depends on the abstraction level of the application. If the application
works on a raw device without a file system it obviously needs very
low-level control. And in my opinion passthrough is by far the best
interface for that level of control.
Passthru is great for prototyping and getting insights on end-to-end
applicability. We see, though, that it is difficult to get a full solution
based on it, unless people implement a user-space layer tailored to their
use-case (e.g., a version of SPDK's bdev). After the POC phase, most folks
that can use passthru prefer to move to the block layer - with a validated
use-case it should be easier to get things upstream.
This is exactly where we are now.
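To make that concrete, the prototyping we do on passthru boils down to
building the I/O commands by hand, e.g. an FDP write through the
passthrough ioctl (a rough sketch; device path, 4KiB LBA size and
placement handle are made up, it needs root, and it assumes FDP is
already enabled on the namespace):

#include <fcntl.h>
#include <linux/nvme_ioctl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/nvme0n1", O_RDWR);
	if (fd < 0)
		return 1;

	int nsid = ioctl(fd, NVME_IOCTL_ID);
	if (nsid < 0)
		return 1;

	void *buf;
	if (posix_memalign(&buf, 4096, 4096))	/* assumes 4KiB LBAs */
		return 1;
	memset(buf, 0xab, 4096);

	uint64_t slba = 0;
	uint16_t phndl = 3;			/* made-up placement handle */

	struct nvme_passthru_cmd cmd = {
		.opcode   = 0x01,		/* Write */
		.nsid     = nsid,
		.addr     = (uintptr_t)buf,
		.data_len = 4096,
		.cdw10    = slba & 0xffffffff,
		.cdw11    = slba >> 32,
		/* NLB = 0 (one block), DTYPE = 2 (data placement) */
		.cdw12    = (2u << 20) | 0,
		/* DSPEC carries the placement handle */
		.cdw13    = (uint32_t)phndl << 16,
	};

	if (ioctl(fd, NVME_IOCTL_IO_CMD, &cmd) < 0)
		perror("NVME_IOCTL_IO_CMD");

	free(buf);
	close(fd);
	return 0;
}

Every application (or user-space library) that stays on this path ends
up re-implementing a piece of the I/O stack, which is why we see people
move back to block once the use-case is validated.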
If the application is using a
file system there is no better basic level abstraction than a file,
which can then be enhanced with a relatively small amount of additional
information going both ways: the file system telling the application
what good file sizes and write patterns are, and the application telling
the file system what files are good candidates to merge into the same
write stream if the file system has to merge multiple actively written
files into one write stream. Trying to do low-level per-I/O hints
on top of a file system is a recipe for trouble, because you now have
two entities fighting over placement control.
For files, I agree with you.
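On the "information going both ways" point: the FS-to-application
direction partly exists already, e.g. statx() reports a preferred I/O
size; anything richer (zone or reclaim-unit geometry) is where the
plumbing questions start. A small sketch (path made up):

#define _GNU_SOURCE
#include <fcntl.h>		/* AT_FDCWD */
#include <stdio.h>
#include <sys/stat.h>		/* statx() */

int main(void)
{
	struct statx stx;

	/*
	 * Hypothetical data file; stx_blksize is the "preferred
	 * general I/O size" the file system advertises.
	 */
	if (statx(AT_FDCWD, "000123.sst", 0, STATX_BASIC_STATS, &stx) < 0) {
		perror("statx");
		return 1;
	}

	printf("preferred I/O size: %u bytes\n", stx.stx_blksize);
	return 0;
}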
If you saw the comments from Christian on the inode space, there are a
few plumbing challenges. Do you have any patches we could look at?