On 3/6/23 09:23, Matthew Wilcox wrote:
> On Sun, Mar 05, 2023 at 12:22:15PM +0100, Hannes Reinecke wrote:
>> On 3/4/23 18:54, Matthew Wilcox wrote:
>>> I think we're talking about different things (probably different storage
>>> vendors want different things, or even different people at the same
>>> storage vendor want different things).
>>>
>>> Luis and I are talking about larger LBA sizes. That is, the minimum
>>> read/write size from the block device is 16kB or 64kB or whatever.
>>> In this scenario, the minimum amount of space occupied by a file goes
>>> up from 512 bytes or 4kB to 64kB. That's doable, even if somewhat
>>> suboptimal.
>>
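A minimal sketch of the arithmetic behind the minimum-footprint claim above, assuming a file's on-disk footprint is simply its size rounded up to the allocation unit (no inline data, tail packing or metadata overhead). The helper name `alloc_bytes` is hypothetical and not taken from any kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes a file of `size` bytes occupies when space
 * is handed out in whole units of `unit` bytes (LBA or fs block size).
 * Assumes no inline data, tail packing or metadata overhead.
 */
uint64_t alloc_bytes(uint64_t size, uint64_t unit)
{
	assert(unit != 0);
	if (size == 0)
		return 0;	/* an empty file needs no data blocks */
	/* Round up to the next multiple of the allocation unit. */
	return ((size - 1) / unit + 1) * unit;
}
```

For a 100-byte file, alloc_bytes(100, 512) is 512, alloc_bytes(100, 4096) is 4096 and alloc_bytes(100, 65536) is 65536, so the smallest possible footprint scales directly with the LBA size.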
>> And so do I. One can view zones as really large LBAs.
>> Indeed, it might be suboptimal from the OS point of view,
>> but from the device point of view it isn't.
>> And, in fact, with devices becoming faster and faster, the question is
>> whether sticking with relatively small sectors will eventually become
>> a limiting factor.
>>
>>> Your concern seems to be more around shingled devices (or their equivalent
>>> in SSD terms) where there are large zones which are append-only, but
>>> you can still random-read 512 byte LBAs. I think there are different
>>> solutions to these problems, and people are working on both of these
>>> problems.
>>
>> My point is that zones only exist because the I/O stack can only deal
>> with sectors up to 4k. If the I/O stack were capable of dealing
>> with larger LBAs, one could identify a zone with an LBA, and the entire
>> issue of append-only and sequential writes would be moot.
>> Even the concept of zones itself becomes irrelevant, as the OS would
>> trivially only write entire zones.
>
> All current filesystems that I'm aware of require their fs block size
> to be >= LBA size. That is, you can't take a 512-byte blocksize ext2
> filesystem and put it on a 4kB LBA storage device.
>
> If an entire zone (say 256MB) becomes a single LBA, that means files
> can only grow/shrink in 256MB increments. I don't think that amount
> of wasted space is going to be acceptable.
>
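To put a rough number on that wasted space, here is a sketch assuming every file is rounded up to a whole multiple of the allocation unit, comparing 4kB blocks with the 256MB increments mentioned above. The helper `slack_bytes` and the sample file sizes are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KiB 1024ULL
#define MiB (1024ULL * KiB)

/*
 * Hypothetical helper: total bytes allocated but unused when each of
 * the `n` files in `sizes` is rounded up to a whole multiple of
 * `unit` bytes. Ignores metadata and any tail packing.
 */
uint64_t slack_bytes(const uint64_t *sizes, size_t n, uint64_t unit)
{
	uint64_t waste = 0;

	assert(unit != 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t rem = sizes[i] % unit;

		/* A partially filled last unit wastes (unit - rem) bytes. */
		if (rem != 0)
			waste += unit - rem;
	}
	return waste;
}
```

For three files of 10 KiB, 1 MiB and 300 MiB, the slack is 2 KiB with 4 KiB units, but 723 MiB minus 10 KiB with 256 MiB units: more than twice the roughly 301 MiB of actual data.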
> So if we're serious about going down this path, we need to tell
> filesystem people to start working out how to support fs block
> size < LBA size.
>
> That's a big ask, so let's be sure storage vendors actually want
> this. Supporting zoned devices and supporting 16k/64k block sizes
> are both easier asks.

Why, I know. And this really is a future goal.
(Possibly a very _distant_ future goal.)

Indeed, we should concentrate on getting 16k/64k blocks supported first.
Or maybe 128k blocks, to help our RAIDed friends.
Cheers,
Hannes