Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations

Luis Chamberlain <mcgrof@xxxxxxxxxx> · Wed, 8 Mar 2023 11:35:32 -0800

On Sun, Mar 05, 2023 at 12:22:15PM +0100, Hannes Reinecke wrote:
> One can view zones as really large LBAs.
> 
> Indeed it might be suboptimal from the OS point of view.
> But from the device point of view it won't.
> And, in fact, with devices becoming faster and faster the question is
> whether sticking with relatively small sectors won't become a limiting
> factor eventually.
> 
> My point being that zones are just there because the I/O stack can only deal
> with sectors up to 4k. If the I/O stack would be capable of dealing
> with larger LBAs one could identify a zone with an LBA, and the entire issue
> of append-only and sequential writes would be moot.
> Even the entire concept of zones becomes irrelevant as the OS would
> trivially only write entire zones.
> 
> What I was saying is that 256M is not set in stone. It's just a compromise
> vendors used. Even if in the course of development we arrive
> at a lower number of max LBA we can handle (say, 2MB) I am pretty
> sure vendors will be quite interested in that.

So I'm re-reading this and I now see what you're suggesting, Hannes.

You are not suggesting that we may want larger block sizes because of
zoned storage support. Rather, you are suggesting that *if* we support
larger block sizes, they could effectively be used as a replacement for
smaller zone sizes. Your comments about 256 MiB zones are just an assumed
maximum target for existing known zones.

So in that sense, you seem to suggest that users of smaller zone sizes
could instead look at using larger block sizes, since no new "feature"
would be needed beyond the existing efforts to get higher-order folio
support in place and buffer heads addressed.

But this misses the gains of zoned storage on the FTL. The strong semantics
of sequential writes and a write pointer differ from how an existing storage
controller may deal with a write to *one* block. On non-zoned storage you
are not forbidden from modifying just a bit: behind the scenes the FTL does
whatever it thinks it has to, very likely a read-modify-write, and it may
just splash the write into one fresh block for you. So the write appears to
happen in a flash, but in reality it used up a bit of the over-provisioning
blocks. With zoned storage you get a considerable reduction in
over-provisioning, which we don't get with simple larger block size support
on non-zoned drives.
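To make the difference concrete, here is a deliberately tiny model in C. It is only an illustration under simplifying assumptions, not how any real FTL or the kernel's zoned block code works: every name in it (toy_ftl, ftl_write, toy_zone, zone_write, zone_reset) is hypothetical, there is no garbage collection, and one "block" stands in for whatever unit the device remaps. The conventional side lets you rewrite any LBA, but each rewrite silently consumes a spare physical block; the zoned side refuses any write that is not at the write pointer, so rewrites cost the host a zone reset instead of costing the device spare capacity.

```c
#include <stdbool.h>

/* Hypothetical sizes for the toy model: 8 logical blocks exposed to the
 * host, plus 4 spare physical blocks standing in for over-provisioning. */
#define NR_LOGICAL 8
#define NR_SPARE   4

/* Toy conventional FTL: logical blocks map to physical blocks, and every
 * overwrite is redirected out of place to a fresh physical block. */
struct toy_ftl {
	int map[NR_LOGICAL]; /* logical -> physical, -1 if never written */
	int next_free;       /* next unused physical block */
	int nr_physical;     /* NR_LOGICAL + NR_SPARE */
	int stale;           /* physical blocks holding superseded data */
};

static void ftl_init(struct toy_ftl *f)
{
	for (int i = 0; i < NR_LOGICAL; i++)
		f->map[i] = -1;
	f->next_free = 0;
	f->nr_physical = NR_LOGICAL + NR_SPARE;
	f->stale = 0;
}

/* Any LBA may be rewritten at any time. Returns false once no free
 * physical block is left (a real FTL would garbage collect here). */
static bool ftl_write(struct toy_ftl *f, int lba)
{
	if (lba < 0 || lba >= NR_LOGICAL || f->next_free >= f->nr_physical)
		return false;
	if (f->map[lba] >= 0)
		f->stale++; /* the old copy becomes garbage */
	f->map[lba] = f->next_free++;
	return true;
}

/* Toy zone: writes must land exactly at the write pointer, so the device
 * never has to hide an in-place overwrite behind spare blocks. */
struct toy_zone {
	int start; /* first LBA of the zone */
	int len;   /* number of blocks in the zone */
	int wp;    /* absolute LBA of the next writable block */
};

static bool zone_write(struct toy_zone *z, int lba)
{
	if (lba != z->wp || z->wp >= z->start + z->len)
		return false; /* non-sequential write or zone full: rejected */
	z->wp++;
	return true;
}

/* The only way to "overwrite": make the whole zone writable again. */
static void zone_reset(struct toy_zone *z)
{
	z->wp = z->start;
}
```

With these sizes, filling all eight logical blocks and then rewriting LBA 3 over and over succeeds exactly NR_SPARE (4) times before the spare pool runs out, leaving 4 stale blocks. A toy zone with LBAs 0 to 3 written rejects a rewrite of LBA 3 and only accepts LBA 4 until it is reset.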

  Luis