On Sun, Mar 05, 2023 at 12:22:15PM +0100, Hannes Reinecke wrote:
> One can view zones as really large LBAs.
>
> Indeed it might be suboptimal from the OS point of view.
> But from the device point of view it won't.
> And, in fact, with devices becoming faster and faster the question is
> whether sticking with relatively small sectors won't become a limiting
> factor eventually.
>
> My point being that zones are just there because the I/O stack can only deal
> with sectors up to 4k. If the I/O stack would be capable of dealing
> with larger LBAs one could identify a zone with an LBA, and the entire issue
> of append-only and sequential writes would be moot.
> Even the entire concept of zones becomes irrelevant as the OS would
> trivially only write entire zones.
>
> What I was saying is that 256M is not set in stone. It's just a compromise
> vendors used. Even if in the course of development we arrive
> at a lower number of max LBA we can handle (say, 2MB) I am pretty
> sure vendors will be quite interested in that.

So I'm re-reading this again and I see what you're suggesting now,
Hannes. You are not suggesting that the reason why we may want larger
block sizes is zone storage support. Rather, you are suggesting that
*if* we support larger block sizes, they could effectively be used as a
replacement for smaller zone sizes. Your comment about 256 MiB zones is
just a target max assumption for existing known zones.

So in that sense, you seem to suggest that users of smaller zone sizes
could potentially look at using larger block sizes instead, as no new
"feature" would be needed other than the existing efforts to ensure
higher-order folio support is in place and buffer heads are addressed.

But this misses the gains of zone storage on the FTL. The strong
semantics of sequential writes and a write pointer differ from how an
existing storage controller may deal with writing to *one* block.
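
To make that difference concrete, here is a minimal toy sketch. It is not
kernel or device code, and the class and method names (ConventionalBlock,
SequentialZone, write, reset) are made up purely for illustration. It only
models the host-visible rule: a conventional device accepts an overwrite
at any LBA, while a sequential zone accepts a write only at its write
pointer.

```python
class ConventionalBlock:
    """Toy conventional device: any block can be overwritten in place.

    A real FTL would hide a remap or read-modify-write behind this call;
    that part is deliberately not modelled here.
    """

    def __init__(self, nr_blocks):
        self.blocks = [None] * nr_blocks

    def write(self, lba, data):
        # Overwrites are accepted at any LBA, in any order.
        self.blocks[lba] = data


class SequentialZone:
    """Toy sequential-write zone tracked by a single write pointer."""

    def __init__(self, nr_blocks):
        self.blocks = [None] * nr_blocks
        self.wp = 0  # next block that may be written

    def write(self, lba, data):
        if self.wp >= len(self.blocks):
            raise ValueError("zone is full, reset it before writing again")
        if lba != self.wp:
            raise ValueError(f"unaligned write: lba={lba}, wp={self.wp}")
        self.blocks[lba] = data
        self.wp += 1

    def reset(self):
        # Rewinding the write pointer is the only way to reuse the zone.
        self.blocks = [None] * len(self.blocks)
        self.wp = 0
```

With this model, writing block 2 twice on ConventionalBlock just works,
while rewriting block 0 of a SequentialZone raises an error until the
zone is reset.
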
You are not forbidden from modifying just a bit in non-zone storage.
Behind the scenes the FTL would do whatever it thinks it has to, very
likely a read-modify-write, and it may just splash the write into one
fresh block for you, so the write appears to happen in a flash, but in
reality it used a bit of the over-provisioning blocks. With zone storage
you get a considerable reduction in over-provisioning, which we don't
get with simple larger block size support for non-zone drives.

  Luis