Matias, Keith,

thanks, this all sounds good and it makes total sense to hide striping
from the user.

In the end, the real problem seems to be that ZNS effectively requires
in-order IO delivery, which the kernel cannot guarantee. I think fixing
this problem in the ZNS specification instead of in the communication
substrate (kernel) is problematic, especially as out-of-order delivery
has absolutely no benefit in the case of ZNS. But I guess this has been
discussed before.

On Thu, Jun 18, 2020 at 2:19 PM Keith Busch <kbusch@xxxxxxxxxx> wrote:
>
> On Thu, Jun 18, 2020 at 01:47:20PM -0700, Heiner Litz wrote:
> > the striping explanation makes sense. In this case will rephrase to: It
> > is sufficient to support large enough un-splittable writes to achieve
> > full per-zone bandwidth with a single writer/single QD.
>
> This is subject to the capabilities of the device and software's memory
> constraints. The maximum DMA size for a single request an nvme device can
> handle often ranges anywhere from 64k to 4MB. The pci nvme driver maxes out at
> 4MB anyway because that's the most we can guarantee forward progress right now,
> otherwise the scatter lists become too big to ensure we'll be able to allocate
> one to dispatch a write command.
>
> We do report the size and the alignment constraints so that it won't get split,
> but we still have to work with applications that don't abide by those
> constraints.
>
> > My main point is: There is no fundamental reason for splitting up
> > requests intermittently just to re-assemble them in the same form
> > later.
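
To make the ordering point concrete, here is a minimal Python sketch, not
kernel or device code. It assumes a toy zone that accepts a regular write
only if it starts exactly at the zone's write pointer. The Zone class,
split(), submit(), and all block counts are hypothetical names and numbers
picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Toy model of a sequential-write zone (all sizes in blocks)."""
    capacity: int
    write_pointer: int = 0

    def write(self, lba: int, nblocks: int) -> bool:
        # Accept the write only if it starts at the write pointer
        # and fits into the remaining capacity of the zone.
        if lba != self.write_pointer or lba + nblocks > self.capacity:
            return False
        self.write_pointer += nblocks
        return True


def split(lba: int, nblocks: int, max_blocks: int):
    """Split one request into pieces of at most max_blocks blocks."""
    pieces = []
    while nblocks > 0:
        n = min(nblocks, max_blocks)
        pieces.append((lba, n))
        lba += n
        nblocks -= n
    return pieces


def submit(zone: Zone, requests) -> int:
    """Deliver requests in the given order; return how many were accepted."""
    return sum(zone.write(lba, n) for lba, n in requests)
```

In this model, an 8-block write split into two 4-block pieces is accepted
in full when the pieces arrive in order. If the pieces arrive reversed, the
piece that arrives first is rejected because it does not start at the write
pointer. The unsplit 8-block write has no pieces that could be reordered.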