Avi Kivity wrote:
Jeff Garzik wrote:
Speaking of RMW... in one sense, we have to deal with RMW anyway.
Upcoming ATA hard drives will present a normal 512-byte logical sector
interface, but the underlying physical sector size will be 1k or 4k.
The disk performs the RMW for us, but we must be aware of physical
sector size in order to determine proper alignment of on-disk data, to
minimize RMW cycles.
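
To make the RMW cost concrete, here is a toy model of such a drive. It is only a sketch under stated assumptions: the 4k physical size, the Emulated4kDisk class and its rmw_cycles counter are all invented for illustration and do not describe any real firmware.

```python
LOGICAL = 512    # sector size the drive advertises to the OS
PHYSICAL = 4096  # assumed internal sector size (hypothetical drive)


class Emulated4kDisk:
    """Toy model: the medium is only read or written in whole physical sectors."""

    def __init__(self, physical_sectors):
        self.media = bytearray(physical_sectors * PHYSICAL)
        self.rmw_cycles = 0  # number of physical sectors that needed RMW

    def write(self, lba, data):
        """Write data (a multiple of LOGICAL bytes) starting at logical sector lba."""
        assert len(data) % LOGICAL == 0
        start = lba * LOGICAL
        end = start + len(data)
        first = start // PHYSICAL
        last = (end - 1) // PHYSICAL
        for p in range(first, last + 1):
            p_start = p * PHYSICAL
            p_end = p_start + PHYSICAL
            lo = max(start, p_start)  # part of this physical sector we overwrite
            hi = min(end, p_end)
            if hi - lo < PHYSICAL:
                # Partial coverage: read the whole sector, patch it, write it back.
                buf = bytearray(self.media[p_start:p_end])                    # read
                buf[lo - p_start:hi - p_start] = data[lo - start:hi - start]  # modify
                self.media[p_start:p_end] = buf                               # write
                self.rmw_cycles += 1
            else:
                # Full coverage: a plain overwrite, no read needed.
                self.media[p_start:p_end] = data[lo - start:hi - start]


disk = Emulated4kDisk(16)
disk.write(0, b"\xaa" * 4096)   # aligned 4k write: no RMW
disk.write(63, b"\xbb" * 4096)  # same size at sector 63: straddles two physical sectors
```

In this model the aligned write costs no RMW cycles, while the identical write at sector 63 costs two, one for each physical sector it only partly covers.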
Virtualization has the same issue. OS installers will typically set up
the first partition at sector 63, and that means every page-sized block
access will be misaligned. This is particularly bad when the guest's
disk is backed by a regular file.
Windows 2008 aligns partitions on a 1MB boundary, IIRC.
Makes a lot of sense...
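
The arithmetic behind the sector 63 problem and the 1MB fix can be checked in a few lines. This is just a sketch: the helper names is_aligned and round_up_lba are made up here, and 512-byte logical / 4k physical sectors are assumed.

```python
def is_aligned(start_lba, physical=4096, logical=512):
    """True if a partition starting at start_lba begins on a physical sector boundary."""
    return (start_lba * logical) % physical == 0


def round_up_lba(lba, align_bytes, logical=512):
    """Smallest LBA >= lba whose byte offset is a multiple of align_bytes."""
    sectors = align_bytes // logical
    return -(-lba // sectors) * sectors  # ceiling division, then scale back up


# Sector 63 starts at byte 32256, which is 3584 bytes past a 4k boundary.
print(is_aligned(63), (63 * 512) % 4096)

# A 1MB boundary is LBA 2048, which is aligned for 1k, 4k and larger power-of-two sizes.
print(round_up_lba(63, 1 << 20), is_aligned(2048))
```

Rounding sector 63 up to the next 4k boundary gives LBA 64; rounding up to 1MB gives LBA 2048, which also leaves room for larger physical sectors or RAID stripes later.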
At the moment, it seems like most of the effort to get these ATA
devices to perform efficiently is in getting partition / RAID stripe
offsets set up properly.
So perhaps for NVMHCI we could
(a) hardcode NVM sector size maximum at 4k
(b) do RMW in the driver for sector size >4k, and
Why not do it in the block layer? That way it isn't limited to one driver.
Sure. "in the driver" is a highly relative phrase :) If there is code
to be shared among multiple callsites, let's share it.
(c) export information indicating the true sector size, in a manner
similar to how the ATA driver passes that info to userland
partitioning tools.
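
As an illustration of what (c) could look like from userland, here is a sketch that reads both sizes from sysfs. It assumes a Linux kernel that exposes queue/logical_block_size and queue/physical_block_size under /sys/block/<dev>/, as current kernels do; the sector_sizes helper itself is hypothetical, and nothing here says an NVMHCI driver would necessarily use this path.

```python
from pathlib import Path


def sector_sizes(dev, sysfs_root="/sys/block"):
    """Return (logical, physical) sector sizes in bytes for a block device.

    Assumes the kernel exposes the two queue attributes read below;
    sysfs_root is a parameter only so the function can be exercised
    against a fake directory tree.
    """
    queue = Path(sysfs_root) / dev / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical


def needs_alignment_care(dev, sysfs_root="/sys/block"):
    """True when the device does RMW internally, i.e. physical > logical."""
    logical, physical = sector_sizes(dev, sysfs_root)
    return physical > logical


# Example call (the device name is an assumption; adjust for your system):
#   sector_sizes("sda")
```

A partitioning tool could call something like this and then align partition starts to the reported physical size instead of assuming 512 bytes.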
Eventually we'll want to allow filesystems to make use of the native
sector size.
At the kernel level, you mean?
Filesystems already must deal with issues such as avoiding RAID stripe
boundaries (man mke2fs, search for 'RAID').
So I hope that same code should be applicable to cases where the
"logical sector size" (as exported by storage interface) differs from
"physical sector size" (the underlying hardware sector size, not
directly accessible by OS).
But if you are talking about filesystems directly supporting sector
sizes >4k, well, I'll let Linus and others settle that debate :) I
will just write the driver once the dust settles...
Jeff