Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations

James,

> Well a decade ago we did a lot of work to support 4k sector devices.
> Ultimately the industry went with 512 logical/4k physical devices
> because of problems with non-Linux proprietary OSs but you could still
> use 4k today if you wanted (I've actually still got a working 4k SCSI
> drive), so why is no NVMe device doing that?

FWIW, I have SATA, SAS, and NVMe devices that report 4KB logical.

The reason the industry converged on 512e is that the performance
problems were solved by ensuring correct alignment and transfer length.

Almost every I/O we submit is a multiple of 4KB. So if things are
properly aligned wrt. the device's physical block size, it is irrelevant
whether we express CDB fields in units of 512 bytes or 4KB. We're still
transferring the same number of bytes.
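
To make that concrete, here is a minimal sketch (Python, purely illustrative; the helper names to_blocks and is_aligned are made up for this example and are not kernel interfaces). It expresses the same byte range in 512-byte and in 4KB units and checks alignment against an assumed 4KB physical block size:

```python
def to_blocks(byte_offset, byte_length, logical_block_size):
    """Convert a byte range to (starting LBA, block count) in the given unit."""
    if byte_offset % logical_block_size or byte_length % logical_block_size:
        raise ValueError("range is not a multiple of the logical block size")
    return byte_offset // logical_block_size, byte_length // logical_block_size


def is_aligned(byte_offset, byte_length, physical_block_size):
    """True if the range starts and ends on physical block boundaries."""
    return (byte_offset % physical_block_size == 0
            and byte_length % physical_block_size == 0)


# A 64KB I/O starting at 1MB: typical of what the OS submits.
offset, length = 1 << 20, 16 * 4096

lba_512, count_512 = to_blocks(offset, length, 512)    # 512e view
lba_4k, count_4k = to_blocks(offset, length, 4096)     # 4Kn view

# Different numbers in the command, same bytes on the wire.
print(lba_512, count_512, count_512 * 512)
print(lba_4k, count_4k, count_4k * 4096)
print(is_aligned(offset, length, 4096))
```

The only case where the unit matters is a range that is a multiple of 512 but not of 4096, which is exactly the legacy direct I/O case in (1) below.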

In addition, 512e had two advantages that 4Kn didn't:

1. Legacy applications doing direct I/O and expecting 512-byte blocks
   kept working (albeit with a penalty for writes smaller than a
   physical block).

2. For things like PI where the 16-bit CRC is underwhelming wrt.
   detecting errors in 4096 bytes of data, leaving the protection
   interval at 512 bytes was also a benefit. So while 4Kn adoption
   looked strong inside enterprise disk arrays initially, several
   vendors ended up with 512e for PI reasons.
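
As a rough illustration of point 2, the sketch below (Python, illustrative only) computes one 16-bit guard tag per protection interval over the same 4KB buffer, once with 512-byte intervals and once with a single 4096-byte interval. The CRC uses the T10 DIF polynomial 0x8BB7 in a plain bitwise form; guard_tags is a hypothetical helper, and the rest of the PI tuple (application and reference tags) is left out:

```python
def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = (crc << 1) ^ 0x8BB7
            else:
                crc <<= 1
            crc &= 0xFFFF
    return crc


def guard_tags(buf: bytes, interval: int) -> list[int]:
    """Return one guard CRC per protection interval of `interval` bytes."""
    if len(buf) % interval:
        raise ValueError("buffer is not a multiple of the protection interval")
    return [crc16_t10dif(buf[i:i + interval])
            for i in range(0, len(buf), interval)]


buf = bytes(range(256)) * 16          # 4096 bytes of sample data

tags_512 = guard_tags(buf, 512)       # eight CRCs, each covering 512 bytes
tags_4k = guard_tags(buf, 4096)       # one CRC covering all 4096 bytes

print(len(tags_512), len(tags_4k))
```

With 512-byte intervals each 16-bit CRC only has to vouch for 512 bytes; with a 4096-byte interval the same 16 bits have to cover eight times as much data.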

Once I/Os from the OS were properly aligned, there was just no
compelling reason for anyone to go with 4Kn and have to deal with
multiple SKUs, etc.

For NVMe, 4Kn was prevalent for a while, but drives have started
gravitating towards 512n/512e, perhaps because of (1) above, plus
whatever problems there may be on other platforms, as you mentioned...

> This is not to say I think larger block sizes is in any way a bad idea
> ... I just think that given the history, it will be driven by
> application needs rather than what the manufacturers tell us.

I think it would be beneficial for Linux to support filesystem blocks
larger than the page size. Based on the experience outlined above, I am
not convinced larger logical block sizes will get much traction. But that
doesn't prevent devices from advertising larger physical/minimum/optimal
I/O sizes and for us to handle those more gracefully than we currently
do.
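
For reference, the values a device advertises today are visible in the block queue sysfs attributes. A minimal sketch (Python, illustrative; read_queue_limits and preferred_io_granularity are hypothetical helpers, and the fallback order in the latter is just one possible policy, not what the kernel does):

```python
from pathlib import Path

# Standard attributes under /sys/block/<dev>/queue/
QUEUE_ATTRS = (
    "logical_block_size",
    "physical_block_size",
    "minimum_io_size",
    "optimal_io_size",
)


def read_queue_limits(dev, sysfs_root="/sys/block"):
    """Read the advertised block and I/O sizes (in bytes) for one device."""
    queue = Path(sysfs_root) / dev / "queue"
    return {attr: int((queue / attr).read_text().strip())
            for attr in QUEUE_ATTRS}


def preferred_io_granularity(limits):
    """Pick a granularity to align to (assumed policy for illustration).

    optimal_io_size reads as 0 when the device does not report one, so
    fall back to minimum_io_size and then to the physical block size.
    """
    return (limits["optimal_io_size"]
            or limits["minimum_io_size"]
            or limits["physical_block_size"])
```

Calling read_queue_limits("sda") on a 512e drive would typically show a 512-byte logical and a 4096-byte physical block size; the device name here is only an example.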

-- 
Martin K. Petersen	Oracle Linux Engineering



