Re: dm-writecache issue

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Tue, 18 Sep 2018 08:33:24 -0700

On Tue, Sep 18, 2018 at 10:22:15AM -0400, Mikulas Patocka wrote:
> > On Tue, Sep 18, 2018 at 07:46:47AM -0400, Mikulas Patocka wrote:
> > > I would ask the XFS developers about this - why does mkfs.xfs select 
> > > sector size 512 by default?
> > 
> > Because the underlying device told it that it supported a
> > sector size of 512 bytes?
> 
> SSDs lie about this. They have 4k sectors internally, but report 512.

SSDs can't lie about the sector size because they don't even have
sectors in the disk sense.  They have program and erase block sizes,
and some kind of FTL granularity (think of it like a file system block
size - even a file system with a 4k block size can do smaller writes
with read-modify-write cycles, and so can SSDs).
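
To make the read-modify-write point concrete, here is a minimal sketch
of a 512-byte write landing in a 4k mapping unit.  The names (BLOCK,
SECTOR, write_sector) and the flat bytearray standing in for the media
are assumptions made for illustration only; a real FTL remaps and
journals instead of updating in place.

```python
BLOCK = 4096   # assumed internal mapping granularity (hypothetical value)
SECTOR = 512   # logical sector size exposed to the host


def write_sector(media: bytearray, lba: int, data: bytes) -> None:
    """Emulate a 512-byte write on top of 4k units via read-modify-write."""
    if len(data) != SECTOR:
        raise ValueError("data must be exactly one logical sector")
    offset = lba * SECTOR
    start = (offset // BLOCK) * BLOCK          # start of the containing 4k unit
    if lba < 0 or start + BLOCK > len(media):
        raise ValueError("LBA out of range")

    block = bytearray(media[start:start + BLOCK])   # read the whole unit
    inner = offset - start
    block[inner:inner + SECTOR] = data              # modify one sector of it
    media[start:start + BLOCK] = block              # write the whole unit back


media = bytearray(2 * BLOCK)
write_sector(media, 9, b"\xab" * SECTOR)   # touches only the second 4k unit
```

The sketch says nothing about atomicity; a device has to provide that
separately, for example by never overwriting the old copy in place.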

SSDs can just properly implement the guarantees they inherited from
disks by other means.  So if an SSD claims it supports 512-byte blocks
it had better be able to deal with them atomically.  If they have issues
in that area (like Intel did recently, where they corrupted data left,
right and center if you actually did 512-byte writes) they are simply buggy.

SATA and SAS SSDs can always use the same trick as modern disks:
support 512-byte access where really needed (e.g. BIOS and legacy
OSes), but use the physical block exponent to give modern OSes a
strong hint that they don't want it actually used.  NVMe doesn't
have anything like that yet, but we are working on something similar
in the NVMe TWG.
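
For reference, a rough sketch of how a host could turn that hint into
a number.  The physical block exponent says how many logical blocks,
as a power of two, make up one physical block.  The function name
below is made up for illustration, and 512/3 are just the common
512e values.

```python
def physical_block_size(logical_block_size: int, exponent: int) -> int:
    """Physical block size = logical block size * 2**exponent."""
    if logical_block_size <= 0 or exponent < 0:
        raise ValueError("invalid block size or exponent")
    return logical_block_size << exponent


# A 512e device: 512-byte logical blocks, exponent 3 -> 4096-byte physical.
hinted = physical_block_size(512, 3)
# A device with no hint: exponent 0 -> physical equals logical.
plain = physical_block_size(512, 0)
```

On Linux the resulting values show up in
/sys/block/<dev>/queue/logical_block_size and
/sys/block/<dev>/queue/physical_block_size.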