On Tue, 18 Sep 2018, Eric Sandeen wrote:

> On 9/18/18 7:32 AM, Dave Chinner wrote:
> > On Tue, Sep 18, 2018 at 07:46:47AM -0400, Mikulas Patocka wrote:
> >> I would ask the XFS developers about this - why does mkfs.xfs select
> >> sector size 512 by default?
> >
> > Because the underlying device told it that it supported a
> > sector size of 512 bytes?
>
> Not only that, but it must have told us that it had a /physical/ 512
> sector. If it had even said physical/logical 4096/512, we would have
> chosen 4096.
>
> What does blockdev --getpbsz --getss /dev/$FOO say at mkfs time?

On SSDs, the physical sector size is not reliably detectable - the ATA
and NVMe standards allow reporting a physical sector size, but some SSD
vendors report 512 bytes despite the fact that the SSD uses 4k sectors
internally.

I tested 5 SSDs (Samsung SSD 960 EVO NVMe, KINGSTON SKC1000240G NVMe,
Samsung SSD 850 EVO SATA, Crucial MX100 SATA, Intel 520 SATA) - all of
them use 4k sectors internally (i.e. they sustain higher IOPS for 4k
writes than for 2k writes), but only the Crucial SSD reports 4096 in
/sys/block/*/queue/physical_block_size. Intel and Samsung report 512.
(A small program that queries both reported sizes is sketched below.)

The SSDs use 4k sectors to reduce the size of the mapping table (hardly
any SSD vendor would want to use real 512-byte sectors and make the
table 8 times larger), and they do read-modify-write for sub-4k writes.

So, why do you want to do sub-4k writes in XFS? They are slower. For
example, the Kingston NVMe SSD has 5 times lower IOPS for 2k writes
than for 4k writes (a rough measurement sketch also follows below). And
if I use mkfs.xfs directly on it, it selects sectsz=512 for both
metadata and log.

Mikulas
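
For anyone who wants to check their own devices: a minimal sketch (not
from the thread; the device path is whatever you pass in) that prints
the same logical and physical sector sizes that blockdev --getss and
--getpbsz report, via the BLKSSZGET and BLKPBSZGET ioctls from
<linux/fs.h>:

/* sectsize.c - print the logical and physical sector sizes the kernel
 * reports for a block device, i.e. the same values as
 * blockdev --getss / --getpbsz.  Run as root, e.g. ./sectsize /dev/sda
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(int argc, char **argv)
{
	int fd, lss = 0;
	unsigned int pbs = 0;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <block device>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* logical sector size (int), physical block size (unsigned int) */
	if (ioctl(fd, BLKSSZGET, &lss) || ioctl(fd, BLKPBSZGET, &pbs)) {
		perror("ioctl");
		close(fd);
		return 1;
	}
	printf("logical: %d bytes, physical: %u bytes\n", lss, pbs);
	close(fd);
	return 0;
}

Keep in mind that the physical value is only whatever the drive's
firmware chooses to report - as noted above, several of the tested
drives report 512 here despite having 4k sectors internally.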
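
The IOPS comparison can be reproduced with any I/O benchmark; the rough
sketch below (a synchronous, queue-depth-1 O_DIRECT random-write loop -
not necessarily how the numbers above were gathered, and the 1 GiB test
span and I/O count are arbitrary) is still enough to expose the
read-modify-write penalty for sub-4k writes. WARNING: it is
destructive; it overwrites the start of the device you name.

/* iops.c - time N random O_DIRECT writes of a given block size and
 * print IOPS.  DESTRUCTIVE: overwrites the first 1 GiB of the device.
 * The block size must be a multiple of the logical sector size. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

int main(int argc, char **argv)
{
	const int n_ios = 10000;
	const off_t span = 1024 * 1024 * 1024;	/* test area: first 1 GiB */
	size_t bs;
	off_t blocks, off;
	void *buf;
	int fd, i;
	struct timespec t0, t1;
	double secs;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <block device> <block size>\n",
			argv[0]);
		return 1;
	}
	bs = strtoul(argv[2], NULL, 0);
	fd = open(argv[1], O_WRONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* O_DIRECT requires an aligned buffer */
	if (posix_memalign(&buf, 4096, bs)) {
		perror("posix_memalign");
		return 1;
	}
	memset(buf, 0xab, bs);
	blocks = span / (off_t)bs;
	srand(1);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < n_ios; i++) {
		off = (rand() % blocks) * (off_t)bs;
		if (pwrite(fd, buf, bs, off) != (ssize_t)bs) {
			perror("pwrite");
			return 1;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%zu-byte writes: %.0f IOPS\n", bs, n_ios / secs);
	close(fd);
	return 0;
}

Running it twice, once with block size 2048 and once with 4096, should
show the gap described above on drives that do internal 4k
read-modify-write.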
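
As an aside for anyone hitting this today: per mkfs.xfs(8), the sector
size can be forced at mkfs time with mkfs.xfs -s size=4096 (and
-l sectsize=4096 for the log), regardless of what the device reports.
That sidesteps the detection problem for an individual filesystem,
though it does not fix the default.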