RFC: 512e ZBC host-managed disks

Regular block devices are always accessible in units of logical block
sizes, regardless of the actual physical block size that the device has.
For hard disks, the common cases are:

512n: 512B logical and physical blocks
512e: 512B logical blocks and 4096B physical blocks
4Kn: 4096B logical and physical blocks

In the kernel, sd.c checks that the position and size of each request,
both expressed in 512B "sectors", are aligned to the logical block size
declared by the disk. All is fine with this, nothing new.
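
As a rough illustration of that check (a minimal sketch, not the actual
sd.c code; the helper name req_aligned() and its signature are made up
here), the alignment test on a request given in 512B sectors could look
like this:

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* kernel "sector" unit: 512B */

/*
 * Hypothetical helper, for illustration only. Returns true if a request
 * starting at 512B sector 'pos' and spanning 'nr_sectors' sectors is
 * aligned to a logical block size of 'lbs' bytes (512 or 4096 here).
 */
static inline bool req_aligned(uint64_t pos, uint32_t nr_sectors,
			       uint32_t lbs)
{
	/* Number of 512B sectors per logical block, minus one. */
	uint32_t mask = (lbs >> SECTOR_SHIFT) - 1;

	return !(pos & mask) && !(nr_sectors & mask);
}
```

With lbs = 512 every request passes; with lbs = 4096 both the start
sector and the sector count must be multiples of 8.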

However, for host-managed zoned block devices (ZBC), the 512e case
breaks this model: the standard allows for 512B logical block reads,
*but* writes MUST be aligned on 4KB boundaries within sequential zones
(still using the 512B logical block size addressing). This is a problem
for users of the disk, e.g. a file system, which may wrongly assume that
writing in 512B units is possible (and so that a 512B FS block size can
be used).
Host-aware devices do not have this restriction. Nor does the
restriction apply to writes in conventional zones of host-managed devices.

Summary: for HM 512e block devices, reads are 512e compliant, but writes
in sequential zones are 4Kn compliant.
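
To make the rule concrete, here is a small sketch of the alignment unit
that applies to each kind of access. The types and the function name
(struct hm_disk, enum zone_type, hm_io_aligned()) are hypothetical and
not existing kernel interfaces, and the sketch assumes that both the
start and the length of a write must be multiples of the physical block
size:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types, for illustration only. */
enum zone_type { ZONE_CONVENTIONAL, ZONE_SEQ_WRITE_REQUIRED };

struct hm_disk {
	uint32_t logical_bs;	/* e.g. 512 */
	uint32_t physical_bs;	/* e.g. 4096 */
};

/*
 * 'lba' and 'nr_blocks' are in logical blocks. Reads, and writes to
 * conventional zones, only need logical block alignment, which is always
 * true in these units. Writes to sequential zones must be aligned to the
 * physical block size (assumption: both start and length).
 */
static inline bool hm_io_aligned(const struct hm_disk *d, bool is_write,
				 enum zone_type zt, uint64_t lba,
				 uint32_t nr_blocks)
{
	uint32_t unit = d->logical_bs;
	uint32_t blocks_per_unit;

	if (is_write && zt == ZONE_SEQ_WRITE_REQUIRED)
		unit = d->physical_bs;

	blocks_per_unit = unit / d->logical_bs;

	return lba % blocks_per_unit == 0 &&
	       nr_blocks % blocks_per_unit == 0;
}
```

For a 512e disk, a single-block read at LBA 3 is accepted anywhere,
while the same access as a write is accepted only in a conventional zone.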

I would like opinions on whether we should do something about this. I
see the following possible options:

(1) Do nothing and let the disk user deal with the write alignment
problem. It already has to do so anyway as writes must be sequential.
But this would force in-kernel users to go and look at the device
physical block size, which is not something usually done by layers above
the block layer (FS, device mappers etc).

(2) For 512e host-managed devices, always report to the block layer
(device queue) a larger logical block size of 4096B, so that disk users
can seamlessly adjust to the disk type without having to deal with the
physical sector size. I do not think that this would actually require
changing the scsi_disk->sector_size field to that incorrect value, so
command addressing should not break. But I wonder whether this might
break a lot of other things because of the difference introduced.
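
One possible reading of option (2), as a sketch only: the queue
advertises 4096B logical blocks to upper layers, while commands sent to
the device keep using its real 512B LBA addressing. The structure and
function names below (struct disk_view, queue_block_to_lba(),
queue_blocks_to_nr_lbas()) are hypothetical:

```c
#include <stdint.h>

/* Hypothetical view of the two block sizes involved in option (2). */
struct disk_view {
	uint32_t device_lbs;	/* logical block size used on the wire: 512 */
	uint32_t reported_lbs;	/* size reported to the block layer: 4096 */
};

/* Convert a block index in reported (4096B) units to a device LBA. */
static inline uint64_t queue_block_to_lba(const struct disk_view *v,
					  uint64_t blk)
{
	return blk * (v->reported_lbs / v->device_lbs);
}

/* Convert a length in reported blocks to a number of device LBAs. */
static inline uint32_t queue_blocks_to_nr_lbas(const struct disk_view *v,
					       uint32_t nr_blks)
{
	return nr_blks * (v->reported_lbs / v->device_lbs);
}
```

In this sketch, a write of one reported block at block index 3 becomes a
command for 8 device blocks starting at LBA 24, which is 4KB aligned by
construction. Addressing the device with the reported size instead would
give LBA 3, which is the kind of breakage to avoid.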

(3) Any other ideas?

Best regards.

-- 
Damien Le Moal, Ph.D.
Sr. Manager, System Software Research Group,
Western Digital Corporation
Damien.LeMoal@xxxxxxx
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.wdc.com, www.hgst.com