https://bugzilla.kernel.org/show_bug.cgi?id=202127

--- Comment #17 from Dave Chinner (david@xxxxxxxxxxxxx) ---
On Fri, Jan 04, 2019 at 10:02:58PM +0000, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=202127
>
> --- Comment #16 from Eric Sandeen (sandeen@xxxxxxxxxxx) ---
> Broadcom is simply wrong - ext4 doesn't care about or use the stripe geometry
> like xfs does. But that is beside the point, because:
>
> It is not valid to have a preferred I/O size smaller than the minimum I/O
> size - this should be obvious to the vendor. We can detect this at mkfs time and
> error out or ignore the bad values, but there can be no debate about whether
> the hardware is returning nonsense. It /is/ returning nonsense. That's a
> firmware bug.
>
> Pose the question to broadcom:
>
> "How can the preferred IO size be less than the minimum allowable IO size?

Just to clarify, the "minio" being reported here is not the "minimum allowable IO size". The minimum allowed IO size is the logical sector/block size of the device.

"minimum_io_size" is badly named - it's actually the smallest IO size alignment that allows /efficient/ IO operations to be performed by the device, and that's typically very different to the logical_block_size of the device. e.g.:

$ cat /sys/block/sdf/queue/hw_sector_size
512
$ cat /sys/block/sdf/queue/logical_block_size
512
$ cat /sys/block/sdf/queue/physical_block_size
4096
$ cat /sys/block/sdf/queue/minimum_io_size
4096
$

So, we can do 512 byte sector IOs to this device, but it's not efficient because it has a 4k physical block size, i.e. it requires an RMW cycle to do a 512 byte write. IOWs, a 4k IO (minimum_io_size) will avoid physical block RMW cycles because the physical block size of the storage is 4k. That's what "minimum efficient IO size" means.
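The distinction between "allowed" and "efficient" can be sketched as a small check. This is a minimal Python illustration, not kernel or mkfs code; the helper name `classify_write` and its three outcomes are hypothetical. It assumes a write is only valid if it covers whole logical blocks, and that it avoids the device's internal RMW cycle only if it is aligned to, and a multiple of, the physical block size, using the sdf values above (512-byte logical, 4096-byte physical).

```python
def classify_write(offset: int, length: int, logical: int, physical: int) -> str:
    """Classify a write against the block sizes in /sys/block/<dev>/queue.

    Hypothetical helper for illustration only.
    """
    if length <= 0 or offset % logical or length % logical:
        # Not a whole number of logical sectors: the device can't accept it.
        return "invalid"
    if offset % physical or length % physical:
        # Valid, but it only partially covers a physical block, so the
        # device has to read-modify-write that block internally.
        return "rmw"
    # Aligned to, and a multiple of, the physical block size.
    return "efficient"


if __name__ == "__main__":
    # Values from the sdf example: 512-byte logical, 4096-byte physical blocks.
    print(classify_write(0, 512, 512, 4096))      # rmw
    print(classify_write(4096, 4096, 512, 4096))  # efficient
    print(classify_write(100, 512, 512, 4096))    # invalid
```

The 512-byte write is accepted but lands in the "rmw" bucket, which is exactly the case minimum_io_size is meant to steer applications away from.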
For a RAID5/6 LUN, minimum_io_size is typically the chunk size, as many RAID implementations can do single chunk aligned writes efficiently via partial stripe recalculation without needing RMW cycles. If the write partially overlaps chunks, then RMW cycles are required for the RAID recalc, hence setting the RAID chunk size as the "minimum_io_size" makes sense.

However, a device may not run efficiently or reach saturation when fed lots of minimum_io_size requests. That's where optimal_io_size comes in - a lot of SSDs out there have an optimal IO size in the range of 128-256KB because they can't reach max throughput when smaller IO sizes are used (they are IOPS bound). i.e. the optimal IO size is the size of the IO that will allow the entire bandwidth of the device to be effectively utilised.

For a RAID5/6 LUN, the optimal IO size is the one that keeps all disk heads moving sequentially and in synchronisation and doesn't require partial stripe writes (and hence RMW cycles) to occur. IOWs, it's the IO alignment and size that will allow full stripe writes to be sent to the underlying device.

By definition, the optimal_io_size is /always/ >= minimum_io_size. If the optimal_io_size is < minimum_io_size, then one of them is incorrectly specified. The only time this does not hold true is when the device does not set an optimal_io_size, in which case it should be zero and then gets ignored by userspace.

Regardless, what still stands here is that the firmware needs fixing, and that is something only Broadcom can fix.

Cheers,

Dave.
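The RAID5/6 reasoning in this comment can be sketched the same way. This is a hypothetical Python illustration, not Broadcom firmware, kernel, or mkfs.xfs code, and all function names are invented. It assumes minimum_io_size is one chunk and optimal_io_size is one full stripe of data (chunk size times the number of data disks), applies the rule that a nonzero optimal_io_size must be >= minimum_io_size (0 meaning "not set"), and sorts a write into a full-stripe write, a chunk-aligned write that can use partial stripe recalculation, or one that needs an RMW cycle.

```python
from typing import Optional, Tuple


def raid_io_hints(chunk_size: int, data_disks: int) -> Tuple[int, int]:
    """Hypothetical: (minimum_io_size, optimal_io_size) for a RAID5/6 LUN.

    Assumes minimum_io_size = one chunk and
    optimal_io_size = one full stripe of data (chunk * data disks).
    """
    return chunk_size, chunk_size * data_disks


def sane_hints(min_io: int, opt_io: int) -> Optional[Tuple[int, int]]:
    """Return the hints if they are consistent, or None to ignore them.

    opt_io == 0 means "not set"; otherwise opt_io must be >= min_io.
    """
    if opt_io == 0 or opt_io >= min_io:
        return min_io, opt_io
    return None  # one of the two values is incorrectly specified


def classify_raid_write(offset: int, length: int, chunk: int, stripe: int) -> str:
    """Hypothetical classification of a write against the RAID geometry."""
    if length <= 0:
        raise ValueError("length must be positive")
    if offset % stripe == 0 and length % stripe == 0:
        return "full-stripe"    # parity computed from new data only
    if offset % chunk == 0 and length % chunk == 0:
        return "chunk-aligned"  # partial stripe recalculation, no RMW
    return "rmw"                # partially overlaps a chunk


if __name__ == "__main__":
    min_io, opt_io = raid_io_hints(64 * 1024, 4)  # 64 KiB chunk, 4 data disks
    print(min_io, opt_io)                          # 65536 262144
    print(sane_hints(opt_io, min_io))              # None: values swapped
    print(classify_raid_write(0, opt_io, min_io, opt_io))       # full-stripe
    print(classify_raid_write(min_io, min_io, min_io, opt_io))  # chunk-aligned
    print(classify_raid_write(4096, 8192, min_io, opt_io))      # rmw
```

With a 64 KiB chunk on a hypothetical 4+2 RAID6, the hints come out as 65536 and 262144 bytes; reporting them the other way round is exactly the kind of inconsistency `sane_hints` refuses to pass through.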