>>>>> "Neil" == Neil Brown <neilb@xxxxxxx> writes:

>> What:         /sys/block/<disk>/queue/minimum_io_size
>> Date:         April 2009
>> Contact:      Martin K. Petersen <martin.petersen@xxxxxxxxxx>
>> Description:
>>               Storage devices may report a granularity or minimum I/O
>>               size which is the device's preferred unit of I/O.
>>               Requests smaller than this may incur a significant
>>               performance penalty.
>>
>>               For disk drives this value corresponds to the physical
>>               block size. For RAID devices it is usually the stripe
>>               chunk size.

Neil> These two paragraphs are contradictory.  There is no sense in
Neil> which a RAID chunk size is a preferred minimum I/O size.

Maybe not for MD.  This is not just about MD.

This is a hint that says "Please don't send me random I/Os smaller than
this.  And please align to a multiple of this value".

I agree that for MD devices the alignment portion of that is the
important one.  However, putting a lower boundary on the size *is* quite
important for 4KB disk drives.  There are also HW RAID devices that
choke on requests smaller than the chunk size.

I appreciate the difficulty in filling out these hints in a way that
makes sense for all the supported RAID levels in MD.  However, I really
don't consider the hints particularly interesting in the isolated
context of MD.  To me the hints are conduits for characteristics of the
physical storage.

The question you should be asking yourself is: "What do I put in these
fields to help the filesystem so that we get the most out of the
underlying, slow hardware?".

I think it is futile to keep spending time coming up with terminology
that encompasses all current and future software and hardware storage
devices with 100% accuracy.

Neil> To some degree it is actually a 'maximum' preferred size for
Neil> random IO.  If you do random IO in blocks larger than the chunk
Neil> size then you risk causing more 'head contention' (at least with
Neil> RAID0 - with RAID5 the tradeoff is more complex).

Please elaborate.
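
To make the two halves of the hint concrete (a size floor and an
alignment request), here is a small illustrative sketch. It is not
kernel code, and the function names align_down, align_up and
honors_hint are made up for this example. It only assumes that
minimum_io_size is a positive number of bytes.

```python
def align_down(offset: int, granularity: int) -> int:
    """Round offset down to the nearest multiple of granularity."""
    return offset - (offset % granularity)


def align_up(offset: int, granularity: int) -> int:
    """Round offset up to the nearest multiple of granularity."""
    return align_down(offset + granularity - 1, granularity)


def honors_hint(offset: int, length: int, minimum_io_size: int) -> bool:
    """Check a request against the hint as worded in this thread:
    not smaller than minimum_io_size, and starting on a multiple of it.
    """
    return length >= minimum_io_size and offset % minimum_io_size == 0
```

For a drive with 4KB physical blocks (minimum_io_size of 4096), a
512-byte write at offset 512 fails both conditions, while a 4096-byte
write at offset 8192 satisfies them. An allocator that wants to place
data at or after byte 5000 could use align_up(5000, 4096) to pick the
next suitable start.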

Neil> Also, you say "may" report.  If a device does not report, what
Neil> happens to this file?  Is it not present, or empty, or contain a
Neil> special "undefined" value?  I think the answer is that "512" is
Neil> reported.

The answer is physical_block_size.

Neil> In this case, if a device does not report an optimal size, the
Neil> file contains "0" - correct?  Should that be explicit?

Now documented.

Neil> I'd really like to see an example of how you expect filesystems to
Neil> use this.  I can well imagine the VM or elevator using this to
Neil> assemble IO requests into properly aligned requests.  But I
Neil> cannot imagine how e.g. mkfs would use it.  Or am I
Neil> misunderstanding and this is for programs that use O_DIRECT on the
Neil> block device so they can optimise their request stream?

The way it has been working so far (with the manual ioctl pokage) is
that mkfs will align metadata as well as data on a minimum_io_size
boundary.  And it will try to use the minimum_io_size as filesystem
block size.

On Linux that's currently limited by the fact that we can't have blocks
bigger than a page.  The filesystem can also report the optimal I/O size
in statfs.

For XFS the stripe width also affects how the realtime/GRIO allocators
work.

-- 
Martin K. Petersen      Oracle Linux Engineering
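
As a rough sketch of how a userspace tool might consume these hints
through sysfs: the helper names read_queue_limit and
choose_fs_block_size below are hypothetical, and the sysfs_root
parameter exists only so the code can be pointed at a test directory.
This is not how any particular mkfs is implemented. The block size
rule simply mirrors the page-size limit described above.

```python
from pathlib import Path


def read_queue_limit(disk: str, name: str, sysfs_root: str = "/sys/block") -> int:
    """Read an integer queue attribute such as minimum_io_size,
    optimal_io_size or physical_block_size for the given disk.
    """
    path = Path(sysfs_root) / disk / "queue" / name
    return int(path.read_text().strip())


def choose_fs_block_size(minimum_io_size: int, page_size: int) -> int:
    """Prefer minimum_io_size as the filesystem block size, but never
    exceed the page size, since blocks bigger than a page are not
    supported.
    """
    return min(minimum_io_size, page_size)
```

A caller would read minimum_io_size for the target disk, pass it to
choose_fs_block_size together with the system page size (for example
mmap.PAGESIZE), and treat an optimal_io_size of 0 as "the device did
not report one".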