On 5/31/18 00:52, Jens Axboe wrote:
> On 5/30/18 9:45 AM, Jeff Moyer wrote:
>> Jens Axboe <axboe@xxxxxxxxx> writes:
>>
>>> On 5/30/18 9:06 AM, Jeff Moyer wrote:
>>>> Hi, Jens,
>>>>
>>>> Jens Axboe <axboe@xxxxxxxxx> writes:
>>>>
>>>>> On 5/30/18 2:49 AM, Christoph Hellwig wrote:
>>>>>> While I really don't want drivers to change the I/O scheduler themselves,
>>>>>> we have a class of devices (zoned) that don't work at all with certain
>>>>>> I/O schedulers. The kernel not choosing something sane and requiring
>>>>>> user workarounds is just silly.
>>>>>
>>>>> They work just fine for probing and reading purposes. There's absolutely
>>>>> no reason why we can't handle these special snowflakes with a udev rule.
>>>>
>>>> udev rules aren't shipped with the kernel, so it makes it hard to keep
>>>> them in sync. In this instance, I'm not sure anyone made an effort to
>>>> notify distributions that a udev rule was even necessary. (Is there a
>>>> way of notifying distributions about kernel changes that require new
>>>> udev rules, other than emailing each list individually?)
>>>>
>>>> From a technical standpoint, I totally agree with you, Jens. However, I
>>>> think the user experience sucks. 4.15 worked by default, 4.16 doesn't.
>>>> The result will be bug reports from users (to the drive vendors,
>>>> distribution bugzillas, here, etc.).
>>>
>>> I would imagine that most folks get their updates from a distro of some
>>> sort, in which case there's absolutely nothing stopping the distro from
>>> shipping updated rules for the 4.16 kernel update.
>>
>> The problem is distros have already shipped that kernel. :)
>
> Ship an update, then! I'm sure that most people would prefer a simple
> rule update over a kernel update. And you have to do one of them to
> resolve this anyway.
>
>>> But what's the regression? 4.15 had no zone write locking at all.
>>
>> The zone write locking was done in the sd driver prior to 4.16. See
>> commit 39051dd85f287 ("scsi: sd: Remove zone write locking") for where
>> it was removed. That means these devices "just worked" with all I/O
>> schedulers.
>
> Gotcha, makes sense.
>
>>>> Moving on, assuming your mind is made up...
>>>>
>>>> I'm not sure how much logic should go into the udev rule. As mentioned,
>>>> this limitation was introduced in 4.16, and Damien has plans to lift the
>>>> restriction in future kernels. Because distributions tend to cherry-pick
>>>> changes, making decisions on whether a feature exists based solely on
>>>> kernel version is usually not a great thing. My inclination would be
>>>> to just always force deadline for host-managed SMR drives. These drives
>>>> aren't that popular, after all. Any opinions on this?
>>>
>>> The problem is that it's tied to an I/O scheduler, which ends up causing
>>> issues like this, since users are free to select a different scheduler.
>>> Then things break. Granted, in this case, some extraordinarily shitty
>>> hardware even broke. That is on the hardware, not the kernel; that
>>> kind of breakage should not occur.
>>
>> If the firmware problem was widespread, I think we'd try to avoid it. I
>> have no reason to believe that is the case, though.
>>
>> Damien made the argument that the user should be able to select an I/O
>> scheduler that doesn't perform the write locking, because a well-behaved
>> application could theoretically make use of it. I think this is a weak
>> argument, given that dm-zoned doesn't even support such a mode.
>
> Sure, the user should be able to select whatever they want.
> Maybe they
> are strictly using it through bsg or a similar interface, in which
> case no scheduler or kernel support is really needed to drive it.

dm-zoned and f2fs (with more file systems coming) use the drives through
the regular block/bio stack. The scheduler is involved and needs to be
set correctly.

For applications that do not rely on these components and do raw disk
I/O, I see two camps in the field: the sg/bsg camp and the regular POSIX
system calls camp. For the former, the kernel has minimal interaction
with the commands, so the application is on its own: control over write
ordering has to be coded in. For the latter, which is orders of
magnitude easier to use, the scheduler needs to be set correctly, or the
same pressure falls on the application to "do the right thing", which
conflicts with the main benefit of that access path: simplicity.

In the end, if the drives are used directly from applications, I think
it is OK to expect a correct system setting when deadline is required,
so a udev rule is fine (an example rule is sketched below). But for
kernel components like dm-zoned, a sane default set from the start is my
preferred choice.

Best regards.

--
Damien Le Moal,
Western Digital
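
For reference, a rule along the lines discussed above could look like
the following. This is only a sketch: the rule file name is
illustrative, the match keys may need adjusting for a particular distro,
and on kernels using blk-mq for SCSI the scheduler is named
"mq-deadline" rather than "deadline".

  # /etc/udev/rules.d/99-zoned-deadline.rules (file name is illustrative)
  # Match whole disks (no trailing partition number) that report a
  # host-managed zone model in sysfs, and pin the deadline scheduler.
  ACTION=="add|change", KERNEL=="sd*[!0-9]", ATTR{queue/zoned}=="host-managed", ATTR{queue/scheduler}="deadline"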
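
Checking and changing the scheduler by hand goes through the standard
sysfs queue attributes; a sketch, with sdb as a stand-in device name:

  # Report the zone model: "host-managed", "host-aware" or "none".
  cat /sys/block/sdb/queue/zoned
  # The currently active scheduler is shown in square brackets.
  cat /sys/block/sdb/queue/scheduler
  # Select deadline ("mq-deadline" on blk-mq configurations).
  echo deadline > /sys/block/sdb/queue/scheduler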