On 5/31/18 00:45, Jeff Moyer wrote:
> Jens Axboe <axboe@xxxxxxxxx> writes:
>> But what's the regression? 4.15 had no zone write locking at all.
>
> The zone write locking was done in the sd driver prior to 4.16. See
> commit 39051dd85f287 ("scsi: sd: Remove zone write locking") for where
> it was removed. That means these devices "just worked" with all I/O
> schedulers.

Yes, they did "just work", but that was not an ideal solution either
because of the performance implications: sequential writes to a single
zone were stalling the dispatch queue, waiting for the dispatched write
to the locked zone to complete. That was not optimal at all (sure, the
drive-side write caching was hiding this a bit, but still).

>>> Moving on, assuming your mind is made up...
>>>
>>> I'm not sure how much logic should go into the udev rule. As
>>> mentioned, this limitation was introduced in 4.16, and Damien has
>>> plans to lift the restriction in future kernels. Because
>>> distributions tend to cherry pick changes, making decisions on
>>> whether a feature exists based solely on kernel version is usually
>>> not a great thing. My inclination would be to just always force
>>> deadline for host-managed SMR drives. These drives aren't that
>>> popular, after all. Any opinions on this?
>>
>> The problem is that it's tied to an IO scheduler, which ends up
>> causing issues like this, since users are free to select a different
>> scheduler. Then things break. Granted, in this case, some
>> extraordinarily shitty hardware even broke. That is on the hardware,
>> not the kernel, that kind of breakage should not occur.
>
> If the firmware problem was widespread, I think we'd try to avoid it.
> I have no reason to believe that is the case, though.

This is the first time in my career that I have heard of a disk
breaking a system BIOS. I will notify our system test lab to
investigate this. Jeff, let's take this discussion off-list since it is
not kernel related.

> Damien made the argument that the user should be able to select an
> I/O scheduler that doesn't perform the write locking, because a
> well-behaved application could theoretically make use of it. I think
> this is a weak argument, given that dm-zoned doesn't even support
> such a mode.

Yes, a little weak. That is definitely not the main use case I am
seeing with customers. That said, these drives are starting to be used
with other feature sets enabled, such as I/O priorities. Considering
their size, this is a very interesting feature for controlling access
latency. Deadline and mq-deadline will not act on I/O priorities; cfq
(and bfq?) will. Potentially better results can be achieved with these,
but as they do not support zone write locking, the application needs to
be careful with its per-zone write queue depth, and by doing so avoid
tripping over unintended kernel-level command reordering.

I see this as a valid enough use case not to "lock down" the scheduler
to deadline only, and to allow other schedulers too. Yet, deadline
should be the default until an application asks for something else.

dm-zoned and f2fs (btrfs in the lab too) "assume" that the underlying
stack does the right thing, which is of course true (for now) only if
the deadline scheduler is enabled. A sane default set at device
initialization would be nice to have and would avoid potential
headaches with rule ordering with regard to component initialization
(not to mention that it would make booting from these disks possible).

> I definitely see this udev rule as a temporary workaround.

I agree.
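In the meantime, the same setting can be checked and applied by hand
through sysfs. A minimal sketch, with sdb standing in for whatever
device name the drive gets:

  # Zone model of the disk: "host-managed" for these SMR drives,
  # "none" for regular disks.
  cat /sys/block/sdb/queue/zoned

  # Force the deadline scheduler on the drive. On scsi-mq kernels, the
  # name to write may need to be "mq-deadline" instead.
  echo deadline > /sys/block/sdb/queue/scheduler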
In fact, I see the deadline-based zone write locking itself as a
temporary workaround. For now, I do not see any other clean method that
covers both the mq and legacy paths. Considering only mq, we discussed
interesting possibilities at LSF/MM using dedicated write queues. That
could be handled generically and remove the dependency on the
scheduler, while also extending the support to open-channel SSDs. My
guess is that no major change to this write locking will happen on the
legacy path, which hopefully will go away soon (?). But there are
options forward with blk-mq.

>> So now we're stuck with this temporary situation which needs a
>> work-around. I don't think it's a terrible idea to have a rule that
>> just sets deadline/mq-deadline for an SMR device regardless of what
>> kernel it is running on. It'll probably never be a bad default.

I agree. But since there are other kernel components (dm-zoned, FSes
and the entire fs/block-dev.c direct I/O write path) depending on the
scheduler being set to something sane, setting that early in device
initialization, before the disk is grabbed by an FS or a device mapper,
would definitely be nice to have.

> OK. Barring future input to the contrary, I'll work to get updates
> into fedora, at least. I've CC'd Colin and Hannes. I'm not sure who
> else to include.
>
> FYI, below is the udev rule Damien had provided to Bryan. I'm not
> sure about the KERNEL=="sd[a-z]" bit, that may need modification.
> Note: I'm no udev expert.

It probably needs to be something like KERNEL=="sd*" to allow more than
26 drives.

Best regards.

> Cheers,
> Jeff
>
> ACTION=="add|change", KERNEL=="sd[a-z]",
> ATTRS{queue/zoned}=="host-managed", ATTR{queue/scheduler}="deadline"

-- 
Damien Le Moal,
Western Digital
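P.S. With the KERNEL=="sd*" change applied, the rule would read
something like the sketch below (untested; as noted above, on scsi-mq
kernels the scheduler name to set may need to be "mq-deadline"):

  ACTION=="add|change", KERNEL=="sd*",
  ATTRS{queue/zoned}=="host-managed", ATTR{queue/scheduler}="deadline"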