Linus,

On 2018/10/03 17:28, Linus Walleij wrote:
> On Wed, Oct 3, 2018 at 9:42 AM Damien Le Moal <Damien.LeMoal@xxxxxxx> wrote:
>
>> There is another class of outliers: host-managed SMR disks (SATA and SCSI,
>> definitely single hw queue). For these, using mq-deadline is mandatory in many
>> cases in order to guarantee sequential write command delivery to the device
>> driver. Having the default changed to bfq, which as far as I know is not SMR
>> friendly (can sequential writes within a single zone be reordered ?) is asking
>> for troubles (unaligned write errors showing up).
>
> Ah, that is interesting.
>
> Which device driver files are we talking about here, specifically?
> I'd like to take a look.

Currently, sd.c (SCSI disk) as well as null_blk can expose host-managed zoned
block devices.

> I guess what you say is not that you are looking for the deadline
> scheduling per se (as in deadline scheduling is nice), what you want is
> the zone locking semantics in that scheduler, is that right?

Yes, correct. The scheduling policy in itself does not really matter, but it
should not deviate from the mandatory HM write policy: "within a sequential
write required zone, writes must be issued sequentially". That could somewhat
impact the scheduler code itself if said scheduler thinks that not dispatching
sequential writes in sequence is a good idea :) No sane scheduler would do that
though (at least on HDDs), so the impact on the scheduler code itself is reduced.

> I.e. this business:
> blk_queue_is_zoned(q)
> blk_req_zone_write_lock(rq);
> blk_req_zone_write_unlock(rq);
> and mq-deadline solves this with a spinlock.

Yes. These are the helper functions handling the zone write locking to simplify
the task of the scheduler to limit the number of in-flight write requests to one
per zone at most at any time.
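
As a rough illustration of that one-write-per-zone rule, here is a small
user-space model in C. It is only a sketch and not kernel code: struct req,
zone_of(), dispatch_next() and complete_req() are hypothetical names, the zone
size and zone count are assumed constants, and a plain flag array stands in for
whatever the real helpers do on a struct request.

```c
/* User-space model of per-zone write locking. NOT kernel code: every name
 * below is hypothetical and only mimics the idea behind
 * blk_req_zone_write_lock() / blk_req_zone_write_unlock(). */
#include <stdbool.h>
#include <stddef.h>

#define NR_ZONES      8
#define ZONE_SECTORS  1024u   /* assumed zone size, in sectors */

struct req {
    unsigned long sector;   /* start sector; must be < NR_ZONES * ZONE_SECTORS */
    bool is_write;
    bool dispatched;
};

static bool zone_wlocked[NR_ZONES];   /* one write-lock flag per zone */

unsigned int zone_of(const struct req *rq)
{
    return (unsigned int)(rq->sector / ZONE_SECTORS);
}

/* Return the first pending request that may be dispatched, in queue order.
 * Reads are never held back; a write is skipped while its zone is locked. */
struct req *dispatch_next(struct req *queue, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct req *rq = &queue[i];

        if (rq->dispatched)
            continue;
        if (rq->is_write) {
            unsigned int z = zone_of(rq);

            if (zone_wlocked[z])
                continue;           /* a write to this zone is in flight */
            zone_wlocked[z] = true; /* at most one in-flight write per zone */
        }
        rq->dispatched = true;
        return rq;
    }
    return NULL;
}

/* Completion: release the zone so that its next write can be dispatched. */
void complete_req(struct req *rq)
{
    if (rq->is_write)
        zone_wlocked[zone_of(rq)] = false;
}
```

With two writes queued for the same zone, the second one is only returned by
dispatch_next() after complete_req() has been called for the first, while reads
and writes to other zones keep flowing in between.
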
This zone write locking is the trick to avoid write reordering stack-wide, since
most of the time reordering happens not because of the scheduler itself, but
because of the blk-mq (or legacy path) code around it (e.g. requeue due to
resource shortage, multiple contexts running the queues, etc).

> I will augment the patch to enforce mq-deadline
> if blk_queue_is_zoned(q) is true, as it is clear that
> any device with that characteristic must use mq-deadline.
>
> Paolo might be interested in looking into whether BFQ could
> also handle zoned devices in the future, I have no idea of how
> hard that would be.

It was rather easy with deadline, but the scheduler code was simple to start
with. Basically, the only thing needed is, on dispatch, to skip any write
request to a zone that is already locked (i.e. a write is already ongoing). For
reads, there are no constraints so nothing needs to be changed. Zone unlocking
must be done on completion of the write request, so the scheduler completion
method needs to change a little too.

> The zoned business seems a bit fragile. Should it even be
> allowed to select any other scheduler than deadline on these
> devices? Presenting all compiled in schedulers in
> /sys/block/device/queue/scheduler sounds like just giving
> sysadmins too much rope.

Yes, that is debatable. But the above "one write per zone" trick can also be
handled by an application, which makes using any other scheduler OK. Look at the
recent SMR support code in fio from Bart. Some of the tests (in t/zbd) for some
I/O patterns are just fine with any scheduler. In fact any pattern is OK because
fio is SMR aware and never issues more than one write per zone. That is however
true only for I/O sizes that are small enough not to cause the kernel to
generate multiple BIOs for each I/O call. Otherwise, deadline & mq-deadline
become necessary.

But I agree, it is a little fragile. Application developers and sysadmins really
need to know what will be running on the disk to make the right choice.
And knowing that is not necessarily straightforward.

Best regards.

-- 
Damien Le Moal
Western Digital Research