On Thu 02-06-22 10:53:29, Damien Le Moal wrote:
> On 2022/06/02 1:04, Jan Kara wrote:
> > On Thu 02-06-22 00:08:28, Damien Le Moal wrote:
> >> The different ioprio leading to no BIO merging is definitely a problem
> >> but this patch is not really fixing anything in my opinion. It simply
> >> gets back to the previous "all 0s" ioprio initialization, which does
> >> not show the places where we actually have missing ioprio
> >> initialization to IOPRIO_DEFAULT.
> >
> > So I agree we should settle on how to treat IOs with unset IO priority.
> > The behavior of using the task's CPU priority when the IO priority is
> > unset has been there for a *long* time and so I think we should
> > preserve it. The question is where in the stack the switch from the
> > "unset" value to the "effective ioprio" should happen. Switching in the
> > IO context is IMO too early since userspace needs to be able to
> > differentiate "unset" from "set to IOPRIO_CLASS_BE,IOPRIO_BE_NORM". But
> > we could have a helper like current_effective_ioprio() that does the
> > magic of mangling the unset IO priority based on the task's CPU
> > priority.
>
> I agree that the task's CPU priority is a more sensible default. However,
> I do not understand your point about the IO context being too early to
> set the effective priority. If we do not do that, getting the issuer's
> CPU priority will not be easily possible, right?

I just meant that in the IO context we need to keep the information whether
the IO priority is set to a particular value, or whether it is set to 0
meaning "inherit from CPU priority". So we cannot just store the effective
IO priority immediately in the IO context instead of 0.

> > The fact is that bio->bi_ioprio gets set to anything only for direct IO
> > in iomap_dio_rw(). The rest of the IO has its priority unset (BFQ
> > fetches the priority from the task's IO context and ignores the
> > priority on bios, BTW).
> > And the only place where req->ioprio (inherited from bio->bi_ioprio)
> > gets used is in a few drivers to set the HIGHPRI flag for
> > IOPRIO_CLASS_RT IO, and then the relatively new code in mq-deadline.
>
> Yes, only IOPRIO_CLASS_RT has an effect at the hardware level right now.
> Everything else is only for the IO scheduler to play with. I am preparing
> a patch series to support scsi/ata command duration limits though. That
> adds a new IOPRIO_CLASS_DL and that one will also have an effect on the
> hardware.
>
> Note that if a process/thread uses ioprio_set(), we should also honor the
> ioprio set for buffered reads, at least for those page IOs that are
> issued directly from the IO issuer's context (which may include
> readahead... hmmm).

Yes, that would make sense as well.

> >> Isn't it simply that IOPRIO_DEFAULT should be set as the default for
> >> any bio being allocated (in bio_alloc?) before it is set up and
> >> inherits the user IO priority? Otherwise, the bio ioprio is indeed
> >> IOPRIO_CLASS_NONE/0 and changing IOPRIO_DEFAULT to that value removes
> >> the differences you observed.
> >
> > Yes, I think that would make sense, although as I explained above this
> > is somewhat independent of what the default IO priority behavior should
> > be.
>
> I am OK with the use of the task's CPU priority/ionice value as the
> default when no other ioprio is set for a bio using the user aio_reqprio
> or ioprio_set(). If this relies on task_nice_ioclass() as it does today
> (I see no reason to change that), then the default class for regular
> tasks remains IOPRIO_CLASS_BE, as is defined by IOPRIO_DEFAULT.

Yes, good.

> But to avoid the performance regression you observed, we really need to
> be 100% sure that all bios have their ->bi_ioprio field correctly
> initialized.
> Something like:
>
> void bio_set_effective_ioprio(struct bio *bio)
> {
> 	switch (IOPRIO_PRIO_CLASS(bio->bi_ioprio)) {
> 	case IOPRIO_CLASS_RT:
> 	case IOPRIO_CLASS_BE:
> 	case IOPRIO_CLASS_IDLE:
> 		/*
> 		 * The bio ioprio was already set from an aio kiocb ioprio
> 		 * (aio->aio_reqprio) or from the issuer's context ioprio if
> 		 * that context used ioprio_set().
> 		 */
> 		return;
> 	case IOPRIO_CLASS_NONE:
> 	default:
> 		/* Use the current task's CPU priority. */
> 		bio->bi_ioprio =
> 			IOPRIO_PRIO_VALUE(task_nice_ioclass(current),
> 					  task_nice_ioprio(current));
> 		return;
> 	}
> }
>
> being called before a bio is inserted into a scheduler or bypass-inserted
> into the dispatch queues should result in all BIOs having an ioprio set
> to something other than IOPRIO_CLASS_NONE. And the obvious place may
> simply be at the beginning of submit_bio(), before submit_bio_noacct()
> is called.
>
> I am tempted to argue that block device drivers should never see any req
> with an ioprio set to IOPRIO_CLASS_NONE, which means that no bio should
> ever enter the block stack with that ioprio either. With the above
> solution, bios from DM targets submitted with submit_bio_noacct() could
> still have IOPRIO_CLASS_NONE... So would submit_bio_noacct() be the
> better place to call the effective ioprio helper?

Yes, I also think it would be the cleanest if we made sure bio->bi_ioprio
is always set to some value other than IOPRIO_CLASS_NONE. I'll see how we
can make that happen in the least painful way :). Thanks for your input!

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR