Re: [PATCH 7/8] cmdprio: add mode to make the logic easier to reason about

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 10, 2021 at 02:57:17PM +0900, Damien Le Moal wrote:
> On 2021/11/09 22:29, Niklas Cassel wrote:
> >>> -int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
> >>> -		     bool *has_cmdprio)
> >>> +int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio)
> >>>  {
> >>>	struct thread_options *to = &td->o;
> >>>	bool has_cmdprio_percentage = false;
> >>> @@ -140,16 +144,12 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
> >>>	 * is not set, default to RT priority class.
> >>>	 */
> >>>	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
> >>> -		if (cmdprio->percentage[i]) {
> >>> -			if (!cmdprio->class[i])
> >>> -				cmdprio->class[i] = IOPRIO_CLASS_RT;
> >>
> >> Why do you change this ? If cmdprio->percentage[i] is 0, you do not want to set
> >> the calss to RT. Or is it that cmdprio->percentage[i] cannot be 0 ? (I did not
> >> check the range of the option).
> >
> > option cmdprio_percentage has minval 0.
>
> Which could be changed to 1, since having percentage == 0 is (should be)
> equivalent to a "do not set a priority".

We could change the option .minval, but it wouldn't change anything at all.

Since the option is an FIO_OPT_INT, the off1 value will be 0 if the user
didn't specify anything, or if the user specified 0.

Either way, the code has to handle 0, so I propose that we leave it as is.


>
> > option cmdprio_class has minval 1, however it can still be 0 if the user
> > omitted the option and only used option cmdprio to specify a priority level.
>
> Hmmm... class 0 is "NONE", so no priority. In the kernel, that will be changed
> to the default BE class. So maybe we should do the same here to avoid this weird
> case ? (setting a prio level with class NONE does not make any sense).

For os/os-linux.h in fio, this is how ioprio_value() looks:

static inline int ioprio_value(int ioprio_class, int ioprio)
{
	/*
	 * If no class is set, assume BE
	 */
	if (!ioprio_class)
		ioprio_class = IOPRIO_CLASS_BE;

	return (ioprio_class << IOPRIO_CLASS_SHIFT) | ioprio;
}

So it will already use BE class when you don't specify a class.

However, since the man page says that if you omit a class for
cmdprio_percentage/cmdprio_bssplit, the highest priority class (RT)
should be used. So for cmdprio_percentage/cmdprio_bssplit, ioprio_value()
will therefore never get in class==0. (cmdprio has already changed it to RT.)
Therefore, the ioprio_value() function will only change class to BE when
using option prio, without specifying option prioclass.

Either case, since ioprio_value() is used, for prio/prioclass or an IO
overloaded with a cmdprio priority value, fio will never send class==0
to the kernel, so I think that what you are suggesting is already there.


>
>
> >
> > I was thinking that cmdprio_value is only used when the percentage != 0,
> > which is the case in my cmdprio branch that allows you to have different
> > cmdprio class + levels, but you are right, in this specific patch
> > fio_cmdprio_ioprio_was_updated()/fio_cmdprio_set_ioprio() still uses
> > cmdprio->class to determine if IO_U_F_HIGH_PRIO flag should be set,
> > even when fio_cmdprio_percentage() returned zero..
> >
> > The question is, if percentage == 0, for e.g. writes,
> > but the user specified cmdprio=3
> > (which sets cmdprio->level[] for both DDIR read and write),
> > should we set the HIGH_PRIO / LOW_PRIO flag?
>
> That is a non question... If percentage == 0, fio_cmdprio_set_ioprio() should
> NOT try to change the io_u prio at all. Then the io_u HIGH/LOW flag will depend
> on the default priority. That is the same as for the 0 < percentage < 100 case
> when an io_u does not get a "hit" on the percentage and fio_cmdprio_set_ioprio()
> does not change its priority. With percentage == 0, you will never get a "hit".

I agree, as that is exactly what I wrote in the next sentence :)
Here:

> > I think the best way is to simply not set any of the flags,
> > If cmdprio percentage is 0 for a certain ddir, then everything
> > will be sent with the default prio.


>
> For libaio (io_uring is similar), the code in fio_libaio_prio_prep() is:
>
>         if (p && rand_between(&td->prio_state, 0, 99) < p) {
>                 io_u->ioprio = cmdprio_value;
>                 io_u->iocb.aio_reqprio = cmdprio_value;
>                 io_u->iocb.u.c.flags |= IOCB_FLAG_IOPRIO;
>                 if (!td->ioprio || cmdprio_value < td->ioprio) {
>                         /*
>                          * The async IO priority is higher (has a lower value)
>                          * than the default context priority.
>                          */
>                         io_u->flags |= IO_U_F_HIGH_PRIO;
>                 }
>         } else if (td->ioprio && td->ioprio < cmdprio_value) {
>                 /*
>                  * The IO will be executed with the default context priority,
>                  * and this priority is higher (has a lower value) than the
>                  * async IO priority.
>                  */
>                 io_u->flags |= IO_U_F_HIGH_PRIO;
>         }
>
> So if percentage == 0, p is 0 and the io_u flag is set according to the default
> prio. What you are suggesting is already there and should not change.

Agreed, and this is how I decided to do things in my V2 series,
which is already on the fio mailing list and on your opensource.wdc.com email.
Please review! :)


What I don't like with the existing behavior is that even when
fio_cmdprio_percentage() returns zero (p == 0), you will still
perform } else if (td->ioprio && td->ioprio < cmdprio_value) {
and potentially set the HIGH_PRIO flag.

E.g. if you have a bssplit with both reads and writes, and
cmdprio_bssplit for read DDIR has entries with non-zero values,
but write DDIR only has entries with zero values.
Then for write DDIR, you could still set the HIGH_PRIO flag..
even though fio_cmdprio_percentage() returns 0 for all blocksizes
for write DDIR.. yucky..

Anyway, since stat.c only prints something if there is both HIGH
and LOW entries, this shouldn't produce incorrect output.

Anyway, I kept this existing (albeit a bit ugly, but working)
behavior in my V2, but in the upcoming series, where we have a
clat_prio array, we will only ever touch stuff, flags whatever,
if p != 0.


Kind regards,
Niklas



[Index of Archives]     [Linux Kernel]     [Linux SCSI]     [Linux IDE]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux