On Wed, Nov 10, 2021 at 02:57:17PM +0900, Damien Le Moal wrote: > On 2021/11/09 22:29, Niklas Cassel wrote: > >>> -int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio, > >>> - bool *has_cmdprio) > >>> +int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio) > >>> { > >>> struct thread_options *to = &td->o; > >>> bool has_cmdprio_percentage = false; > >>> @@ -140,16 +144,12 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio, > >>> * is not set, default to RT priority class. > >>> */ > >>> for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) { > >>> - if (cmdprio->percentage[i]) { > >>> - if (!cmdprio->class[i]) > >>> - cmdprio->class[i] = IOPRIO_CLASS_RT; > >> > >> Why do you change this ? If cmdprio->percentage[i] is 0, you do not want to set > >> the calss to RT. Or is it that cmdprio->percentage[i] cannot be 0 ? (I did not > >> check the range of the option). > > > > option cmdprio_percentage has minval 0. > > Which could be changed to 1, since having percentage == 0 is (should be) > equivalent to a "do not set a priority". We could change the option .minval, but it wouldn't change anything at all. Since the option is an FIO_OPT_INT, the off1 value will be 0 if the user didn't specify anything, or if the user specified 0. Either way, the code has to handle 0, so I propose that we leave it as is. > > > option cmdprio_class has minval 1, however it can still be 0 if the user > > omitted the option and only used option cmdprio to specify a priority level. > > Hmmm... class 0 is "NONE", so no priority. In the kernel, that will be changed > to the default BE class. So maybe we should do the same here to avoid this weird > case ? (setting a prio level with class NONE does not make any sense). For os/os-linux.h in fio, this is how ioprio_value() looks: static inline int ioprio_value(int ioprio_class, int ioprio) { /* * If no class is set, assume BE */ if (!ioprio_class) ioprio_class = IOPRIO_CLASS_BE; return (ioprio_class << IOPRIO_CLASS_SHIFT) | ioprio; } So it will already use BE class when you don't specify a class. However, since the man page says that if you omit a class for cmdprio_percentage/cmdprio_bssplit, the highest priority class (RT) should be used. So for cmdprio_percentage/cmdprio_bssplit, ioprio_value() will therefore never get in class==0. (cmdprio has already changed it to RT.) Therefore, the ioprio_value() function will only change class to BE when using option prio, without specifying option prioclass. Either case, since ioprio_value() is used, for prio/prioclass or an IO overloaded with a cmdprio priority value, fio will never send class==0 to the kernel, so I think that what you are suggesting is already there. > > > > > > I was thinking that cmdprio_value is only used when the percentage != 0, > > which is the case in my cmdprio branch that allows you to have different > > cmdprio class + levels, but you are right, in this specific patch > > fio_cmdprio_ioprio_was_updated()/fio_cmdprio_set_ioprio() still uses > > cmdprio->class to determine if IO_U_F_HIGH_PRIO flag should be set, > > even when fio_cmdprio_percentage() returned zero.. > > > > The question is, if percentage == 0, for e.g. writes, > > but the user specified cmdprio=3 > > (which sets cmdprio->level[] for both DDIR read and write), > > should we set the HIGH_PRIO / LOW_PRIO flag? > > That is a non question... If percentage == 0, fio_cmdprio_set_ioprio() should > NOT try to change the io_u prio at all. Then the io_u HIGH/LOW flag will depend > on the default priority. That is the same as for the 0 < percentage < 100 case > when an io_u does not get a "hit" on the percentage and fio_cmdprio_set_ioprio() > does not change its priority. With percentage == 0, you will never get a "hit". I agree, as that is exactly what I wrote in the next sentence :) Here: > > I think the best way is to simply not set any of the flags, > > If cmdprio percentage is 0 for a certain ddir, then everything > > will be sent with the default prio. > > For libaio (io_uring is similar), the code in fio_libaio_prio_prep() is: > > if (p && rand_between(&td->prio_state, 0, 99) < p) { > io_u->ioprio = cmdprio_value; > io_u->iocb.aio_reqprio = cmdprio_value; > io_u->iocb.u.c.flags |= IOCB_FLAG_IOPRIO; > if (!td->ioprio || cmdprio_value < td->ioprio) { > /* > * The async IO priority is higher (has a lower value) > * than the default context priority. > */ > io_u->flags |= IO_U_F_HIGH_PRIO; > } > } else if (td->ioprio && td->ioprio < cmdprio_value) { > /* > * The IO will be executed with the default context priority, > * and this priority is higher (has a lower value) than the > * async IO priority. > */ > io_u->flags |= IO_U_F_HIGH_PRIO; > } > > So if percentage == 0, p is 0 and the io_u flag is set according to the default > prio. What you are suggesting is already there and should not change. Agreed, and this is how I decided to do things in my V2 series, which is already on the fio mailing list and on your opensource.wdc.com email. Please review! :) What I don't like with the existing behavior is that even when fio_cmdprio_percentage() returns zero (p == 0), you will still perform } else if (td->ioprio && td->ioprio < cmdprio_value) { and potentially set the HIGH_PRIO flag. E.g. if you have a bssplit with both reads and writes, and cmdprio_bssplit for read DDIR has entries with non-zero values, but write DDIR only has entries with zero values. Then for write DDIR, you could still set the HIGH_PRIO flag.. even though fio_cmdprio_percentage() returns 0 for all blocksizes for write DDIR.. yucky.. Anyway, since stat.c only prints something if there is both HIGH and LOW entries, this shouldn't produce incorrect output. Anyway, I kept this existing (albeit a bit ugly, but working) behavior in my V2, but in the upcoming series, where we have a clat_prio array, we will only ever touch stuff, flags whatever, if p != 0. Kind regards, Niklas