From: Damien Le Moal <damien.lemoal@xxxxxxx>

The cmdprio_percentage, cmdprio_class and cmdprio options allow
specifying different values for read and write operations. This enables
various IO priority issuing patterns even under a mixed read-write
workload but does not allow differentiation within read and write
I/O operation types with different sizes when the bssplit option is used.

Introduce the cmdprio_bssplit option to complement the use of the
bssplit option. This new option has the same format as the bssplit
option, but the percentage values indicate the percentage of I/O
operations with a particular block size that must be issued with the
priority class and value specified by cmdprio_class and cmdprio.

Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
Signed-off-by: Niklas Cassel <niklas.cassel@xxxxxxx>
---
 HOWTO                        |  29 ++++++---
 engines/cmdprio.h            | 113 ++++++++++++++++++++++++++++++++++-
 engines/io_uring.c           |  29 ++++++++-
 engines/libaio.c             |  29 ++++++++-
 fio.1                        |  34 +++++++----
 tools/fiograph/fiograph.conf |   4 +-
 6 files changed, 210 insertions(+), 28 deletions(-)

diff --git a/HOWTO b/HOWTO
index 8b7d4957..1853f56a 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2175,23 +2175,38 @@ with the caveat that when used on the command line, they must come after the
 .. option:: cmdprio_class=int[,int] : [io_uring] [libaio]
 
 	Set the I/O priority class to use for I/Os that must be issued with
-	a priority when :option:`cmdprio_percentage` is set. If not specified
-	when :option:`cmdprio_percentage` is set, this defaults to the highest
-	priority class. A single value applies to reads and writes.
-	Comma-separated values may be specified for reads and writes. See
-	:manpage:`ionice(1)`. See also the :option:`prioclass` option.
+	a priority when :option:`cmdprio_percentage` or
+	:option:`cmdprio_bssplit` is set. If not specified when
+	:option:`cmdprio_percentage` or :option:`cmdprio_bssplit` is set,
+	this defaults to the highest priority class. A single value applies
+	to reads and writes. Comma-separated values may be specified for
+	reads and writes. See :manpage:`ionice(1)`. See also the
+	:option:`prioclass` option.
 
 .. option:: cmdprio=int[,int] : [io_uring] [libaio]
 
 	Set the I/O priority value to use for I/Os that must be issued with
-	a priority when :option:`cmdprio_percentage` is set. If not specified
-	when :option:`cmdprio_percentage` is set, this defaults to 0.
+	a priority when :option:`cmdprio_percentage` or
+	:option:`cmdprio_bssplit` is set. If not specified when
+	:option:`cmdprio_percentage` or :option:`cmdprio_bssplit` is set,
+	this defaults to 0.
 	Linux limits us to a positive value between 0 and 7, with 0 being the
 	highest. A single value applies to reads and writes. Comma-separated
 	values may be specified for reads and writes. See :manpage:`ionice(1)`.
 	Refer to an appropriate manpage for other operating systems since
 	meaning of priority may differ. See also the :option:`prio` option.
 
+.. option:: cmdprio_bssplit=str[,str] : [io_uring] [libaio]
+	To get a finer control over I/O priority, this option allows
+	specifying the percentage of IOs that must have a priority set
+	depending on the block size of the IO. This option is useful only
+	when used together with the :option:`bssplit` option, that is,
+	multiple different block sizes are used for reads and writes.
+	The format for this option is the same as the format of the
+	:option:`bssplit` option, with the exception that values for
+	trim IOs are ignored. This option is mutually exclusive with the
+	:option:`cmdprio_percentage` option.
+
 .. option:: fixedbufs : [io_uring]
 
 	If fio is asked to do direct IO, then Linux will map pages for each
diff --git a/engines/cmdprio.h b/engines/cmdprio.h
index e3b42182..8acdb0b3 100644
--- a/engines/cmdprio.h
+++ b/engines/cmdprio.h
@@ -12,18 +12,106 @@ struct cmdprio {
 	unsigned int percentage[DDIR_RWDIR_CNT];
 	unsigned int class[DDIR_RWDIR_CNT];
 	unsigned int level[DDIR_RWDIR_CNT];
+	unsigned int bssplit_nr[DDIR_RWDIR_CNT];
+	struct bssplit *bssplit[DDIR_RWDIR_CNT];
 };
 
+static int fio_cmdprio_bssplit_ddir(struct thread_options *to, void *cb_arg,
+				    enum fio_ddir ddir, char *str, bool data)
+{
+	struct cmdprio *cmdprio = cb_arg;
+	struct split split;
+	unsigned int i;
+
+	if (ddir == DDIR_TRIM)
+		return 0;
+
+	memset(&split, 0, sizeof(split));
+
+	if (split_parse_ddir(to, &split, str, data, BSSPLIT_MAX))
+		return 1;
+	if (!split.nr)
+		return 0;
+
+	cmdprio->bssplit_nr[ddir] = split.nr;
+	cmdprio->bssplit[ddir] = malloc(split.nr * sizeof(struct bssplit));
+	if (!cmdprio->bssplit[ddir])
+		return 1;
+
+	for (i = 0; i < split.nr; i++) {
+		cmdprio->bssplit[ddir][i].bs = split.val1[i];
+		if (split.val2[i] == -1U) {
+			cmdprio->bssplit[ddir][i].perc = 0;
+		} else {
+			if (split.val2[i] > 100)
+				cmdprio->bssplit[ddir][i].perc = 100;
+			else
+				cmdprio->bssplit[ddir][i].perc = split.val2[i];
+		}
+	}
+
+	return 0;
+}
+
+static int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
+				     struct cmdprio *cmdprio)
+{
+	char *str, *p;
+	int i, ret = 0;
+
+	p = str = strdup(input);
+
+	strip_blank_front(&str);
+	strip_blank_end(str);
+
+	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio, false);
+
+	if (parse_dryrun()) {
+		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+			free(cmdprio->bssplit[i]);
+			cmdprio->bssplit[i] = NULL;
+			cmdprio->bssplit_nr[i] = 0;
+		}
+	}
+
+	free(p);
+	return ret;
+}
+
+static inline int fio_cmdprio_percentage(struct cmdprio *cmdprio,
+					 struct io_u *io_u)
+{
+	enum fio_ddir ddir = io_u->ddir;
+	unsigned int p = cmdprio->percentage[ddir];
+	int i;
+
+	/*
+	 * If cmdprio_percentage option was specified, then use that
+	 * percentage. Otherwise, use cmdprio_bssplit percentages depending
+	 * on the IO size.
+	 */
+	if (p)
+		return p;
+
+	for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
+		if (cmdprio->bssplit[ddir][i].bs == io_u->buflen)
+			return cmdprio->bssplit[ddir][i].perc;
+	}
+
+	return 0;
+}
+
 static int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
 			    bool *has_cmdprio)
 {
 	struct thread_options *to = &td->o;
 	bool has_cmdprio_percentage = false;
+	bool has_cmdprio_bssplit = false;
 	int i;
 
 	/*
-	 * If cmdprio_percentage is set and cmdprio_class is not set,
-	 * default to RT priority class.
+	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
+	 * is not set, default to RT priority class.
 	 */
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		if (cmdprio->percentage[i]) {
@@ -31,6 +119,11 @@ static int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
 				cmdprio->class[i] = IOPRIO_CLASS_RT;
 			has_cmdprio_percentage = true;
 		}
+		if (cmdprio->bssplit_nr[i]) {
+			if (!cmdprio->class[i])
+				cmdprio->class[i] = IOPRIO_CLASS_RT;
+			has_cmdprio_bssplit = true;
+		}
 	}
 
 	/*
@@ -44,8 +137,22 @@ static int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
 			to->name);
 		return 1;
 	}
+	if (has_cmdprio_bssplit &&
+	    (fio_option_is_set(to, ioprio) ||
+	     fio_option_is_set(to, ioprio_class))) {
+		log_err("%s: cmdprio_bssplit option and mutually exclusive "
+			"prio or prioclass option is set, exiting\n",
+			to->name);
+		return 1;
+	}
+	if (has_cmdprio_percentage && has_cmdprio_bssplit) {
+		log_err("%s: cmdprio_percentage and cmdprio_bssplit options "
+			"are mutually exclusive\n",
+			to->name);
+		return 1;
+	}
 
-	*has_cmdprio = has_cmdprio_percentage;
+	*has_cmdprio = has_cmdprio_percentage || has_cmdprio_bssplit;
 
 	return 0;
 }
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 1591ee4e..57124d22 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -75,7 +75,7 @@ struct ioring_data {
 };
 
 struct ioring_options {
-	void *pad;
+	struct thread_data *td;
 	unsigned int hipri;
 	struct cmdprio cmdprio;
 	unsigned int fixedbufs;
@@ -108,6 +108,15 @@ static int fio_ioring_sqpoll_cb(void *data, unsigned long long *val)
 	return 0;
 }
 
+static int str_cmdprio_bssplit_cb(void *data, const char *input)
+{
+	struct ioring_options *o = data;
+	struct thread_data *td = o->td;
+	struct cmdprio *cmdprio = &o->cmdprio;
+
+	return fio_cmdprio_bssplit_parse(td, input, cmdprio);
+}
+
 static struct fio_option options[] = {
 	{
 		.name	= "hipri",
@@ -163,6 +172,16 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
 	},
+	{
+		.name	= "cmdprio_bssplit",
+		.lname	= "Priority percentage block size split",
+		.type	= FIO_OPT_STR_ULL,
+		.cb	= str_cmdprio_bssplit_cb,
+		.off1	= offsetof(struct ioring_options, cmdprio.bssplit),
+		.help	= "Set priority percentages for different block sizes",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
 #else
 	{
 		.name	= "cmdprio_percentage",
@@ -182,6 +201,12 @@ static struct fio_option options[] = {
 		.type	= FIO_OPT_UNSUPPORTED,
 		.help	= "Your platform does not support I/O priority classes",
 	},
+	{
+		.name	= "cmdprio_bssplit",
+		.lname	= "Priority percentage block size split",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support I/O priority classes",
+	},
 #endif
 	{
 		.name	= "fixedbufs",
@@ -432,7 +457,7 @@ static void fio_ioring_prio_prep(struct thread_data *td, struct io_u *io_u)
 	struct io_uring_sqe *sqe = &ld->sqes[io_u->index];
 	struct cmdprio *cmdprio = &o->cmdprio;
 	enum fio_ddir ddir = io_u->ddir;
-	unsigned int p = cmdprio->percentage[ddir];
+	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
 
 	if (p && rand_between(&td->prio_state, 0, 99) < p) {
 		sqe->ioprio =
diff --git a/engines/libaio.c b/engines/libaio.c
index 8b965fe2..9fba3b12 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -56,12 +56,21 @@ struct libaio_data {
 };
 
 struct libaio_options {
-	void *pad;
+	struct thread_data *td;
 	unsigned int userspace_reap;
 	struct cmdprio cmdprio;
 	unsigned int nowait;
 };
 
+static int str_cmdprio_bssplit_cb(void *data, const char *input)
+{
+	struct libaio_options *o = data;
+	struct thread_data *td = o->td;
+	struct cmdprio *cmdprio = &o->cmdprio;
+
+	return fio_cmdprio_bssplit_parse(td, input, cmdprio);
+}
+
 static struct fio_option options[] = {
 	{
 		.name	= "userspace_reap",
@@ -117,6 +126,16 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
 	},
+	{
+		.name	= "cmdprio_bssplit",
+		.lname	= "Priority percentage block size split",
+		.type	= FIO_OPT_STR_ULL,
+		.cb	= str_cmdprio_bssplit_cb,
+		.off1	= offsetof(struct libaio_options, cmdprio.bssplit),
+		.help	= "Set priority percentages for different block sizes",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
 #else
 	{
 		.name	= "cmdprio_percentage",
@@ -136,6 +155,12 @@ static struct fio_option options[] = {
 		.type	= FIO_OPT_UNSUPPORTED,
 		.help	= "Your platform does not support I/O priority classes",
 	},
+	{
+		.name	= "cmdprio_bssplit",
+		.lname	= "Priority percentage block size split",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support I/O priority classes",
+	},
 #endif
 	{
 		.name	= "nowait",
@@ -185,7 +210,7 @@ static void fio_libaio_prio_prep(struct thread_data *td, struct io_u *io_u)
 	struct libaio_options *o = td->eo;
 	struct cmdprio *cmdprio = &o->cmdprio;
 	enum fio_ddir ddir = io_u->ddir;
-	unsigned int p = cmdprio->percentage[ddir];
+	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
 
 	if (p && rand_between(&td->prio_state, 0, 99) < p) {
 		io_u->iocb.aio_reqprio =
diff --git a/fio.1 b/fio.1
index 09b97de3..415a91bb 100644
--- a/fio.1
+++ b/fio.1
@@ -1972,21 +1972,31 @@ used. fio must also be run as the root user.
 .TP
 .BI (io_uring,libaio)cmdprio_class \fR=\fPint[,int]
 Set the I/O priority class to use for I/Os that must be issued with a
-priority when \fBcmdprio_percentage\fR is set. If not specified when
-\fBcmdprio_percentage\fR is set, this defaults to the highest priority
-class. A single value applies to reads and writes. Comma-separated
-values may be specified for reads and writes. See man \fBionice\fR\|(1).
-See also the \fBprioclass\fR option.
+priority when \fBcmdprio_percentage\fR or \fBcmdprio_bssplit\fR is set.
+If not specified when \fBcmdprio_percentage\fR or \fBcmdprio_bssplit\fR
+is set, this defaults to the highest priority class. A single value applies
+to reads and writes. Comma-separated values may be specified for reads and
+writes. See man \fBionice\fR\|(1). See also the \fBprioclass\fR option.
 .TP
 .BI (io_uring,libaio)cmdprio \fR=\fPint[,int]
 Set the I/O priority value to use for I/Os that must be issued with a
-priority when \fBcmdprio_percentage\fR is set. If not specified when
-\fBcmdprio_percentage\fR is set, this defaults to 0. Linux limits us to
-a positive value between 0 and 7, with 0 being the highest. A single
-value applies to reads and writes. Comma-separated values may be specified
-for reads and writes. See man \fBionice\fR\|(1). Refer to an appropriate
-manpage for other operating systems since the meaning of priority may differ.
-See also the \fBprio\fR option.
+priority when \fBcmdprio_percentage\fR or \fBcmdprio_bssplit\fR is set.
+If not specified when \fBcmdprio_percentage\fR or \fBcmdprio_bssplit\fR
+is set, this defaults to 0. Linux limits us to a positive value between
+0 and 7, with 0 being the highest. A single value applies to reads and writes.
+Comma-separated values may be specified for reads and writes. See man
+\fBionice\fR\|(1). Refer to an appropriate manpage for other operating systems
+since the meaning of priority may differ. See also the \fBprio\fR option.
+.TP
+.BI (io_uring,libaio)cmdprio_bssplit \fR=\fPstr[,str]
+To get a finer control over I/O priority, this option allows specifying
+the percentage of IOs that must have a priority set depending on the block
+size of the IO. This option is useful only when used together with the option
+\fBbssplit\fR, that is, multiple different block sizes are used for reads and
+writes. The format for this option is the same as the format of the
+\fBbssplit\fR option, with the exception that values for trim IOs are
+ignored. This option is mutually exclusive with the \fBcmdprio_percentage\fR
+option.
 .TP
 .BI (io_uring)fixedbufs
 If fio is asked to do direct IO, then Linux will map pages for each IO call, and
diff --git a/tools/fiograph/fiograph.conf b/tools/fiograph/fiograph.conf
index 5ba59c52..cfd2fd8e 100644
--- a/tools/fiograph/fiograph.conf
+++ b/tools/fiograph/fiograph.conf
@@ -51,10 +51,10 @@ specific_options=https http_host http_user http_pass http_s3_key http_s3_ke
 specific_options=ime_psync ime_psyncv
 
 [ioengine_io_uring]
-specific_options=hipri cmdprio_percentage cmdprio_class cmdprio fixedbufs registerfiles sqthread_poll sqthread_poll_cpu nonvectored uncached nowait force_async
+specific_options=hipri cmdprio_percentage cmdprio_class cmdprio cmdprio_bssplit fixedbufs registerfiles sqthread_poll sqthread_poll_cpu nonvectored uncached nowait force_async
 
 [ioengine_libaio]
-specific_options=userspace_reap cmdprio_percentage cmdprio_class cmdprio nowait
+specific_options=userspace_reap cmdprio_percentage cmdprio_class cmdprio cmdprio_bssplit nowait
 
 [ioengine_libcufile]
 specific_options=gpu_dev_ids cuda_io
-- 
2.31.1
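
Reviewer note: the per-I/O selection that fio_cmdprio_percentage() adds can be sketched outside of fio as follows. This is a minimal illustration, not the patch's code: the names demo_split, demo_cmdprio and demo_prio_percentage are hypothetical stand-ins, and the sketch models a single data direction only. It assumes a split such as 4k/20:64k/80 has already been parsed into a table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for fio's struct bssplit: one block size and the
 * percentage of I/Os of that size that should be issued with a priority. */
struct demo_split {
	unsigned long long bs;
	unsigned int perc;
};

/* Hypothetical, single-direction stand-in for struct cmdprio. */
struct demo_cmdprio {
	unsigned int percentage;	/* cmdprio_percentage, 0 if unset */
	unsigned int nr;		/* number of entries in split[] */
	const struct demo_split *split;	/* parsed cmdprio_bssplit entries */
};

/*
 * Return the percentage of I/Os of size buflen to issue with a priority.
 * A non-zero global percentage wins; otherwise the block size is looked up
 * in the split table, and sizes that are not listed get 0.
 */
static unsigned int demo_prio_percentage(const struct demo_cmdprio *c,
					 unsigned long long buflen)
{
	unsigned int i;

	assert(c != NULL);

	if (c->percentage)
		return c->percentage;

	for (i = 0; i < c->nr; i++) {
		if (c->split[i].bs == buflen)
			return c->split[i].perc;
	}

	return 0;
}
```

The lookup is an exact match on the block size, so an I/O whose size is not listed in the split is never issued with a priority. In the patch itself the two options cannot be combined, so the early return only matters when cmdprio_percentage is used alone.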