On 03/10/22 10:54, Yury Norov wrote: > On Mon, Oct 03, 2022 at 04:34:16PM +0100, Valentin Schneider wrote: >> A recent commit made cpumask_next*() trigger a warning when passed >> n = nr_cpu_ids - 1. This means extra care must be taken when feeding CPU >> numbers back into cpumask_next*(). >> >> The warning occurs nearly every boot on QEMU: > > [...] > >> Fixes: 78e5a3399421 ("cpumask: fix checking valid cpu range") > > No! It fixes blk-mq bug, which has been revealed after 78e5a3399421. > >> Suggested-by: Yury Norov <yury.norov@xxxxxxxxx> > > OK, maybe I suggested something like this. But after looking into the code > of blk_mq_hctx_next_cpu() code for more, I have a feeling that this should > be overridden deeper. > > Can you check - did this warning raise because hctx->next_cpu, or > because cpumask_next_and() was called twice after jumping on > select_cpu label? > It seems to always happen when hctx->next_cpu == nr_cpu_ids-1 at the start of the function - no jumping involved. >> Signed-off-by: Valentin Schneider <vschneid@xxxxxxxxxx> >> --- >> block/blk-mq.c | 9 +++++++-- >> 1 file changed, 7 insertions(+), 2 deletions(-) >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index c96c8c4f751b..30ae51eda95e 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -2046,8 +2046,13 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) >> >> if (--hctx->next_cpu_batch <= 0) { >> select_cpu: > > Because we have backward looking goto, I have a strong feeling that the > code should be reorganized. > >> - next_cpu = cpumask_next_and(next_cpu, hctx->cpumask, >> - cpu_online_mask); >> + if (next_cpu == nr_cpu_ids - 1) >> + next_cpu = nr_cpu_ids; >> + else >> + next_cpu = cpumask_next_and(next_cpu, >> + hctx->cpumask, >> + cpu_online_mask); >> + >> if (next_cpu >= nr_cpu_ids) >> next_cpu = blk_mq_first_mapped_cpu(hctx); > > This simply means 'let's start from the beginning', and should be > replaced with cpumask_next_and_wrap(). I hadn't looked in depth there, but that's a strange behaviour. If we get to the end of the cpumask, blk_mq_first_mapped_cpu() does: int cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask); if (cpu >= nr_cpu_ids) cpu = cpumask_first(hctx->cpumask); return cpu; That if branch means the returned CPU is offline, which then triggers: if (!cpu_online(next_cpu)) { if (!tried) { tried = true; goto select_cpu; } but going back to select_cpu doesn't make much sense, we've already checked that hctx->cpumask and cpu_online_mask were disjoint. > >> hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH; > > > Maybe something like this would work? > > if (--hctx->next_cpu_batch > 0 && cpu_online(next_cpu)) { > hctx->next_cpu = next_cpu; > return next_cpu; > } > > next_cpu = cpumask_next_and_wrap(next_cpu, hctx->cpumask, cpu_online_mask) > if (next_cpu < nr_cpu_ids) { > hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH; > hctx->next_cpu = next_cpu; > return next_cpu; > } > > /* > * Make sure to re-select CPU next time once after CPUs > * in hctx->cpumask become online again. > */ > hctx->next_cpu = next_cpu; > hctx->next_cpu_batch = 1; > return WORK_CPU_UNBOUND; > > I didn't test it and likely screwed some corner case. I'm just > trying to say that picking next cpu should be an easier thing. > Agreed, your suggestion looks sane, let me play with that a bit. > Thanks, > Yury