Hi Christoph,

On Mon, Nov 25, 2024 at 1:37 AM Christoph Hellwig <hch@xxxxxx> wrote:
>
> On Fri, Nov 22, 2024 at 03:55:23PM -0600, Sam Protsenko wrote:
> > It's an Exynos-based board with eMMC, so it uses the DW MMC driver,
> > with the Exynos glue layer on top of it:
> >
> >     drivers/mmc/host/dw_mmc.c
> >     drivers/mmc/host/dw_mmc-exynos.c
> >
> > I'm using the regular ARM64 defconfig. Nothing fancy about this setup
> > either; the device tree with the eMMC definition (mmc_0) is here:
> >
> >     arch/arm64/boot/dts/exynos/exynos850-e850-96.dts
>
> Thanks.  eMMC itself never looks at the ioprio field.
>
> > FWIW, I was able to narrow down the issue to the dd_insert_request()
> > function. With this hack the freeze is gone:
>
> Sounds like it isn't the driver that matters here, but the scheduler.
>
> > 8<-------------------------------------------------------------------->8
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index acdc28756d9d..83d272b66e71 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -676,7 +676,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >  	struct request_queue *q = hctx->queue;
> >  	struct deadline_data *dd = q->elevator->elevator_data;
> >  	const enum dd_data_dir data_dir = rq_data_dir(rq);
> > -	u16 ioprio = req_get_ioprio(rq);
> > +	u16 ioprio = 0; /* the same as old req->ioprio */
> >  	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
> >  	struct dd_per_prio *per_prio;
> >  	enum dd_prio prio;
> > 8<-------------------------------------------------------------------->8
> >
> > Does it tell you anything about where the possible issue might be?
>
> Can you dump the io priorities you see here with and without the
> reverted patch?
>

Collected the logs for you:

  - with the patch reverted (ioprio is always 0): [1]
  - with the patch present: [2]

This code was added for printing the traces:

8<---------------------------------------------------------------------->8
 static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 			      blk_insert_t flags, struct list_head *free)
 {
+#define IOPRIO_N_LIMIT 100
+	static int ioprio_prev = 0, ioprio_n = 0;
 	struct request_queue *q = hctx->queue;
 	struct deadline_data *dd = q->elevator->elevator_data;
 	const enum dd_data_dir data_dir = rq_data_dir(rq);
 	u16 ioprio = req_get_ioprio(rq);
 	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
 	struct dd_per_prio *per_prio;
 	enum dd_prio prio;

+	ioprio_n++;
+	if (ioprio != ioprio_prev || ioprio_n == IOPRIO_N_LIMIT) {
+		pr_err("### %-20d : %d times\n", ioprio_prev, ioprio_n);
+		ioprio_n = 0;
+	}
+	ioprio_prev = ioprio;
+
 	lockdep_assert_held(&dd->lock);
8<---------------------------------------------------------------------->8

Specifically, I'd pay attention to the following two places in [2],
where the delays were introduced:

1. Starting the getty service (5 second delay):

8<---------------------------------------------------------------------->8
[   14.875199] ### 24580                : 1 times
...
[  OK  ] Started getty@tty1.service - Getty on tty1.
[  OK  ] Started serial-getty@ttySA…ice - Serial Getty on ttySAC0.
[  OK  ] Reached target getty.target - Login Prompts.
[   19.425354] ### 0                    : 100 times
...
8<---------------------------------------------------------------------->8

2. Login (more than 60 seconds delay):

8<---------------------------------------------------------------------->8
runner-vwmj3eza-project-40964107-concurrent-0 login: root
...
[   22.827432] ### 0                    : 100 times
...
[  100.100402] ### 24580                : 1 times
#
8<---------------------------------------------------------------------->8
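A side note on the values themselves: decoding 24580 with the UAPI
macros from <linux/ioprio.h> gives class 3 (IOPRIO_CLASS_IDLE) with
level 4, i.e. (3 << IOPRIO_CLASS_SHIFT) | 4, while 0 is
IOPRIO_CLASS_NONE. A minimal userspace sketch of that decoding, using
only the standard UAPI macros (nothing board-specific assumed):

8<---------------------------------------------------------------------->8
/* decode-ioprio.c: decode the ioprio values seen in the traces above.
 * The class lives in the bits above IOPRIO_CLASS_SHIFT (== 13), the
 * level in the bits below it.
 */
#include <stdio.h>
#include <linux/ioprio.h>

int main(void)
{
	unsigned int vals[] = { 0, 24580 };

	for (unsigned int i = 0; i < sizeof(vals) / sizeof(vals[0]); i++)
		printf("%5u -> class %lu, level %lu\n", vals[i],
		       (unsigned long)IOPRIO_PRIO_CLASS(vals[i]),
		       (unsigned long)IOPRIO_PRIO_DATA(vals[i]));

	/* Prints:
	 *     0 -> class 0, level 0   (IOPRIO_CLASS_NONE)
	 * 24580 -> class 3, level 4   (IOPRIO_CLASS_IDLE)
	 */
	return 0;
}
8<---------------------------------------------------------------------->8

So, if I decoded it right, the single request that precedes each stall
is an idle-class one, which mq-deadline serves at the lowest priority.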
I guess the results look similar to the logs from the neighboring
thread [3]. I'm not sure whether those tools (the getty service and the
login tool) call ioprio_set() internally, or whether it's just the I/O
scheduler acting up, but the freeze happens consistently in those two
places (see also the sketch after the links below). Please let me know
if I can help you debug this further. I'm not a block layer expert by
any means, but this looks like a regression, at least on the E850-96
board. I wonder if it's possible to reproduce it on other platforms.

Thanks!

[1] https://termbin.com/aus7
[2] https://termbin.com/za3t
[3] https://lore.kernel.org/all/CAP-bSRab1C-_aaATfrgWjt9w0fcYUCQCG7u+TCb1FSPSd6CEaA@xxxxxxxxxxxxxx/
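P.S. In case it helps with reproducing this on other platforms, here is
an untested sketch that switches itself to IOPRIO_CLASS_IDLE (the class
24580 decodes to) via the ioprio_set() syscall and then issues a few
direct reads; the /dev/mmcblk0 path is only an assumption for an eMMC
device, adjust as needed:

8<---------------------------------------------------------------------->8
/* idle-read.c: untested reproducer sketch. Marks the process as
 * IOPRIO_CLASS_IDLE and does a few O_DIRECT reads, to check whether
 * idle-class I/O alone triggers the delays with the patch applied.
 */
#define _GNU_SOURCE		/* for O_DIRECT */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <linux/ioprio.h>

int main(int argc, char *argv[])
{
	/* O_DIRECT wants an aligned buffer */
	static char buf[4096] __attribute__((aligned(4096)));
	/* assumed device path, pass another one as argv[1] */
	const char *dev = argc > 1 ? argv[1] : "/dev/mmcblk0";
	int fd, i;

	/* glibc has no ioprio_set() wrapper, so use syscall(2) directly;
	 * who == 0 with IOPRIO_WHO_PROCESS means the calling process
	 */
	if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
		    IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)) < 0) {
		perror("ioprio_set");
		return 1;
	}

	fd = open(dev, O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* time this loop with and without the suspect patch */
	for (i = 0; i < 256; i++)
		if (pread(fd, buf, sizeof(buf), (off_t)i * sizeof(buf)) < 0)
			perror("pread");

	close(fd);
	return 0;
}
8<---------------------------------------------------------------------->8

Running it as "ionice -c 3 dd ..." style from the shell should be
equivalent, if util-linux's ionice is available in the rootfs.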