Nathan Lynch <nathan.lynch@xxxxxxx> writes: > Vinicius Costa Gomes <vinicius.gomes@xxxxxxxxx> writes: >> Nathan Lynch <nathan.lynch@xxxxxxx> writes: >> >>> Hi Vinicius, >>> >>> Vinicius Costa Gomes <vinicius.gomes@xxxxxxxxx> writes: >>>> Nathan Lynch <nathan.lynch@xxxxxxx> writes: >>>>> Vinicius Costa Gomes <vinicius.gomes@xxxxxxxxx> writes: >>>>>> Change the "wait for operation finish" logic to take interrupts into >>>>>> account. >>>>>> >>>>>> When using dmatest with idxd DMA engine, it's possible that during >>>>>> longer tests, the interrupt notifying the finish of an operation >>>>>> happens during wait_event_freezable_timeout(), which causes dmatest to >>>>>> cleanup all the resources, some of which might still be in use. >>>>>> >>>>>> This fix ensures that the wait logic correctly handles interrupts, >>>>>> preventing premature cleanup of resources. >>>>>> >>>>>> Reported-by: kernel test robot <oliver.sang@xxxxxxxxx> >>>>>> Closes: https://lore.kernel.org/oe-lkp/202502171134.8c403348-lkp@xxxxxxxxx >>>>> >>>>> Given the report at the URL above I'm struggling to follow the rationale >>>>> for this change. It looks like a use-after-free in idxd while >>>>> resetting/unbinding the device, and I can't see how changing whether >>>>> dmatest threads perform freezeable waits would change this. >>>>> >>>> >>>> I think that the short version is that the reproducition script triggers >>>> different problems on different platforms/configurations. >>>> >>>> The solution I proposed fixes a problem I was seeing on a SPR system, on >>>> a GNR system (that I was only able to get later) I see something more similar >>>> to this particular splat (currently working on the fix). >>>> >>>> Seeing this question, I realize that I should have added a note to the >>>> commit detailing this. >>>> >>>> So I am planning on proposing two (this and another) fixes for the same >>>> report, hoping that it's not that confusing/unusual. >>> >>> I'm still confused... why is wait_event_freezable_timeout() the wrong >>> API for dmatest to use, and how could changing it to >>> wait_event_timeout() cause it to "take interrupts into account" that it >>> didn't before? >>> >> >> My understanding (and testing) is that wait_event_timeout() will block >> for the duration even in the face of interrupts, 'freezable' will not. > > They have different behaviors with respect to *signals* and the > wake_up() variant used, but not device interrupts. > Ah! That's something that I wasn't considering. That it could be something other than interrupts that were unblocking wait_event_*(). > dmatest_callback() employs wake_up_all(), which means this change > introduces no beneficial difference in the wakeup behavior. The dmatest > thread gets woken on receipt of the completion interrupt either way. > > And to reiterate, the change regresses the combination of dmatest and > the task freezer, which is a use case people have cared about, > apparently. > If this change in behavior causes a regression for others, glad to send a revert and find another solution. >>> AFAIK the only change made here is that dmatest threads effectively >>> become unfreezeable, which is contrary to prior authors' intentions: >>> >>> commit 981ed70d8e4f ("dmatest: make dmatest threads freezable") >>> commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") Cheers, -- Vinicius