On Thu, 2024-08-22 at 10:21 +0200, Thomas Hellström wrote: > On Thu, 2024-08-22 at 09:55 +0200, Christian König wrote: > > Am 22.08.24 um 08:47 schrieb Thomas Hellström: > > > > > > As Sima said, this is complicated but not beyond > > > > > > comprehension: > > > > > > i915 > > > > > > https://elixir.bootlin.com/linux/v6.11-rc4/source/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c#L317 > > > > > As far as I can tell what i915 does here is extremely > > > > > questionable. > > > > > > > > > > if (sc->nr_scanned < sc->nr_to_scan && > > > > > current_is_kswapd()) { > > > > > .... > > > > > with_intel_runtime_pm(&i915->runtime_pm, wakeref) { > > > > > > > > > > with_intel_runtime_pm() then calls pm_runtime_get_sync(). > > > > > > > > > > So basically the i915 shrinker assumes that when called from > > > > > kswapd() > > > > > that it can synchronously wait for runtime PM to power up the > > > > > device > > > > > again. > > > > > > > > > > As far as I can tell that means that a device driver makes > > > > > strong > > > > > and > > > > > completely undocumented assumptions how kswapd works > > > > > internally. > > > > Admittedly that looks weird > > > > > > > > But I'd really expect a reclaim lockdep splat to happen there > > > > if > > > > the > > > > i915 pm did something not-allowed. IIRC, the design direction > > > > the > > > > i915 > > > > people got from mm people regarding the shrinkers was to avoid > > > > any > > > > sleeps in direct reclaim and punt it to kswapd. Need to ask > > > > i915 > > > > people > > > > how they can get away with that. > > > > > > > > > > > So it turns out that Xe integrated pm resume is reclaim-safe, and > > > I'd > > > expect i915's to be as well. Xe discrete pm resume isn't. > > > > > > So that means that, at least for integrated, the i915 shrinker > > > should > > > be ok from that POW, and punting certain bos to kswapd is not > > > AFAICT > > > abusing any undocumented features of kswapd but rather a way to > > > avoid > > > resuming the device during direct reclaim, like documented. > > > > The more I think about this the more I disagree to this driver > > design. > > In my opinion device drivers should *never* resume runtime PM in a > > shrinker callback in the first place. > > Runtime PM resume is allowed even from irq context if carefully > implemented by the driver and flagged as such to the core. > > https://docs.kernel.org/power/runtime_pm.html > > Resuming runtime PM from reclaim therefore shouldn't be an issue IMO, > and really up to the driver. > > > > > When the device is turned off it means that all of it's operations > > are > > stopped and eventually power to caches etc turned off as well. So I > > don't see any ongoing writeback operations or similar either. > > > > So the question is why do we need to power on the device in a > > shrinker > > in the first place? > > > > What could be is that the device needs to flush GART TLBs or > > similar > > when it is turned on, e.g. that you grab a PM reference to make > > sure > > that during your HW operation the device doesn't suspend. > > Exactly why the i915 needs to flush the GART I'm not sure of but I > suspect the gart TLB might be forgotten while suspended. To be clear, the intel GGTTs are programmed using device mmio. And while i915 is a data point, there's no immediate plans to convert it to using the TTM page pool AFAIK, so for the sake of discussion let's focus on the Xe usage. /Thomas