Quoting Tvrtko Ursulin (2018-01-24 13:09:37)
>
> On 22/01/2018 15:41, Chris Wilson wrote:
> > If we remember to cancel the signaler on a request when retiring it
> > (after we know that the request has been signaled), we do not need to
> > carry an additional request reference in the signaler itself. This
> > prevents an issue whereby the signaler threads may be delayed and hold
> > on to thousands of request references, causing severe memory
> > fragmentation and premature oom (most noticeable on 32b snb due to the
> > limited GFP_KERNEL and frequent use of inter-engine fences).
>
> What is starving the signaler thread, which is set to SCHED_FIFO, given
> that it can't be tasklets on SNB?

Interrupts. MI_USER_INTERRUPT to be precise, but we have to check all the
other sources on snb as well.

> Before I actually start reviewing the code, which I'd rather avoid :) :
>
> Is it just not able to process enough requests in its time-slice
> (need_resched) and so is falling behind? It would be surprising, since I
> would expect the per-request processing there to be much lighter than
> on the submission paths.

The conclusion is a bit odd, but more or less it's just a pathological
case where interrupts + rt task are contending for one cpu while
submission proceeds on another.

Making the signaler lighter was the intention of the rest of the series,
but this patch by itself prevents the runaway references.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
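To make the reference accounting in the patch description concrete, here is a toy C model. It is not the i915 code, and every name in it (`struct request`, `struct signaler`, `signaler_add`, `signaler_cancel`, `request_retire`, `signaler_run`, `simulate_starved_signaler`) is hypothetical. It compares two schemes while the signaler thread is starved. In the first, the signaler pins each queued request with its own reference. In the second, it takes no reference, and retirement cancels the signaler entry before the last reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model only: all names are hypothetical, not i915 code. */
static int live_requests; /* requests allocated but not yet freed */

struct request {
    int refcount;
    bool on_signal_list;
    struct request *next; /* link in the signaler's pending list */
};

struct signaler {
    struct request *head; /* requests waiting for the signaler thread */
};

static struct request *request_create(void)
{
    struct request *rq = calloc(1, sizeof(*rq));
    if (!rq)
        abort();
    rq->refcount = 1; /* reference held by the submission path until retire */
    live_requests++;
    return rq;
}

static void request_put(struct request *rq)
{
    assert(rq->refcount > 0);
    if (--rq->refcount == 0) {
        free(rq);
        live_requests--;
    }
}

/* Queue rq for signaling; if pin is set, take an extra reference for the signaler. */
static void signaler_add(struct signaler *s, struct request *rq, bool pin)
{
    if (pin)
        rq->refcount++;
    rq->next = s->head;
    s->head = rq;
    rq->on_signal_list = true;
}

/* Unlink rq from the pending list without waiting for the signaler thread. */
static void signaler_cancel(struct signaler *s, struct request *rq)
{
    for (struct request **p = &s->head; *p; p = &(*p)->next) {
        if (*p == rq) {
            *p = rq->next;
            rq->on_signal_list = false;
            return;
        }
    }
}

/* Retire a completed request. Without a pinned reference, cancel must be
 * true, or the signaler list would keep a dangling pointer. */
static void request_retire(struct signaler *s, struct request *rq, bool cancel)
{
    if (cancel && rq->on_signal_list)
        signaler_cancel(s, rq);
    request_put(rq); /* drop the submission path's reference */
}

/* The (possibly long-delayed) signaler thread finally drains its list. */
static void signaler_run(struct signaler *s, bool pinned)
{
    while (s->head) {
        struct request *rq = s->head;
        s->head = rq->next;
        rq->on_signal_list = false;
        if (pinned)
            request_put(rq); /* release the reference taken in signaler_add */
    }
}

/* Submit and retire n requests while the signaler thread never runs;
 * return how many requests are still allocated afterwards. */
static int simulate_starved_signaler(struct signaler *s, int n, bool cancel_on_retire)
{
    for (int i = 0; i < n; i++) {
        struct request *rq = request_create();
        signaler_add(s, rq, !cancel_on_retire);
        request_retire(s, rq, cancel_on_retire);
    }
    return live_requests;
}
```

With a fresh `struct signaler s = { NULL };`, calling `simulate_starved_signaler(&s, 1000, false)` leaves all 1000 requests allocated until `signaler_run(&s, true)` eventually drains them. In contrast, `simulate_starved_signaler(&s, 1000, true)` frees each request at retirement, so the number of live requests does not depend on when the starved thread gets to run.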