https://bugzilla.kernel.org/show_bug.cgi?id=202349

Lucas Stach (dev@xxxxxxxxxx) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dev@xxxxxxxxxx

--- Comment #8 from Lucas Stach (dev@xxxxxxxxxx) ---
So with my GPU developer hat on: What is it that the GPU driver devs should do
to avoid this? GPU tasks aren't per se realtime or interactive (think GPU
compute tasks with big working sets that can take hours or even days to
complete). So from the driver perspective we _want_ FS reclaim to happen
instead of failing a task, but in general for the interactive workload we
would rather trash the working set a bit (by purging clean caches) instead of
doing a high-latency writeback of dirty caches.

Also, to put the obvious question up again: Why doesn't the reporter see any
of those catastrophic latency events with ext4? Is it just that XFS can queue
up way more IO, so direct reclaim gets stuck behind the already scheduled IO
for a much longer time?

If I understand the problem right, the basic issue is that we still don't have
IO-less dirty throttling in Linux, so we make the interactive GPU process pay
part of the bill of the background process dirtying lots of pages, instead of
throttling that background process earlier.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.