> On Jul 29, 2019, at 8:03 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>> On Mon, Jul 29, 2019 at 10:51:51AM -0400, Waiman Long wrote:
>>> On 7/29/19 4:52 AM, Peter Zijlstra wrote:
>>>> On Sat, Jul 27, 2019 at 01:10:47PM -0400, Waiman Long wrote:
>>>> It was found that a dying mm_struct where the owning task has exited
>>>> can stay on as active_mm of kernel threads as long as no other user
>>>> tasks run on those CPUs that use it as active_mm. This prolongs the
>>>> lifetime of the dying mm, holding up memory and other resources like
>>>> swap space that cannot be freed.
>>> Sure, but this has been so 'forever', why is it a problem now?
>>
>> I ran into this problem when running a test program that keeps on
>> allocating and touching memory and eventually fails as the swap space
>> is full. After the failure, I could not rerun the test program
>> because the swap space remained full. I finally tracked it down to the
>> fact that the mm stayed on as active_mm of kernel threads. I have to
>> make sure that all the idle cpus get a user task to run to bump the
>> dying mm off the active_mm of those cpus, but this is just a workaround,
>> not a solution to this problem.
>
> The 'sad' part is that x86 already switches to init_mm on idle and we
> only keep the active_mm around for 'stupid'.
>
> Rik and Andy were working on getting that 'fixed' a while ago, not sure
> where that went.

I thought the current status was that we don’t always switch to init_mm
on idle and instead we use a fancier and actually correct flushing
routine that only flushes idle CPUs when pagetables are freed.

I still think we should be able to kill active_mm in favor of explicit
refcounting in the arch code.
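
For anyone following along, the lifetime problem described in this thread
can be modelled in a few lines of user-space C. This is only a toy sketch,
not kernel code: toy_mm, toy_mm_get(), toy_mm_put(), toy_switch() and
toy_refs_after_exit() are hypothetical names, and the single reference
count is a simplification assumed for illustration. It shows a CPU that
goes idle keeping its reference on the last user mm, so the mm is only
released once every such CPU has run some other user task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 2

/* Toy stand-in for an address space; NOT the kernel's mm_struct. */
struct toy_mm {
    int refcount;   /* one reference per owner and per CPU using it */
    bool freed;     /* set once the last reference is dropped */
};

static void toy_mm_get(struct toy_mm *mm)
{
    mm->refcount++;
}

static void toy_mm_put(struct toy_mm *mm)
{
    assert(mm->refcount > 0);
    if (--mm->refcount == 0)
        mm->freed = true;
}

/*
 * Switch 'cpu' to a task using 'next'. next == NULL models a kernel
 * thread, which keeps borrowing the previous mm and its reference.
 */
static void toy_switch(struct toy_mm **active_mm, int cpu, struct toy_mm *next)
{
    struct toy_mm *prev = active_mm[cpu];

    if (next == NULL)
        return;                 /* lazy: prev stays pinned by this CPU */

    toy_mm_get(next);
    active_mm[cpu] = next;
    if (prev != NULL)
        toy_mm_put(prev);       /* CPU finally lets go of the old mm */
}

/*
 * Run the scenario from the thread: a task runs on every CPU, the CPUs
 * go idle, the task exits, and then 'cpus_rescheduled' CPUs run another
 * user task. Returns the references still held on the dying mm.
 */
static int toy_refs_after_exit(int cpus_rescheduled)
{
    struct toy_mm dying = { .refcount = 1, .freed = false }; /* owner's ref */
    struct toy_mm other = { .refcount = 1, .freed = false };
    struct toy_mm *active_mm[TOY_NR_CPUS] = { NULL };
    int cpu;

    for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        toy_switch(active_mm, cpu, &dying); /* task runs on this CPU */
        toy_switch(active_mm, cpu, NULL);   /* CPU goes idle */
    }
    toy_mm_put(&dying);                     /* owning task exits */

    for (cpu = 0; cpu < cpus_rescheduled && cpu < TOY_NR_CPUS; cpu++)
        toy_switch(active_mm, cpu, &other); /* the workaround */

    assert(dying.freed == (dying.refcount == 0));
    return dying.refcount;
}
```

In this model, toy_refs_after_exit(0) leaves the dying mm pinned by both
idle CPUs, and it is released only when cpus_rescheduled reaches
TOY_NR_CPUS, which mirrors the workaround of forcing a user task onto
every idle cpu.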