On 6/24/19 2:53 PM, Mike Kravetz wrote:
> On 6/24/19 2:30 PM, Qian Cai wrote:
>> So the problem is that ipcget_public() has held the semaphore "ids->rwsem" for
>> too long seems unnecessarily and then goes to sleep sometimes due to direct
>> reclaim (other times LTP hugemmap05 [1] has hugetlb_file_setup() returns
>> -ENOMEM),
>
> Thanks for looking into this!  I noticed that recent kernels could take a
> VERY long time trying to do high order allocations.  In my case it was trying
> to do dynamic hugetlb page allocations as well [1].  But, IMO this is more
> of a general direct reclaim/compaction issue than something hugetlb specific.
>
> <snip>
>> Ideally, it seems only ipc_findkey() and newseg() in this path needs to hold the
>> semaphore to protect concurrency access, so it could just be converted to a
>> spinlock instead.
>
> I do not have enough experience with this ipc code to comment on your proposed
> change.  But, I will look into it.
>
> [1] https://lkml.org/lkml/2019/4/23/2

I only took a quick look at the ipc code, but there does not appear to be a
quick/easy change to make.  The issue is that shared memory creation could
take a long time.  With issue [1] above unresolved, creation of hugetlb-backed
shared memory segments could take a VERY long time.

I do not believe the test failure is arm-specific.  Most likely, it is just
because testing was done on a system with a memory size that triggers this
issue?

My plan is to focus on [1].  When that is resolved, this issue should go away.
-- 
Mike Kravetz