Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/24/19 2:53 PM, Mike Kravetz wrote:
> On 6/24/19 2:30 PM, Qian Cai wrote:
>> So the problem is that ipcget_public() has held the semaphore "ids->rwsem" for
>> too long seems unnecessarily and then goes to sleep sometimes due to direct
>> reclaim (other times LTP hugemmap05 [1] has hugetlb_file_setup() returns
>> -ENOMEM),
> 
> Thanks for looking into this!  I noticed that recent kernels could take a
> VERY long time trying to do high order allocations.  In my case it was trying
> to do dynamic hugetlb page allocations as well [1].  But, IMO this is more
> of a general direct reclaim/compation issue than something hugetlb specific.
> 

<snip>

>> Ideally, it seems only ipc_findkey() and newseg() in this path needs to hold the
>> semaphore to protect concurrency access, so it could just be converted to a
>> spinlock instead.
> 
> I do not have enough experience with this ipc code to comment on your proposed
> change.  But, I will look into it.
> 
> [1] https://lkml.org/lkml/2019/4/23/2

I only took a quick look at the ipc code, but there does not appear to be
a quick/easy change to make.  The issue is that shared memory creation could
take a long time.  With issue [1] above unresolved, creation of hugetlb backed
shared memory segments could take a VERY long time.

I do not believe the test failure is arm specific.  Most likely, it is just
because testing was done on a system with memory size to trigger this issue?

My plan is to focus on [1].  When that is resolved, this issue should go away.
-- 
Mike Kravetz




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux