Re: [usb-storage] Re: cma: deadlock using usb-storage and fs

Mike Kravetz <mike.kravetz@xxxxxxxxxx> · Mon, 14 Jan 2019 15:47:08 -0800

On 1/11/19 5:55 AM, Gaël PORTAY wrote:
> On Mon, Jan 07, 2019 at 06:06:21PM -0800, Mike Kravetz wrote:
>> On 1/7/19 10:13 AM, Gaël PORTAY wrote:
>>> (...)
>>> I have also removed the mutex (start_isolate_page_range retunrs -EBUSY),
>>> and it worked (in my case).
>>>
>>> But I did not do the proper magic because I am not sure of what should
>>> be done and how: -EBUSY is not handled and __GFP_NOIO is not honored. 
>>
>> If we remove the mutex, I am pretty sure we would want to distinguish
>> between the (at least) two types of _EBUSY that can be returned by
>> alloc_contig_range().  Seems that the retry logic should be different if
>> a page block is busy as opposed to pages within the range.

Hello Gael,

I spent some time looking into removing cma_mutex.  My initial statement
that it is no longer absolutely necessary is correct.  This is because the
underlying code will not allow two threads to work on same pageblock.

However, removing the mutex causes issues.  If we remove the mutex, then
alloc_contig_range() will return -EBUSY if another thread is operating
on any of the pageblocks in the range.  We could make alloc_contig_range
distinguish between this and the existing -EBUSY case caused by the inability
to migrate pages in the range.  Even if we do this, we need to determine
what should happen if alloc_contig_range fails for this reason.  I can only
think of a few options:
1) Immediately fail the cma_alloc request.  The problem with doing this is
   that it would cause a regression.  Currently, two threads can call
   cma_alloc with ranges that touch the same pageblock.  With the mutex in
   place, the threads will wait for each other to finish with the pageblock.
   Without it, we return an error to one caller where it previously would
   have succeeded.  So, this approach is unacceptable.
2) Notice this new condition and 'retry'.  In theory this sounds possible.
   However, we do not know how long another thread will keep the pageblock(s)
   in question isolated.  It could be quite a long time depending on the
   what the other thread is trying to do.  We really do not want to go into
   a tight loop retrying.  Adding 'delays' between retrys seems like the
   wrong thing to do for a memory allocator.  So, I think this option is out.
3) In some cases, it might be possible to 'intelligently retry' with a
   range that does not touch busy pageblocks.  However, this is highly
   dependent on the cma area and size of allocation.  It may not always be
   possible and is not a suitable option.

Bottom line is that I can not think of a good way to remove cma_mutex without
possibly introducing some other issue.

It appears that you have already found a workaround for the issue you were
seeing.  Is this correct?  If so, I suggest we not go down the path of trying
to eliminate cma_mutex right now.  Or, perhaps someone else has other
suggestions.
-- 
Mike Kravetz