Re: [usb-storage] Re: cma: deadlock using usb-storage and fs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/18/18 11:42 AM, Mike Kravetz wrote:
On 12/17/18 1:57 PM, Laura Abbott wrote:
On 12/17/18 10:29 AM, Gaël PORTAY wrote:
Alan,

On Mon, Dec 17, 2018 at 10:45:17AM -0500, Alan Stern wrote:
On Sun, 16 Dec 2018, Gaël PORTAY wrote:
...

The second task wants to writeback/flush the pages through USB, which, I
assume, is due to the page migration. The usb-storage triggers a CMA allocation
but get locked in cma_alloc since the first task hold the mutex (It is a FAT
formatted partition, if it helps).

     usb-storage     D    0   349      2 0x00000000
     Backtrace:
...
     [<bf1c7550>] (usb_sg_wait [usbcore]) from [<bf2bd618>]
(usb_stor_bulk_transfer_sglist.part.2+0x80/0xdc [usb_storage]) r9:0001e000
r8:eca594ac r7:0001e000 r6:c0008200 r5:eca59514 r4:eca59488

It looks like there is a logical problem in the CMA allocator.  The
call in usb_sg_wait() specifies GFP_NOIO, which is supposed to prevent
allocations from blocking on any I/O operations.  Therefore we
shouldn't be waiting for the CMA mutex.


Right.

Perhaps the CMA allocator needs to drop the mutex while doing
writebacks/flushes, or perhaps it needs to be reorganized some other
way.  I don't know anything about it.

Does the CMA code have any maintainers who might need to know about
this, or is it all handled by the MM maintainers?

I did not find maintainers neither for CMA nor MM.

That is why I have sent this mail to mm mailing list but to no one in
particular.


Last time I looked at this, we needed the cma_mutex for serialization
so unless we want to rework that, I think we need to not use CMA in the
writeback case (i.e. GFP_IO).

I am wondering if we still need to hold the cma_mutex while calling
alloc_contig_range().  Looking back at the history, it appears that
the reason for holding the mutex was to prevent two threads from operating
on the same pageblock.

Commit 2c7452a075d4 ("mm/page_isolation.c: make start_isolate_page_range()
fail if already isolated") will cause alloc_contig_range to return EBUSY
if two callers are attempting to operate on the same pageblock.  This was
added because memory hotplug as well as gigantac page allocation call
alloc_contig_range and could conflict with each other or cma.   cma_alloc
has logic to retry if EBUSY is returned.  Although, IIUC it assumes the
EBUSY is the result of specific pages being busy as opposed to someone
else operating on the pageblock.  Therefore, the retry logic to 'try a
different set of pages' is not what one  would/should attempt in the case
someone else is operating on the pageblock.

Would it be possible or make sense to remove the mutex and retry when
EBUSY?  Or, am I missing some other reason for holding the mutex.


I had forgotten that start_isolate_page_range had been updated to
return -EBUSY. It looks like we would need to update
the callback for migrate_pages in __alloc_contig_migrate_range
since alloc_migrate_target by default will use __GFP_IO.
So I _think_ if we update that to honor GFP_NOIO we could
remove the mutex assuming the rest of migrate_pages honors
it properly.

Thanks,
Laura




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux