Laura, On Tue, Dec 18, 2018 at 01:14:42PM -0800, Laura Abbott wrote: > On 12/18/18 11:42 AM, Mike Kravetz wrote: > > On 12/17/18 1:57 PM, Laura Abbott wrote: > > > On 12/17/18 10:29 AM, Gaël PORTAY wrote: > > > > > > Last time I looked at this, we needed the cma_mutex for serialization > > > so unless we want to rework that, I think we need to not use CMA in the > > > writeback case (i.e. GFP_IO). I followed what you suggested and add gfpflags_allow_writeback that tests against the __GFP_IO flag: static inline bool gfpflags_allow_writeback(const gfp_t gfp_flags) { return !!(gfp_flags & __GFP_IO); } And then not to go for CMA in the case of writeback in function __dma_alloc: - cma = allowblock ? dev_get_cma_area(dev) : false; + allowwriteback = gfpflags_allow_writeback(gfp); + cma = (allowblock && !allowwriteback) ? dev_get_cma_area(dev) : false; This workaround fixes the issue I faced (I have prepared a patch). > > I am wondering if we still need to hold the cma_mutex while calling > > alloc_contig_range(). Looking back at the history, it appears that > > the reason for holding the mutex was to prevent two threads from operating > > on the same pageblock. > > > > Commit 2c7452a075d4 ("mm/page_isolation.c: make start_isolate_page_range() > > fail if already isolated") will cause alloc_contig_range to return EBUSY > > if two callers are attempting to operate on the same pageblock. This was > > added because memory hotplug as well as gigantac page allocation call > > alloc_contig_range and could conflict with each other or cma. cma_alloc > > has logic to retry if EBUSY is returned. Although, IIUC it assumes the > > EBUSY is the result of specific pages being busy as opposed to someone > > else operating on the pageblock. Therefore, the retry logic to 'try a > > different set of pages' is not what one would/should attempt in the case > > someone else is operating on the pageblock. > > > > Would it be possible or make sense to remove the mutex and retry when > > EBUSY? Or, am I missing some other reason for holding the mutex. > > > > I had forgotten that start_isolate_page_range had been updated to > return -EBUSY. It looks like we would need to update > the callback for migrate_pages in __alloc_contig_migrate_range > since alloc_migrate_target by default will use __GFP_IO. > So I _think_ if we update that to honor GFP_NOIO we could > remove the mutex assuming the rest of migrate_pages honors > it properly. > I have also removed the mutex (start_isolate_page_range retunrs -EBUSY), and it worked (in my case). But I did not do the proper magic because I am not sure of what should be done and how: -EBUSY is not handled and __GFP_NOIO is not honored. Regards, Gael