Re: page allocator bug in 3.16?

On 09/25/2014 04:33 PM, Alex Deucher wrote:
> On Thu, Sep 25, 2014 at 2:55 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
>> After several days uptime with a 3.16 kernel (generally running
>> Thunderbird, emacs, kernel builds, several Chrome tabs on multiple
>> desktop workspaces) I've been seeing some really extreme slowdowns.
>>
>> Mostly the slowdowns are associated with gpu-related tasks, like
>> opening new emacs windows, switching workspaces, laughing at internet
>> gifs, etc. Because this x86_64 desktop is nouveau-based, I didn't pursue
>> it right away -- 3.15 is the first time suspend has worked reliably.
>>
>> This week I started looking into what the slowdown was and discovered
>> it's happening during dma allocation through swiotlb (the cpus can do
>> intel iommu but I don't use it because it's not the default for most users).
>>
>> I'm still working on a bisection but each step takes 8+ hours to
>> validate and even then I'm no longer sure I still have the 'bad'
>> commit in the bisection. [edit: yup, I started over]
>>
>> I just discovered a smattering of these in my logs and only on 3.16-rc+ kernels:
>> Sep 25 07:57:59 thor kernel: [28786.001300] alloc_contig_range test_pages_isolated(2bf560, 2bf562) failed
>>
>> This dual-Xeon box has 10GB and sysrq Show Memory isn't showing heavy
>> fragmentation [1].
>>
>> Besides Mel's page allocator changes in 3.16, another suspect commit is:
>>
>> commit b13b1d2d8692b437203de7a404c6b809d2cc4d99
>> Author: Shaohua Li <shli@xxxxxxxxxx>
>> Date:   Tue Apr 8 15:58:09 2014 +0800
>>
>>     x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB
>>
>> Specifically, this statement:
>>
>>     It could cause incorrect page aging and the (mistaken) reclaim of
>>     hot pages, but the chance of that should be relatively low.
>>
>> I'm wondering if this could cause worst-case behavior with TTM? I'm
>> testing a revert of this on mainline 3.16-final now, with no results yet.
>>
>> Thoughts?
> 
> You may also be seeing this:
> https://lkml.org/lkml/2014/8/8/445

Thanks Alex. That is indeed the problem.

I'm still reading the email thread to find where the patches
that fix this are. It doesn't make much sense to me, though,
that nouveau sets up a 1GB GART and then uses TTM, which is
trying to shove all the DMA through a 16MB CMA window
(which turns out to be the base Ubuntu config).
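
For anyone who wants to check their own logs for the same symptom,
here is a rough sketch that picks out the alloc_contig_range failure
lines quoted above and reports the range each one covers. It assumes
the two values in parentheses are hexadecimal page frame numbers and
that pages are 4 KiB; the function names are made up for illustration.

```python
import re

# Matches kernel log lines such as:
#   ... alloc_contig_range test_pages_isolated(2bf560, 2bf562) failed
FAILURE_RE = re.compile(
    r"alloc_contig_range test_pages_isolated\(([0-9a-f]+), ([0-9a-f]+)\) failed"
)

PAGE_SIZE = 4096  # assumption: 4 KiB base pages (the x86_64 default)


def parse_failures(lines):
    """Return one (start_pfn, end_pfn) tuple per matching log line."""
    failures = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            # Assumption: both values are hex page frame numbers.
            failures.append((int(match.group(1), 16), int(match.group(2), 16)))
    return failures


def describe(start_pfn, end_pfn):
    """Return (pages_in_range, start_physical_address_in_bytes)."""
    return end_pfn - start_pfn, start_pfn * PAGE_SIZE


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py < some_kernel_log
    found = parse_failures(sys.stdin)
    print(f"{len(found)} failed isolation attempts")
    for start, end in found:
        pages, addr = describe(start, end)
        print(f"  pfn {start:#x}-{end:#x}: {pages} pages at {addr:#x}")
```

Feed it whatever file your distro writes kernel messages to. A steady
stream of small failed ranges would be consistent with allocations
bouncing off a busy CMA area, though it doesn't prove that on its own.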

Regards,
Peter Hurley

