Hi Dave, On Wed, Jun 16, 2010 at 08:57:09AM +1000, Dave Airlie wrote: > > (switching back to email, actually) > > > > On Sun, 13 Jun 2010 13:01:57 GMT > > bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote: > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=16148 > > > > > > > > > Mikko C. <mikko.cal@xxxxxxxxx> changed: > > > > > > What |Removed |Added > > > ---------------------------------------------------------------------------- > > > CC| |mikko.cal@xxxxxxxxx > > > > > > > > > > > > > > > --- Comment #8 from Mikko C. <mikko.cal@xxxxxxxxx> 2010-06-13 13:01:53 --- > > > I have been getting this with 2.6.35-rc2 and rc3. > > > Could it be the same problem? > > > > > > > > > X: page allocation failure. order:0, mode:0x4 > > > Pid: 1514, comm: X Not tainted 2.6.35-rc3 #1 > > > Call Trace: > > > [<ffffffff8108ce49>] ? __alloc_pages_nodemask+0x629/0x680 > > > [<ffffffff8108c920>] ? __alloc_pages_nodemask+0x100/0x680 > > > [<ffffffffa00db8f3>] ? ttm_get_pages+0x2c3/0x448 [ttm] > > > [<ffffffffa00d4658>] ? __ttm_tt_get_page+0x98/0xc0 [ttm] > > > [<ffffffffa00d4988>] ? ttm_tt_populate+0x48/0x90 [ttm] > > > [<ffffffffa00d4a26>] ? ttm_tt_bind+0x56/0xa0 [ttm] > > > [<ffffffffa00d5230>] ? ttm_bo_handle_move_mem+0x1d0/0x430 [ttm] > > > [<ffffffffa00d76d6>] ? ttm_bo_move_buffer+0x166/0x180 [ttm] > > > [<ffffffffa00b9736>] ? drm_mm_kmalloc+0x26/0xc0 [drm] > > > [<ffffffff81030ea9>] ? get_parent_ip+0x9/0x20 > > > [<ffffffffa00d7786>] ? ttm_bo_validate+0x96/0x130 [ttm] > > > [<ffffffffa00d7b35>] ? ttm_bo_init+0x315/0x390 [ttm] > > > [<ffffffffa0122eb8>] ? radeon_bo_create+0x118/0x210 [radeon] > > > [<ffffffffa0122fb0>] ? radeon_ttm_bo_destroy+0x0/0xb0 [radeon] > > > [<ffffffffa013704c>] ? radeon_gem_object_create+0x8c/0x110 [radeon] > > > [<ffffffffa013711f>] ? radeon_gem_create_ioctl+0x4f/0xe0 [radeon] > > > [<ffffffffa00b10e6>] ? drm_ioctl+0x3d6/0x470 [drm] > > > [<ffffffffa01370d0>] ? radeon_gem_create_ioctl+0x0/0xe0 [radeon] > > > [<ffffffff810b965f>] ? do_sync_read+0xbf/0x100 > > > [<ffffffff810c8965>] ? vfs_ioctl+0x35/0xd0 > > > [<ffffffff810c8b28>] ? do_vfs_ioctl+0x88/0x530 > > > [<ffffffff81031ed7>] ? sub_preempt_count+0x87/0xb0 > > > [<ffffffff810c9019>] ? sys_ioctl+0x49/0x80 > > > [<ffffffff810ba4fe>] ? sys_read+0x4e/0x90 > > > [<ffffffff810024ab>] ? system_call_fastpath+0x16/0x1b > > > > That's different. ttm_get_pages() looks pretty busted to me. It's not > > using __GFP_WAIT and it's not using __GFP_FS. It's using a plain > > GFP_DMA32 so it's using atomic allocations even though it doesn't need > > to. IOW, it's shooting itself in the head. > > This is an accurate assessment. > > Given that it will sometimes use GFP_HIGHUSER which includes __GFP_FS > > and __GFP_WAIT, I assume it can always include __GFP_FS and __GFP_WAIT. > > If so, it should very much do so. If not then the function is > > misdesigned and should be altered to take a gfp_t argument so the > > caller can tell ttm_get_pages() which is the strongest allocation mode > > which it may use. > > > > > [TTM] Unable to allocate page. > > > radeon 0000:01:05.0: object_init failed for (7827456, 0x00000002) > > > [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (7827456, > > > 2, 4096, -12) > > > > This bug actually broke stuff for you. > > > > Something like this: > > > > --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c~a > > +++ a/drivers/gpu/drm/ttm/ttm_page_alloc.c > > @@ -677,7 +677,7 @@ int ttm_get_pages(struct list_head *page > > /* No pool for cached pages */ > > if (pool == NULL) { > > if (flags & TTM_PAGE_FLAG_DMA32) > > - gfp_flags |= GFP_DMA32; > > + gfp_flags |= GFP_KERNEL|GFP_DMA32; > > else > > gfp_flags |= GFP_HIGHUSER; > > > > _ > > > > although I wonder whether it should be using pool->gfp_flags. > > > > > > It's a shame that this code was developed and merged in secret :( Had > > we known, we could have looked at enhancing mempools to cover the > > requirement, or at implementing this in some generic fashion rather > > than hiding it down in drivers/gpu/drm. > > Its been post to lkml at least once or twice over the past few years > though not as much as it was posted to dri-devel, but that was because > we had never seen anyone show any interest in it outside of kernel > hackers. Originally I was going to use the generic allocator stuff ia64 > uses for uncached allocations but it allocates memory ranges not pages > so it wasn't useful. I also suggested getting a page flag for uncached > allocator stuff, I was told to go write the code in my own corner and > prove it was required. So I did, and it was cleaned up and worked on by > others and I merged it. So can we lay off with the "in secret", the > original code is nearly 2 years old at this point and just because -mm > hackers choose to ignore it isn't our fault. Patches welcome. > No comment on this aspect of things. > So now back to the bug: > > So the page pools are setup with gfp flags, in the normal case, 4 pools, > one WC GFP_HIGHUSER pages, one UC HIGHUSER pages, one WC GFP_USER| > GFP_DMA32, one UC GFP_USER|GFP_DMA32, so the pools are all fine, the > problem here is the same as before we added the pools, which is the > normal page allocation path, which needs the GFP_USER added instead of > GFP_KERNEL. > > That said I've noticed a lot more page allocation failure reports in > 2.6.35-rcX than we've gotten for a long time, in code that hasn't > changed (the AGP ones the other day for example) has something in the > core MM regressed (again... didn't this happen back in 2.6.31 or > something). > > (cc'ing Mel who tracked these down before). > Ok, so there has been very little changed in the page allocator that might have caused this. The only possibility I can think of is [6dda9d: page allocator: reduce fragmentation in buddy allocator by adding buddies that are merging to the tail of the free lists] but I would have expected the patch to help, not hinder, this situation. In the last few kernel releases when there has been a spike in high-order allocation, it has been largely due to an increased source of high-order callers from somewhere. I am 99% sure the last time I encountered this, I used the following script to debug it even though it's a bit of a hack. It uses ftrace to track high-order allocators and reports the number of high-order allocations (either atomic or normal) and the count of unique backtraces (to point the finger at bad callers) Unfortunately, I don't have access to a suitable machine to test this with right now (my laptop isn't making high-order allocations to confirm yet) but I'm pretty sure it's the right one! If you do some fixed set of tests with 2.6.34 and 2.6.35-rc2 with this script running and compare the reports, it might point the finger at a new source of frequent high-order allocations that might be leading to more failures. Do something like ./watch-highorder.pl -o highorder-2.6.34-report.txt PID=$! Do your tests # Two sigints in quick succession to make the monitor exit kill $PID kill $PID ==== CUT HERE ==== #!/usr/bin/perl # This is a tool that analyses the trace output related to page allocation, # sums up the number of high-order allocations taking place and who the # callers are # # Example usage: trace-pagealloc-highorder.pl -o output-report.txt # options # -o, --output Where to store the report # # Copyright (c) IBM Corporation 2009 # Author: Mel Gorman <mel@xxxxxxxxx> use strict; use Getopt::Long; # Tracepoint events use constant MM_PAGE_ALLOC => 1; use constant EVENT_UNKNOWN => 7; use constant HIGH_NORMAL_HIGHORDER_ALLOC => 10; use constant HIGH_ATOMIC_HIGHORDER_ALLOC => 11; my $opt_output; my %stats; my %unique_events; my $last_updated = 0; $stats{HIGH_NORMAL_HIGHORDER_ALLOC} = 0; $stats{HIGH_ATOMIC_HIGHORDER_ALLOC} = 0; # Catch sigint and exit on request my $sigint_report = 0; my $sigint_exit = 0; my $sigint_pending = 0; my $sigint_received = 0; sub sigint_handler { my $current_time = time; if ($current_time - 2 > $sigint_received) { print "SIGINT received, report pending. Hit ctrl-c again to exit\n"; $sigint_report = 1; } else { if (!$sigint_exit) { print "Second SIGINT received quickly, exiting\n"; } $sigint_exit++; } if ($sigint_exit > 3) { print "Many SIGINTs received, exiting now without report\n"; exit; } $sigint_received = $current_time; $sigint_pending = 1; } $SIG{INT} = "sigint_handler"; # Parse command line options GetOptions( 'output=s' => \$opt_output, ); # Defaults for dynamically discovered regex's my $regex_pagealloc_default = 'page=([0-9a-f]*) pfn=([0-9]*) order=([-0-9]*) migratetype=([-0-9]*) gfp_flags=([A-Z_|]*)'; # Dyanically discovered regex my $regex_pagealloc; # Static regex used. Specified like this for readability and for use with /o # (process_pid) (cpus ) ( time ) (tpoint ) (details) my $regex_traceevent = '\s*([a-zA-Z0-9-]*)\s*(\[[0-9]*\])\s*([0-9.]*):\s*([a-zA-Z_]*):\s*(.*)'; my $regex_statname = '[-0-9]*\s\((.*)\).*'; my $regex_statppid = '[-0-9]*\s\(.*\)\s[A-Za-z]\s([0-9]*).*'; sub generate_traceevent_regex { my $event = shift; my $default = shift; my $regex; # Read the event format or use the default if (!open (FORMAT, "/sys/kernel/debug/tracing/events/$event/format")) { $regex = $default; } else { my $line; while (!eof(FORMAT)) { $line = <FORMAT>; if ($line =~ /^print fmt:\s"(.*)",.*/) { $regex = $1; $regex =~ s/%p/\([0-9a-f]*\)/g; $regex =~ s/%d/\([-0-9]*\)/g; $regex =~ s/%lu/\([0-9]*\)/g; $regex =~ s/%s/\([A-Z_|]*\)/g; $regex =~ s/\(REC->gfp_flags\).*/REC->gfp_flags/; $regex =~ s/\",.*//; } } } # Verify fields are in the right order my $tuple; foreach $tuple (split /\s/, $regex) { my ($key, $value) = split(/=/, $tuple); my $expected = shift; if ($key ne $expected) { print("WARNING: Format not as expected '$key' != '$expected'"); $regex =~ s/$key=\((.*)\)/$key=$1/; } } if (defined shift) { die("Fewer fields than expected in format"); } return $regex; } $regex_pagealloc = generate_traceevent_regex("kmem/mm_page_alloc", $regex_pagealloc_default, "page", "pfn", "order", "migratetype", "gfp_flags"); sub process_events { my $traceevent; my $process_pid = 0; my $cpus; my $timestamp; my $tracepoint; my $details; my $statline; my $nextline = 1; # Read each line of the event log EVENT_PROCESS: while (($traceevent = <TRACING>) && !$sigint_exit) { SKIP_LINEREAD: if ($traceevent eq "") { last EVENT_PROCSS; } if ($traceevent =~ /$regex_traceevent/o) { $process_pid = $1; $tracepoint = $4; } else { next; } # Perl Switch() sucks majorly if ($tracepoint eq "mm_page_alloc") { my ($page, $order, $gfp_flags, $type); my ($atomic); my $details = $5; if ($details !~ /$regex_pagealloc/o) { print "WARNING: Failed to parse mm_page_alloc as expected\n"; print "$details\n"; print "$regex_pagealloc\n"; print "\n"; next; } $page = $1; $order = $3; $gfp_flags = $5; # Only concerned with high-order allocs if ($order == 0) { next; } $stats{MM_PAGE_ALLOC}++; if ($gfp_flags =~ /ATOMIC/) { $stats{HIGH_ATOMIC_HIGHORDER_ALLOC}++; $type = "atomic"; } else { $stats{HIGH_NORMAL_HIGHORDER_ALLOC}++; $type = "normal"; } # Record the stack trace $traceevent = <TRACING>; if ($traceevent !~ /stack trace/) { goto SKIP_LINEREAD; } my $event = "order=$order $type gfp_flags=$gfp_flags\n";; while ($traceevent = <TRACING>) { if ($traceevent !~ /^ =>/) { $unique_events{$event}++; my $current = time; if ($current - $last_updated > 60) { $last_updated = $current; update_report(); } goto SKIP_LINEREAD; } $event .= $traceevent; } } else { $stats{EVENT_UNKNOWN}++; } if ($sigint_pending) { last EVENT_PROCESS; } } } sub update_report() { my $event; open (REPORT, ">$opt_output") || die ("Failed to open $opt_output for writing"); foreach $event (keys %unique_events) { print REPORT $unique_events{$event} . " instances $event\n"; } print REPORT "High-order normal allocations: " . $stats{HIGH_NORMAL_HIGHORDER_ALLOC} . "\n"; print REPORT "High-order atomic allocations: " . $stats{HIGH_ATOMIC_HIGHORDER_ALLOC} . "\n"; close REPORT; } sub print_report() { print "\nReport\n"; open (REPORT, $opt_output) || die ("Failed to open $opt_output for reading"); while (<REPORT>) { print $_; } close REPORT; } # Process events or signals until neither is available sub signal_loop() { my $sigint_processed; do { $sigint_processed = 0; process_events(); # Handle pending signals if any if ($sigint_pending) { my $current_time = time; if ($sigint_exit) { print "Received exit signal\n"; $sigint_pending = 0; } if ($sigint_report) { if ($current_time >= $sigint_received + 2) { update_report(); print_report(); $sigint_report = 0; $sigint_pending = 0; $sigint_processed = 1; $sigint_exit = 0; } } } } while ($sigint_pending || $sigint_processed); } sub set_traceoption($) { my $option = shift; open(TRACEOPT, ">/sys/kernel/debug/tracing/trace_options") || die("Failed to open trace_options"); print TRACEOPT $option; close TRACEOPT; } sub enable_tracing($) { my $tracing = shift; open(TRACING, ">/sys/kernel/debug/tracing/events/$tracing/enable") || die("Failed to open tracing event $tracing"); print TRACING "1"; close TRACING; } sub disable_tracing($) { my $tracing = shift; open(TRACING, ">/sys/kernel/debug/tracing/events/$tracing/enable") || die("Failed to open tracing event $tracing"); print TRACING "0"; close TRACING; } set_traceoption("stacktrace"); set_traceoption("sym-offset"); set_traceoption("sym-addr"); enable_tracing("kmem/mm_page_alloc"); open(TRACING, "/sys/kernel/debug/tracing/trace_pipe") || die("Failed to open trace_pipe"); signal_loop(); close TRACING; disable_tracing("kmem/mm_page_alloc"); update_report(); print_report(); _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel