On Tue, 2009-06-02 at 18:38 -0400, David Malcolm wrote:
> On Tue, 2009-06-02 at 15:32 -0700, David L wrote:
> > On Tue, Jun 2, 2009 at 2:44 PM, Kevin DeKorte wrote:
> > > On 06/02/2009 03:37 PM, David L wrote:
> > >> My f11 system seems extremely slow when running
> > >> some 2D gtk/cairo apps. Is there a benchmark
> > >> suite that is yum installable for testing X performance?
> > <snip>
> > >>
> > >
> > > Personally I find that gtkperf is useful although not an exact science.
> > > I like to run it with the following options
> > >
> > > gtkperf -c 500 -a
> > >
> > Thanks Kevin,
> >
> > That helped confirm my suspicions. Here's my output with
> > the intel driver on my Intel 82865G:

KMS or UMS?

> [snip various results]
> > GtkDrawingArea - Circles - time: 44.60
> > GtkDrawingArea - Text - time: 15.80
> [snip]
>
> > Same computer using the vesa driver:
> > GtkPerf 0.40 - Starting testing: Tue Jun 2 15:19:09 2009
> > [snip]
> > GtkDrawingArea - Circles - time: 1.43
> > GtkDrawingArea - Text - time: 1.31
> [snip]
>
> Looks like the circles/text tests are the most obvious differences,
> though most tests show marked differences.

As with everything, it helps to know what you're measuring.

Circles is the wide arc rendering path in the X server. It's
essentially unused by gtk apps in general, but gtkperf does it anyway.
The arcs specified by the X protocol are insanely ugly (which is why
nobody uses them in real apps) and also not a hardware-accelerated
primitive. We could break them down to spans inside the X server and
accelerate filling those spans, but we don't. So they happen in
software.

Which is also true for the vesa driver, but there's a catch. The vesa
driver uses a trick called 'shadowfb', where the whole screen is
rendered in (cached) host memory and then the updated regions are
uploaded to the actual scanout memory. This is adequately fast, because
it minimizes the number of memory cycles (read cycles in particular)
that you do to the framebuffer, which is typically uncached.

In the intel driver, it's a different story. We don't keep a shadow,
so the software fallback happens either cached or uncached, depending
how we map the framebuffer. If it's cached, you have to do a big cache
flush when you finish rendering so the bits actually make it out from
the CPU's cache to the framebuffer. If it's uncached, you're hitting
main memory on every cycle, which is also not great.

I don't remember offhand whether the text test is using Render or the
old core font path. If the latter, then the same scenario applies; it's
not accelerated (because it's actually rather hard to accelerate well),
and the software path can't help but suck.

For the other stuff, 865 and 855 appear to have a chipset bug where the
command buffer doesn't always flush to the GPU reliably, so we have to
flush the entire CPU cache on every acceleration command:

http://cvs.fedoraproject.org/viewvc/rpms/kernel/F-11/drm-intel-big-hammer.patch?revision=1.1&view=markup

We do try to mitigate this by batching up big sequences of commands
rather than lots of little ones, but it still hurts.

So, two things. Compare gtkperf results with different resolutions on
the intel driver; tests which are hitting the software fallback path
will be faster at smaller resolutions, because there's less framebuffer
to clflush out. Also, figure out whether you're using KMS or not, and
try the other way.

- ajax
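
For the comparison suggested above, the per-test timings can be pulled
out of two saved gtkperf logs and compared side by side. What follows is
only a minimal sketch in Python. It assumes every result line looks like
the "GtkDrawingArea - Circles - time: 44.60" lines quoted in this thread;
the names parse_gtkperf and compare_runs are hypothetical helpers, not
part of gtkperf, and the sample logs contain only the figures quoted
above.

```python
import re

# Matches result lines such as "GtkDrawingArea - Circles - time: 44.60".
# The line format is an assumption taken from the output quoted in this thread.
RESULT_RE = re.compile(
    r"^\s*(?P<name>\S.*?)\s+-\s+time:\s*(?P<secs>\d+(?:\.\d+)?)\s*$"
)


def parse_gtkperf(text):
    """Return {test name: seconds} for every result line found in text."""
    timings = {}
    for line in text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            timings[match.group("name")] = float(match.group("secs"))
    return timings


def compare_runs(baseline, candidate):
    """Return {test name: candidate/baseline ratio} for tests in both runs."""
    ratios = {}
    for name, base_secs in baseline.items():
        # Skip tests missing from the other run and zero-time baselines.
        if name in candidate and base_secs > 0:
            ratios[name] = candidate[name] / base_secs
    return ratios


if __name__ == "__main__":
    # Figures quoted earlier in the thread (vesa vs. intel driver).
    vesa_log = (
        "GtkDrawingArea - Circles - time: 1.43\n"
        "GtkDrawingArea - Text - time: 1.31\n"
    )
    intel_log = (
        "GtkDrawingArea - Circles - time: 44.60\n"
        "GtkDrawingArea - Text - time: 15.80\n"
    )
    ratios = compare_runs(parse_gtkperf(vesa_log), parse_gtkperf(intel_log))
    # Largest ratio first, so the worst regressions are at the top.
    for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
        print(f"{name}: {ratio:.1f}x the baseline time")
```

The same two functions can be fed logs taken at two resolutions on the
intel driver: tests whose time drops noticeably at the smaller
resolution are the likely software-fallback candidates.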
--
fedora-test-list mailing list
fedora-test-list@xxxxxxxxxx
To unsubscribe:
https://www.redhat.com/mailman/listinfo/fedora-test-list