On Tue, Apr 23, 2013 at 10:15 AM, Michel Dänzer <michel@xxxxxxxxxxx> wrote:
> On Die, 2013-04-23 at 10:08 -0700, Andy Lutomirski wrote:
>> On Mon, Apr 22, 2013 at 10:55 PM, Michel Dänzer <michel@xxxxxxxxxxx> wrote:
>> > On Mon, 2013-04-22 at 16:19 -0700, Andy Lutomirski wrote:
>> >
>> >> I'm not convinced there's an actual hang. 40 seconds is a long time,
>> >> and I've only ever seen this when clicking something, and when this
>> >> happens, the screen goes blank immediately (not after a 40 second
>> >> delay).
>> >
>> > Hmm, now that you mention this, I notice in your original report it
>> > claims that the CP stalled for 'more than 5102593msec', which is clearly
>> > bogus. Looks like something's wrong with the lockup detection.
>> > Did this start after a kernel update or something like that?
>>
>> It's recent. It may have been when F18 switched from 3.7 to 3.8.
>
> Can you reproduce it with an upstream kernel? Can you bisect? I realize
> it'll probably take a long time, but unless someone has an idea which
> change might have introduced the problem...

Yuck. I can try, but it takes days to reproduce this, so it will take
forever (and may end up with a wrong answer if I get lucky and don't
crash).

>
>
>> I think there are bugs in the lockup detection and in the lockup
>> recovery. Firefox, in particular, is *really* slow afterwards. Are
>> interrupts possibly getting dropped or misconfigured during the reset?
>
> Let's not get ahead of ourselves and focus on the lockup detection issue
> for now.

I don't understand the r600_gpu_check_soft_reset code, but could this
be the sequence of events that triggers it?

1. radeon_ring_is_lockup is called just as the very last command on
the ring completes, so last_rptr gets set to the rptr.

2. Nothing happens for a while (i.e. > lockup_timeout). rptr doesn't change.

3. A very slightly slow operation starts.

4. radeon_ring_is_lockup gets called before that command completes.
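
To make the four steps above concrete, here is a minimal C model of them.
It assumes only that the check compares rptr against last_rptr and measures
elapsed time from the last recorded progress. The struct, the function
names, and the millisecond values are all hypothetical; this is a sketch of
the suspected behavior, not code taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-ring tracking state; not the driver's struct. */
struct ring_model {
    uint32_t rptr;             /* current read pointer */
    uint32_t last_rptr;        /* read pointer recorded at the previous check */
    uint64_t last_activity_ms; /* time at which progress was last recorded */
};

/* Record the current rptr and timestamp as the new baseline. */
static void model_lockup_update(struct ring_model *r, uint64_t now_ms)
{
    r->last_rptr = r->rptr;
    r->last_activity_ms = now_ms;
}

/* Report a lockup when rptr has not moved for at least timeout_ms. */
static bool model_test_lockup(struct ring_model *r, uint64_t now_ms,
                              uint64_t timeout_ms)
{
    assert(now_ms >= r->last_activity_ms); /* the model has no clock wrap-around */
    if (r->rptr != r->last_rptr) {
        /* Progress since the last check: refresh the baseline, no lockup. */
        model_lockup_update(r, now_ms);
        return false;
    }
    return (now_ms - r->last_activity_ms) >= timeout_ms;
}

/* Replay steps 1-4 with idle_ms of inactivity between step 1 and step 3. */
static bool replay_idle_then_check(uint64_t idle_ms, uint64_t timeout_ms)
{
    struct ring_model r = { .rptr = 0, .last_rptr = 0, .last_activity_ms = 0 };
    uint64_t now_ms = 1000;

    /* Step 1: the last queued command completes and the check runs,
     * so the baseline is set to the current rptr. */
    r.rptr = 64;
    if (model_test_lockup(&r, now_ms, timeout_ms))
        return true; /* not expected: rptr just advanced */

    /* Step 2: the ring sits idle; rptr does not move. */
    now_ms += idle_ms;

    /* Step 3: a new command is submitted but has not completed yet,
     * so rptr is still unchanged. */
    now_ms += 1;

    /* Step 4: the check runs again before the command completes. */
    return model_test_lockup(&r, now_ms, timeout_ms);
}
```

In this model, replay_idle_then_check(20000, 10000) returns true even
though the ring was merely idle, while replay_idle_then_check(10, 10000)
returns false. The 10000 ms timeout is just an illustrative value.
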
radeon_ring_test_lockup will not detect a jiffies wrap-around (because
there wasn't one), rptr will equal last_rptr (because there hasn't
been any progress since last time), and the elapsed time will be
really long, because the function hasn't been called for a long time.
So a lockup gets detected, even though nothing's wrong.

There's a comment above radeon_ring_test_lockup that says:

 * A possible false positivie is if we get call after while and last_cp_rptr ==
 * the current CP rptr, even if it's unlikely it might happen. To avoid this
 * if the elapsed time since last call is bigger than 2 second than we return
 * false and update the tracking information. Due to this the caller must call
 * radeon_ring_test_lockup several time in less than 2sec for lockup to be reported
 * the fencing code should be cautious about that.

but the corresponding code doesn't appear to exist anywhere.

Also, and unrelatedly, I revoke my comment about gmail issues being
fixed with hyperz off. Gmail still draws incorrectly. This may or
may not have anything to do with the radeon driver.

--Andy
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel