On Mon, Sep 10, 2012 at 8:02 AM, Christian König <deathsimple@xxxxxxxxxxx> wrote: > On 10.09.2012 13:12, Michel Dänzer wrote: >> >> On Mon, 2012-09-10 at 11:13 +0200, Christian König wrote: >>> >>> Only increase the higher 32bits if we really detect a wrap around. >>> >>> Fixes: >>> https://bugs.freedesktop.org/show_bug.cgi?id=54129 >>> https://bugs.freedesktop.org/show_bug.cgi?id=54662 >>> >>> Possible fixes: >>> https://bugzilla.redhat.com/show_bug.cgi?id=846505 >>> https://bugzilla.redhat.com/show_bug.cgi?id=845639 >>> >>> Signed-off-by: Christian König <deathsimple@xxxxxxxxxxx> >>> Cc: stable@xxxxxxxxxxxxxxx >>> --- >>> drivers/gpu/drm/radeon/radeon_fence.c | 6 +++--- >>> 1 file changed, 3 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c >>> b/drivers/gpu/drm/radeon/radeon_fence.c >>> index 7b737b9..4781e13 100644 >>> --- a/drivers/gpu/drm/radeon/radeon_fence.c >>> +++ b/drivers/gpu/drm/radeon/radeon_fence.c >>> @@ -160,7 +160,7 @@ void radeon_fence_process(struct radeon_device *rdev, >>> int ring) >>> do { >>> seq = radeon_fence_read(rdev, ring); >>> seq |= last_seq & 0xffffffff00000000LL; >>> - if (seq < last_seq) { >>> + if (seq < (last_seq - 0x80000000LL)) { >>> seq += 0x100000000LL; >>> } >> >> Can you provide a bit more explanation for this change? In particular, >> how could the code previously detect a wraparound when there was none, >> and why is this the proper fix? > > > Honestly I also don't really understand how this bug happened in the first > place. > > We extend the 32bit fences supported by hardware by testing if a previously > read fence value is smaller than the value we read now: > >> if (seq < last_seq) { > > > But the problem seems to be that on some systems we do get fence values that > are decreasing, e.g. instead of 5, 6, 7, 8 we get 5, 7, 6, 8 (or maybe 5, 6, > 0, 7, 8 because somebody accidentally overwrites the fence value). > > It might be related to a hardware bug, or the algorithm is flawed in a way I > currently don't see. Anyway the old code we had wasn't so picky about such > problems and the patch just tries to make the current code as robust as the > old code was, which indeed seems to solve the problems we see. > > The wrap around detection still works (tested by setting the initial fence > value to 0xfffffff0 and letting it wrap around shortly after start), so I > think it we can safely commit this. > > Christian. > If fence read ever gave the value 0 then your patch is postponing the issue until last fence reach >= 0x1 8000 0001 (which will takes month/years of uptime to happen :)). Having a log of all emitted fence value and all read en fence value would probably be helpful so we have a better clue of what's going on. Sadly i don't think we don't receive fence in order but rather that we sometimes receive a 0 value fence, if we did receive fence out of order than the issue would also have happen with previous code. Otherwise patch is ack but please add a better comment about fence value most likely being 0 for unknow reasons or receive out of order (thought i don't think so). Cheers, Jerome _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel