On Tue, Aug 18, 2015 at 05:25:55PM +0300, Joonas Lahtinen wrote: > Hi, > > On pe, 2015-08-14 at 10:58 +0200, Daniel Vetter wrote: > > On Thu, Aug 13, 2015 at 03:49:35PM -0700, Ben Widawsky wrote: > > > On Thu, Aug 13, 2015 at 10:33:00AM +0300, Joonas Lahtinen wrote: > > > > Hi, > > > > > > > > On ke, 2015-08-12 at 18:35 -0700, Ben Widawsky wrote: > > > > > On Wed, Aug 12, 2015 at 03:10:18PM +0300, Joonas Lahtinen > > > > > wrote: > > > > > > On ke, 2015-08-12 at 12:26 +0100, Arun Siluvery wrote: > > > > > > > From Gen9, by default push constant command is not > > > > > > > committed to > > > > > > > the > > > > > > > shader unit > > > > > > > untill the corresponding shader's BTP_* command is parsed. > > > > > > > This > > > > > > > is > > > > > > > the > > > > > > > behaviour when set shader is enabled. This patch updates > > > > > > > the > > > > > > > batch to > > > > > > > follow > > > > > > > this requirement otherwise it results in gpu hang. > > > > > > > > > > > > > > Bugzilla: > > > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=89959 > > > > > > > > > > > > > > Set shader need to be disabled if legacy behaviour is > > > > > > > required. > > > > > > > > > > > > > > Cc: Ben Widawsky <benjamin.widawsky@xxxxxxxxx> > > > > > > > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > > > > > > > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > > > > > > > Signed-off-by: Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx > > > > > > > > > > > > > > > > > > > > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx > > > > > > > > > > > > > > > > > > > > > > > Repeating what I said on the mesa thread: > > > > > Does anyone understand why this actually causes a hang on the > > > > > IGT > > > > > test? I > > > > > certainly don't. The docs are pretty clear that the constant > > > > > command > > > > > is not > > > > > committed until the BTP command, but I can't make any sense of > > > > > how it > > > > > related to > > > > > a GPU hang. > > > > > > > > > > > > > Changing the order makes the hang go away and come back for sure, > > > > we've > > > > all been experiencing that. System validation said it is a > > > > programming > > > > restriction, so I'm not sure how relevant it is what goes wrong > > > > if it's > > > > not followed. And the legacy mode bits were added to support the > > > > old > > > > behavior of having the order like it has been previously, so I do > > > > not > > > > see why question it without visibility to the actual RTL. And > > > > enabling > > > > the legacy bits makes the hang go away, too. > > > > > > > > If I had the RTL sources, then it would be more relevant to take > > > > educated guesses as to why a set of hundreds of thousands of > > > > transistors doesn't work as it should :) Without that, if it gets > > > > stuck, it gets stuck. > > > > > > > > Regards, Joonas > > > > > > > > > > Let me start by saying I do believe that questioning this shouldn't > > > prevent > > > merging the patch. > > > > > > <rant> > > > I absolutely disagree with you and think we should question these > > > kind of things > > > and get out of the mindset of, "well, it fixes a hang, let's move > > > on." > > > Understanding these kind of things is critical to writing stable > > > drivers. If > > > the programming guide/SV team said it can lead to a hang, that's > > > one thing, but > > > AFAICT, we do not understand why it is hanging nor does any of the > > > documentation > > > we do have suggest it should hang. Without clarification, next time > > > we have a > > > similar hang signature we're going to be right back here where we > > > started. > > > > > > It was one thing when there were a handful of us working on the > > > stuff and we > > > didn't have time to get to the bottom of bugs like this. I'm guilty > > > of patches > > > like this myself. I really do not see any excuse for this any more > > > though. > > > </rant> > > > > > > Could you send me the reference for where SV said it was a > > > "programming > > > restriction"? To me it all sounds very much like an implementation > > > detail, and > > > I'd like to try to understand what I am missing. > > > > Fully agree, we can't just ignore hangs but have to bottom out on > > them. > > And in the past we've had piles of examples where we discovered > > something > > and didn't chase it correctly, then a few months later massive panic > > in > > intel. At least this must be reflected in bspec. > > > > Joonas, can you please work together with Ben to chase this down? If > > you > > don't get traction please escalate the shit out of this, we really > > must > > at least get docs to accurately reflect reality. > > > > The latest reply from SV team to Arun's inquiry is that when the order > is not followed, the previous push constant values get used (which will > be all zeroes, so doesn't really make sense) and would lead to the hang > or corruption. I'm checking them if the same applies for the BTP itself > too, where the PS pointer keeps changing (and it's the only changing > thing), and could explain it better. > > Regards, Joonas Cool. It seems like an interesting conversation. I managed to convince myself corruption can happen, maybe - but not a hang. Now that you've kicked off that conversation, I'm fine with merging the patch (we've done our "due diligence" /me shudders). Reviewed-by: Ben Widawsky <ben@xxxxxxxxxxxx> > > > Thanks, Daniel -- Ben Widawsky, Intel Open Source Technology Center _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx