Re: Include request for reset-rework branch.

Christian König <deathsimple@xxxxxxxxxxx> · Tue, 01 May 2012 15:38:07 +0200

On 30.04.2012 18:26, Jerome Glisse wrote:
On Mon, Apr 30, 2012 at 11:37 AM, Christian König <deathsimple@xxxxxxxxxxx> wrote:
On 30.04.2012 17:12, Jerome Glisse wrote:
On Mon, Apr 30, 2012 at 11:12 AM, Jerome Glisse <j.glisse@xxxxxxxxx> wrote:
On Mon, Apr 30, 2012 at 10:50 AM, Christian König <deathsimple@xxxxxxxxxxx> wrote:
Hi Dave,

If nobody has a last-minute concern, please include the following patches
in drm-next.

Except for some minor fixes, they have already been on the list for quite
some time, but I intentionally left out the debugfs-related patches because
we haven't finished the discussion about them yet.

If you prefer to merge them directly, I have also made them available as the
reset-rework branch here: git://people.freedesktop.org/~deathsimple/linux

Cheers,
Christian.

I am not completely OK with this; I am against patch 5. I need some time to
review it.

Cheers,
Jerome
Sorry, I meant patch 7.
I just started to wonder :) what's wrong with patch 7?

Please keep in mind that implementing proper locking in the lower level
objects allows us to remove quite a bit of locking in the upper layers.

By the way, do you mind if I dig into the whole locking stuff of the
kernel driver a bit more? There seem to be a lot of opportunities to
clean up and simplify the overall driver.

Cheers,
Christian.
Well, when it comes to locking, I have an idea I have wanted to work on
for a while. The GPU is transaction based in a sense: we feed the rings
and it works on them. 90% of the work is the cs ioctl, so we should be able
to have one and only one lock (ignoring the whole modesetting path
here, for which I believe one lock is enough too). Things like PM would
need to take all locks, but I believe that's a good thing, as my
understanding is that the PM we do right now needs most of the GPU idle.

So here is what we have:
ih spinlock (can be ignored, i.e. left as is)
irq spinlock (can be ignored, i.e. left as is)
blit mutex
pm mutex
cs mutex
dc_hw_i2c mutex
vram mutex
ib mutex
rings[] lock

So the real issue is TTM calling back into the driver. The idea I
had is to have a worker thread that is the only one allowed to touch
the GPU. The worker thread would use some locking in which it has
preference over the writers (a writer being the cs ioctl, a TTM callback, or
anything else that needs to schedule GPU work). This would require
only two locks. I am actually thinking of having two lists: one where
writers add their "transactions" and one from which the worker drains
them, something like:

cs_ioctl
{
    sa_alloc_for_trans
    ....
    lock(trans_lock)
    list_add_tail(trans, trans_temp_list)
    unlock(trans_lock)
}

worker
{
    lock(trans_lock)
    // move everything the writers queued onto the worker's own list
    list_splice_tail_init(trans_temp_list, trans_list)
    unlock(trans_lock)
    // schedule ib
    ....
    // process fences & semaphores
}

So there would be one transaction lock and one lock for the
transaction memory allocation (IBs, semaphores and the like). The worker
would also be responsible for GPU reset. There wouldn't be any ring
lock, as I believe one worker is enough for all rings.
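
To make the two-list idea concrete, here is a minimal userspace sketch in C,
with pthreads standing in for kernel locking. It is an illustration under
assumptions, not driver code: struct trans, struct trans_queue,
writer_submit() and worker_run_once() are hypothetical names, and recording
a transaction id stands in for actually scheduling an IB.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical transaction; a real one would carry an IB, fence, etc. */
struct trans {
    int id;
    struct trans *next;
};

/* Minimal FIFO with head and tail pointers. */
struct trans_queue {
    struct trans *head;
    struct trans *tail;
};

static pthread_mutex_t trans_lock = PTHREAD_MUTEX_INITIALIZER;
static struct trans_queue pending; /* filled by writers, guarded by trans_lock */
static struct trans_queue work;    /* touched only by the worker, no lock */

static void queue_push(struct trans_queue *q, struct trans *t)
{
    t->next = NULL;
    if (q->tail)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
}

/* Append all of src to the tail of dst and leave src empty. */
static void queue_splice_tail(struct trans_queue *dst, struct trans_queue *src)
{
    if (!src->head)
        return;
    if (dst->tail)
        dst->tail->next = src->head;
    else
        dst->head = src->head;
    dst->tail = src->tail;
    src->head = src->tail = NULL;
}

/* Writer side (cs ioctl, TTM callback, ...): only queues the transaction. */
void writer_submit(struct trans *t)
{
    pthread_mutex_lock(&trans_lock);
    queue_push(&pending, t);
    pthread_mutex_unlock(&trans_lock);
}

/*
 * Worker side: grab everything queued so far under the lock, then process
 * it without the lock. Writes up to max processed ids into ids[] and
 * returns how many were processed.
 */
int worker_run_once(int *ids, int max)
{
    int n = 0;

    pthread_mutex_lock(&trans_lock);
    queue_splice_tail(&work, &pending);
    pthread_mutex_unlock(&trans_lock);

    while (work.head && n < max) {
        struct trans *t = work.head;

        work.head = t->next;
        if (!work.head)
            work.tail = NULL;
        ids[n++] = t->id; /* stand-in for "schedule ib" */
    }
    return n;
}
```

The point of the split is that writers hold trans_lock only for a pointer
update, and the worker holds it only for the splice; all actual processing
happens on a list nobody else can see.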

For transactions the only real issue is TTM. cs is easy, it's an
ib+fence+semaphore; TTM can be more complex: it can be a bo move, bind,
unbind, ... Anyway, it's doable and it's a design I have had in mind for a
while.
Well, that sounds like the direction to go, but I would even like to
avoid the submission thread, since the GPU is working on the rings
asynchronously anyway.

The locking problems we are currently seeing are more a result of
abusing global variables for local data, or of locking too much code with too
few primitives, rather than of simply having too many locking primitives
overall. For example, I really can't see why we have a blit mutex for the
r600 shader blit code. Also, in retrospect, having one mutex per ring does
sound a bit superfluous.

To sum my ideas up:

1. I suggest that memory management be self-contained, e.g. you can
request small amounts of memory for IBs, fences, semaphores, blit
vertex buffers etc. without worrying about others doing the same at the
same time. That really sounds like your "lock for the transaction
memory allocation", and I'm pretty sure that I've extended the SA far
enough for it to play that role pretty well.

2. Have exactly ONE ring submission mutex. This mutex is taken right
before a job (and it shouldn't matter if that's a TTM move/blit or an
IB) is pushed onto a ring. It is strictly forbidden to allocate
more SA memory, call into TTM etc. while this mutex is held. Everything
that's necessary to submit a job must happen before grabbing it and be
stored in thread-local memory, e.g. on the stack or in a kmalloc-allocated
chunk of memory.

3. Protect data, not code! I.e. have each lock protect one data
structure, and don't just say: "Those two things shouldn't happen at the
same time, so acquire this and that lock."
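
Points 2 and 3 can be sketched roughly as follows, again as plain userspace
C with pthreads rather than real driver code. All names here (struct job,
struct ring, job_prepare(), ring_submit(), the sizes) are made up for
illustration, and the sketch ignores ring wraparound. What it tries to show
is the shape: the job is built completely in caller-owned memory first, and
the single submission mutex covers nothing but the copy into the ring and
the write pointer update.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define NUM_RINGS 2 /* hypothetical number of rings */
#define RING_SIZE 8 /* hypothetical ring size in dwords */
#define JOB_MAX   4 /* hypothetical maximum job length in dwords */

/* A fully prepared job, living on the caller's stack or heap. */
struct job {
    uint32_t cmd[JOB_MAX];
    unsigned len;
};

struct ring {
    uint32_t buf[RING_SIZE];
    unsigned wptr;
};

/* The one submission mutex: it protects rings[] and nothing else. */
static pthread_mutex_t submit_lock = PTHREAD_MUTEX_INITIALIZER;
struct ring rings[NUM_RINGS];

/*
 * Step 1, no lock held: build the job. In a driver this is where memory
 * allocation and calls into the memory manager would have to happen.
 */
void job_prepare(struct job *j, uint32_t base, unsigned len)
{
    j->len = len > JOB_MAX ? JOB_MAX : len;
    for (unsigned i = 0; i < j->len; i++)
        j->cmd[i] = base + i;
}

/*
 * Step 2: take the lock only to copy the prepared job into the ring.
 * Returns 0 on success, -1 on a bad ring index or a full ring.
 */
int ring_submit(int idx, const struct job *j)
{
    int ret = -1;

    if (idx < 0 || idx >= NUM_RINGS)
        return -1;

    pthread_mutex_lock(&submit_lock);
    if (j->len <= RING_SIZE - rings[idx].wptr) {
        memcpy(&rings[idx].buf[rings[idx].wptr], j->cmd,
               j->len * sizeof(j->cmd[0]));
        rings[idx].wptr += j->len;
        ret = 0;
    }
    pthread_mutex_unlock(&submit_lock);
    return ret;
}
```

Because nothing that can sleep or call back into other subsystems runs
under submit_lock, there is no lock ordering to reason about for it.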

Overall it sounds like we are playing with similar ideas. I would
suggest that I just hack together some patches, probably starting with
the ring submission and r600 blit mutex, and then we take another look at
where this leads us.

Cheers,
Christian.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel