Re: Bug in processing dependencies by async_tx_submit() ?

Yuri Tikhonov <yur@xxxxxxxxxxx> · Fri, 2 Nov 2007 11:13:57 +0300

 Hi Dan,

On Friday 02 November 2007 03:36, Dan Williams wrote:
> >   This is happened because of the specific implementation of
> >  dma_wait_for_async_tx().
> 
> So I take it you are not implementing interrupt based callbacks in your
> driver?

 Why not? I do have interrupt-based callbacks in my driver. An INTERRUPT
descriptor, implemented for both the COPY and XOR channels, invokes the
callback upon its completion.

 Here is an example where your implementation of dma_wait_for_async_tx() will
not work as expected. Suppose we have OP1 <--depends on-- OP2 <--depends on--
OP3, where

 OP1: cookie = -EBUSY, channel = DMA0; <- not submitted
 OP2: cookie = -EBUSY, channel = DMA0; <- not submitted
 OP3: cookie = 101, channel = DMA1; <- submitted, but not linked to h/w

 Here cookie == 101 stands for some valid, positive cookie. It means that
OP3 *was submitted* to the DMA1 channel but *perhaps was not linked* to the
h/w chain, for example because the threshold for DMA1 has not been reached yet.

 With your implementation of dma_wait_for_async_tx() we do dma_sync_wait(OP2).
I propose to do dma_sync_wait(OP3) instead: in your case we may wait forever
for OP2 to complete, since dma_sync_wait() flushes the chains of DMA0 to h/w,
but OP3 on DMA1 remains unlinked to h/w and blocks the whole chain of
dependencies.
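
 To make the difference concrete, here is a minimal user-space sketch of the
two ways to pick the descriptor to wait on. It is only a model of the
behaviour described above, not the kernel code: struct tx_desc, TX_EBUSY and
both helper names are hypothetical, and the real descriptor has more fields.

```c
#include <assert.h>
#include <stddef.h>

/* Stands in for -EBUSY: descriptor allocated but not yet submitted. */
#define TX_EBUSY (-16)

/* Hypothetical, simplified descriptor; not the kernel structure. */
struct tx_desc {
    int cookie;              /* > 0 once submitted to a channel */
    int chan;                /* channel id: 0 for DMA0, 1 for DMA1 */
    struct tx_desc *parent;  /* operation this one depends on, or NULL */
};

/*
 * Behaviour described for the current code: walk up the chain while the
 * parent is also unsubmitted, and stop at the last unsubmitted descriptor.
 */
static struct tx_desc *root_unsubmitted(struct tx_desc *tx)
{
    assert(tx != NULL);
    while (tx->cookie == TX_EBUSY && tx->parent &&
           tx->parent->cookie == TX_EBUSY)
        tx = tx->parent;
    return tx;
}

/*
 * Proposed behaviour: keep walking until a descriptor with a valid cookie
 * is found. If the chain ends first, the unsubmitted root is returned and
 * the caller has to check its cookie.
 */
static struct tx_desc *first_submitted(struct tx_desc *tx)
{
    assert(tx != NULL);
    while (tx->cookie == TX_EBUSY && tx->parent)
        tx = tx->parent;
    return tx;
}
```

 For the chain in the example (OP1 with parent OP2, OP2 with parent OP3), the
first helper stops at OP2 on DMA0, while the second one returns OP3 on DMA1,
which is the descriptor whose channel actually needs to be flushed.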

> >   The "iter", we finally waiting for there, corresponds to the last allocated
> >  but not-yet-submitted descriptor. But if the "iter" we are waiting for is
> >  dependent from another descriptor which has cookie > 0, but is not yet
> >  submitted to the h/w channel because of the fact that threshold is not
> >  achieved to this moment, then we may wait in dma_wait_for_async_tx()
> >  infinitely. I think that it makes more sense to get the first descriptor
> >  which was submitted to the channel but probably is not put into the h/w
> >  chain, i.e. with cookie > 0 and do dma_sync_wait() of this descriptor.
> >
> >   When I modified the dma_wait_for_async_tx() in such way, then the kernel
> >  locking had disappeared. But nevertheless the mkfs processes hangs-up after
> >  some time. So, it looks like something is still missing in support of the
> >  chaining dependencies feature...
> >
> 
> I am preparing a new patch that replaces ASYNC_TX_DEP_ACK with
> ASYNC_TX_CHAIN_ACK.  The plan is to make the entire chain of
> dependencies available up until the last transaction is submitted.
> This allows the entire dependency chain to be walked at
> async_tx_submit time so that we can properly handle these multiple
> dependency cases.  I'll send it out when it passes my internal
> tests...

 Fine. I guess this replacement implies some modifications to the RAID-5
driver as well, right?

-- 
Yuri Tikhonov, Senior Software Engineer
Emcraft Systems, www.emcraft.com