Hi Dan,

Thanks for your great comments about the performance penalty issue. I'm
trying to refine the implementation to reduce the penalty caused by the
hotplug logic. If the algorithm works correctly, the optimized hot path
code will be:

------------------------------------------------------------------------------
struct dma_chan *dma_find_channel(enum dma_transaction_type tx_type)
{
	struct dma_chan *chan = this_cpu_read(channel_table[tx_type]->chan);

	this_cpu_inc(dmaengine_chan_ref_count);
	if (static_key_false(&dmaengine_quiesce))
		chan = NULL;

	return chan;
}
EXPORT_SYMBOL(dma_find_channel);

struct dma_chan *dma_get_channel(struct dma_chan *chan)
{
	if (static_key_false(&dmaengine_quiesce))
		atomic_inc(&dmaengine_dirty);
	this_cpu_inc(dmaengine_chan_ref_count);

	return chan;
}
EXPORT_SYMBOL(dma_get_channel);

void dma_put_channel(struct dma_chan *chan)
{
	this_cpu_dec(dmaengine_chan_ref_count);
}
EXPORT_SYMBOL(dma_put_channel);
------------------------------------------------------------------------------

The disassembled code is:

(gdb) disassemble dma_find_channel
Dump of assembler code for function dma_find_channel:
   0x0000000000000000 <+0>:	push   %rbp
   0x0000000000000001 <+1>:	mov    %rsp,%rbp
   0x0000000000000004 <+4>:	callq  0x9 <dma_find_channel+9>
   0x0000000000000009 <+9>:	mov    %edi,%edi
   0x000000000000000b <+11>:	mov    0x0(,%rdi,8),%rax
   0x0000000000000013 <+19>:	mov    %gs:(%rax),%rax
   0x0000000000000017 <+23>:	incq   %gs:0x0	// overhead: this_cpu_inc(dmaengine_chan_ref_count)
   0x0000000000000020 <+32>:	jmpq   0x25 <dma_find_channel+37>	// overhead: if (static_key_false(&dmaengine_quiesce)); will be replaced with a NOP by the jump label
   0x0000000000000025 <+37>:	pop    %rbp
   0x0000000000000026 <+38>:	retq
   0x0000000000000027 <+39>:	nopw   0x0(%rax,%rax,1)
   0x0000000000000030 <+48>:	xor    %eax,%eax
   0x0000000000000032 <+50>:	pop    %rbp
   0x0000000000000033 <+51>:	retq
End of assembler dump.

(gdb) disassemble dma_put_channel	// overhead: decrementing the channel reference count takes 6 instructions
Dump of assembler code for function dma_put_channel:
   0x0000000000000070 <+0>:	push   %rbp
   0x0000000000000071 <+1>:	mov    %rsp,%rbp
   0x0000000000000074 <+4>:	callq  0x79 <dma_put_channel+9>
   0x0000000000000079 <+9>:	decq   %gs:0x0
   0x0000000000000082 <+18>:	pop    %rbp
   0x0000000000000083 <+19>:	retq
End of assembler dump.

(gdb) disassemble dma_get_channel
Dump of assembler code for function dma_get_channel:
   0x0000000000000040 <+0>:	push   %rbp
   0x0000000000000041 <+1>:	mov    %rsp,%rbp
   0x0000000000000044 <+4>:	callq  0x49 <dma_get_channel+9>
   0x0000000000000049 <+9>:	mov    %rdi,%rax
   0x000000000000004c <+12>:	jmpq   0x51 <dma_get_channel+17>
   0x0000000000000051 <+17>:	incq   %gs:0x0
   0x000000000000005a <+26>:	pop    %rbp
   0x000000000000005b <+27>:	retq
   0x000000000000005c <+28>:	nopl   0x0(%rax)
   0x0000000000000060 <+32>:	lock incl 0x0(%rip)	# 0x67 <dma_get_channel+39>
   0x0000000000000067 <+39>:	jmp    0x51 <dma_get_channel+17>
End of assembler dump.

So for a typical dma_find_channel()/dma_put_channel() pair, the total
overhead is about 10 instructions and two per-cpu (local) memory updates,
and there is no shared cache line pollution any more. Is this acceptable
if the algorithm works as expected? I will test the code tomorrow.

For typical systems which don't support DMA device hotplug, the overhead
could be removed completely by conditional compilation; rough sketches of
the quiesce side and of the conditional compilation follow below.

Any comments are welcome! Thanks!
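For reference, here is a rough, untested sketch of the quiesce (slow)
side I have in mind. Only the three variables match the fast path above;
the function name dmaengine_quiesce_channels() and the drain/retry policy
are placeholders, not settled design:

------------------------------------------------------------------------------
#include <linux/jump_label.h>
#include <linux/percpu.h>
#include <linux/atomic.h>

/* Assumed declarations matching the fast path above. */
struct static_key dmaengine_quiesce = STATIC_KEY_INIT_FALSE;
DEFINE_PER_CPU(long, dmaengine_chan_ref_count);
atomic_t dmaengine_dirty = ATOMIC_INIT(0);

static void dmaengine_quiesce_channels(void)
{
	long refs;
	int cpu;

	/* Make the fast paths take their out-of-line branches. */
	static_key_slow_inc(&dmaengine_quiesce);

	do {
		atomic_set(&dmaengine_dirty, 0);

		/* Wait for all in-flight fast-path references to drain. */
		do {
			refs = 0;
			for_each_possible_cpu(cpu)
				refs += per_cpu(dmaengine_chan_ref_count, cpu);
			cpu_relax();
		} while (refs != 0);

		/*
		 * dma_get_channel() sets the dirty flag if it handed out
		 * a reference while we were summing, so re-check.
		 */
	} while (atomic_read(&dmaengine_dirty) != 0);

	/* ... channel_table can be rewritten safely here ... */

	/* Re-enable the fast path. */
	static_key_slow_dec(&dmaengine_quiesce);
}
------------------------------------------------------------------------------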
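And a sketch of the conditional compilation idea; CONFIG_DMA_ENGINE_HOTPLUG
is an assumed Kconfig symbol name, not an existing one. With the option
disabled, dma_find_channel() collapses back to the original single
this_cpu_read():

------------------------------------------------------------------------------
#ifdef CONFIG_DMA_ENGINE_HOTPLUG
#define dma_chan_ref_inc()	this_cpu_inc(dmaengine_chan_ref_count)
#define dma_chan_ref_dec()	this_cpu_dec(dmaengine_chan_ref_count)
#else
/* No DMA device hotplug: reference counting and quiesce check vanish. */
#define dma_chan_ref_inc()	do { } while (0)
#define dma_chan_ref_dec()	do { } while (0)
#endif

struct dma_chan *dma_find_channel(enum dma_transaction_type tx_type)
{
	struct dma_chan *chan = this_cpu_read(channel_table[tx_type]->chan);

	dma_chan_ref_inc();
#ifdef CONFIG_DMA_ENGINE_HOTPLUG
	if (static_key_false(&dmaengine_quiesce))
		chan = NULL;
#endif
	return chan;
}
------------------------------------------------------------------------------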
--gerry

On 04/24/2012 11:09 AM, Dan Williams wrote:
>>> If you are going to hotplug the entire IOH, then you are probably ok