Re[4]: Re-writing the 2.6.11.8 Kernel IPsec stack for hardware crypto offload

Hi,

Thanks, the async ESP patch looks like exactly what I want! However,
I can still see a potential problem.

Let's assume I have this async ESP output function implemented, making
calls directly to the IXP400 access libs' crypto dispatch function.
Packets start coming in and we call the crypto dispatch function, which
queues the requests somewhere inside the IXP400 libs and the hardware.
Now, let's say the crypto engine can encrypt at 50Mbit/sec. This means
we are called back with encrypted packets at a rate of 50Mbit/sec,
which is great. However, let's say the packets are coming in at a
higher rate, say 60Mbit/sec. After a short while the IXP400 queue will
fill up and our calls to the crypto dispatch function will return an
error, and we will have to either a) drop the frame, or b) busy wait
until the IXP400 queue is no longer full and we can dispatch again.
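
To put a number on "a short while", here is a minimal sketch of the
arithmetic. The 50 and 60Mbit/sec figures are the ones above; the queue
depth and packet size in the usage note are purely hypothetical, since I
don't have the real IXP400 queue depth to hand, and ms_until_full() is a
made-up helper name.

```c
#include <assert.h>

/*
 * Estimate how many whole milliseconds pass before a request queue
 * overflows when packets arrive faster than the engine drains them.
 * depth_pkts and pkt_bytes are assumptions supplied by the caller.
 * Returns -1 if the queue never fills (arrival rate <= drain rate).
 */
long ms_until_full(long in_mbit, long out_mbit, long depth_pkts, long pkt_bytes)
{
    assert(depth_pkts > 0 && pkt_bytes > 0);

    if (in_mbit <= out_mbit)
        return -1;

    /* Total backlog the queue can absorb, in bits. */
    long capacity_bits = depth_pkts * pkt_bytes * 8;

    /* 1 Mbit/sec is 1000 bits per millisecond. */
    long surplus_bits_per_ms = (in_mbit - out_mbit) * 1000;

    return capacity_bits / surplus_bits_per_ms;
}
```

With an assumed 64-entry queue of 1500-byte packets,
ms_until_full(60, 50, 64, 1500) gives 76, i.e. the queue is full in well
under a tenth of a second.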

Dropping frames is not a very good solution, and busy waiting isn't
either, as that is exactly what I'm trying to avoid. So, what I need is
a way to tell the kernel NET stack to stop sending packets to the ESP
input/output functions via some kind of flow-control "XOFF" call.

I know there are similar flow control functions for netdev drivers,
whereby the driver can ask the kernel stack to stop transmitting
frames to it, then restart sending at a later time. Are there similar
flow control functions for this part of the NET stack?
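
To make the question concrete, this is the kind of behaviour I mean,
modelled in plain userspace C. It is only a sketch: struct xoff_gate and
the gate_*() functions are hypothetical names, not an existing kernel
API, and the watermarks are arbitrary. The idea mirrors what netdev
drivers do with netif_stop_queue()/netif_wake_queue(): stop accepting
work at a high watermark, and resume once completions bring the backlog
down to a low watermark.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical XOFF/XON gate in front of the crypto dispatch call. */
struct xoff_gate {
    unsigned in_flight; /* requests dispatched but not yet completed */
    unsigned high;      /* assert XOFF when in_flight reaches this */
    unsigned low;       /* assert XON when in_flight falls to this */
    bool stopped;       /* true while XOFF is in effect */
};

void gate_init(struct xoff_gate *g, unsigned low, unsigned high)
{
    assert(low < high);
    g->in_flight = 0;
    g->low = low;
    g->high = high;
    g->stopped = false;
}

/* Call before dispatching. Returns false if the caller must hold off. */
bool gate_submit(struct xoff_gate *g)
{
    if (g->stopped)
        return false;
    g->in_flight++;
    if (g->in_flight >= g->high)
        g->stopped = true;  /* XOFF: upper layer should stop sending */
    return true;
}

/* Call from the crypto completion callback. */
void gate_complete(struct xoff_gate *g)
{
    assert(g->in_flight > 0);
    g->in_flight--;
    if (g->stopped && g->in_flight <= g->low)
        g->stopped = false; /* XON: upper layer may resume */
}
```

The gap between the two watermarks is there so the gate does not flap
between XOFF and XON on every single completion. What I'm missing is the
hook that lets the ESP code propagate "stopped" back up the stack.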

Many thanks again, regards, Dan...

Friday, May 20, 2005, 10:15:20 AM, you wrote:

> On Fri, 2005-05-20 at 09:49 +0100, Dan Searle wrote:
>> Hi,
>> 
>> Please see my comments inline below...
>> 
>> Thursday, May 19, 2005, 3:12:24 PM, you wrote:
>> 
>> > On Thu, 2005-05-19 at 14:47 +0100, Dan Searle wrote:
>> >> Hi,
>> >> 
>> >> The problem with the existing IPsec stack and using OCF is that, for
>> >> some reason, a scatter gather list is used to split the packet data
>> >> into cypher block size chunks before sending them to the crypto API.
>> >> This is very bad for the IXP425 hardware crypto accelerator which
>> >> works much faster if you send it the entire plain text in one go
>> >> rather than dispatching the cypher block size chunks each in a
>> >> separate crypto context.
>> >> 
>> >> For instance, I wrote a hardware crypto module which integrated with
>> >> OCF and the IPsec stack, but because the chunks of plain text were
>> >> dispatched to the hardware crypto engine in small (cypher block size)
>> >> chunks, the throughput was only about 1Mbit/sec!!!!
>> >> 
>> >> I have recently thrown out the part of the IPsec stack which splits
>> >> the payloads using the scatter gather kernel functions, so that it
>> >> dispatches the entire plain text payload in one chunk. I.e. I dispatch
>> >> the entire packet to the IXP425 in one go. Using this method I'm now
>> >> getting about 20Mbit/sec throughput.
>> 
>> > The synchronous crypto stack splits the original data into small
>> > chunks, unfortunately. Herbert Xu has moved away from this in the
>> > latest patches for the crypto tree, but it is useful for
>> > "synchronous" crypto devices like VIA/Freescale processors.
>> > You definitely should not adapt your IXP driver to the synchronous
>> > API; consider using OCF or acrypto directly from the esp/ah output
>> > functions [actually I plan to release some code for it this weekend
>> > for acrypto and ESP].
>> 
>> I want to avoid adding the extra layer of complexity that OCF brings,
>> I want to call the IXP400 access libs crypto API directly from the
>> IPsec stack.

> It's your decision, although it has some problems: it cannot be used
> with other crypto devices and/or software-implemented algorithms.

>> We already have the ESP output functions calling the IXP400 crypto API
>> directly, however, the problem is that... By the time the ESP output
>> function returns, doesn't the IPsec stack need the SKB to be finished
>> with? I.e. I thought that we had to "busy wait" in a loop within the
>> ESP output function until the crypto API calls us back with the cypher
>> text, so that, by the time we exit the ESP output function, we have a
>> crypted SKB.
>> 
>> Perhaps I'm missing something. Can we exit from the ESP output
>> function before the SKB is actually crypted? How can we lock a

> Yes we can.

>> particular SKB until we are called back by the IXP400 crypto API
>> telling us the crypto is complete?

> I will describe it inline below.

>> >> This is still far from optimal, because of the context I'm in and the
>> >> way I'm performing the crypto dispatch, it means I have to sit the
>> >> kernel in a busy loop until the IXP425 calls me back with the cypher
>> >> text.
>> >> 
>> >> What I'm now trying to do is split the stack somehow, so that one
>> >> thread puts plain text skb's into the IXP425 dispatcher, and a new
>> >> thread sits on the other side and pulls the encrypted skb's out when
>> >> they are ready. Eliminating the need for a busy loop in a single
>> >> thread. Although, I'm not at all sure how to do this.
>> 
>> > You do not need to busy-wait until your crypto driver finishes the work.
>> > Consider the link I sent before: there is a tricky dst entry split
>> > which allows [thanks to its stackability] asynchronous processing.
>> > I.e.
>> > skb_clone();
>> > setup crypto
>> > return 0; // here the network stack thinks that the skb is processed
>> >           // and queued to be sent to the device's xmit function.
>> 
>> > but since we cloned the skb, we may return to its processing later,
>> > for example from the crypto finish callback.
>> 
>> > I.e.
>> > callback()
>> > {
>> >   skb = some_priv_data;
>> 
>> >   setup_new_dst_entry;
>> >   dst_output(skb);
>> > }
>> 
>> Oops, looks like I should learn to read all of an email before
>> replying to one! However, I've just looked at the ESP output function
>> again, and I don't see it calling dst_output(). Perhaps I'm being
>> very dim and missed something here, but I can see exactly what you
>> mean, thanks for the pointer!
>> 
>> > It has the disadvantage that there is no way to inform the original
>> > caller (xfrm) that the skb was not processed due to some error.
>> 
>> This is not likely to happen unless we run into an OOM error, in which
>> case a few dropped frames are not your biggest problem anyway!

> Here is a description of asynchronous IPsec processing.
> I sent this patch to netdev@ a couple of weeks ago, with a benchmark.
> Links were provided in previous e-mails.

[snip]

> This is a callback: it processes the skb, sets a new skb->dst entry and
> calls dst_output(). The new dst entry is set to one pointing to
> ip_output(), and dst_output() will call it. This is exactly the behaviour
> of the network stack without interruption. This callback can be called
> from acrypto/OCF/your driver with encrypted skb->data, and you only need
> to set up the dst entries and call dst_output().

[snip]

> Here is a trick: since we return 0 here (the value from esp_output_async())
> instead of NET_XMIT_BYPASS, the network stack will not process the next
> dst entry (ip_output()) and will "forget" about this skb.
> Later we must either free it or call the remaining dst entries using
> dst_output().
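
To check my understanding of the control flow, here is a toy userspace
model of it. Nothing here is kernel code: struct pkt, struct crypto_req,
next_output(), hw_complete() and the one-slot "hardware queue" are all
hypothetical stand-ins (next_output() plays the role of the remaining
dst entry, e.g. ip_output()), and esp_output_async_model() is my own
name, not the function from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy packet: flags record which stages have touched it. */
struct pkt {
    int encrypted;
    int sent;
};

struct crypto_req {
    struct pkt *p;                        /* packet parked with the request */
    void (*done)(struct crypto_req *req); /* completion callback */
};

/* One-slot stand-in for the crypto engine's request queue. */
static struct crypto_req *pending;

/* Stand-in for the next output stage in the chain. */
static int next_output(struct pkt *p)
{
    p->sent = 1;
    return 0;
}

/* Completion: the payload is now encrypted, so resume the output chain. */
static void esp_done(struct crypto_req *req)
{
    req->p->encrypted = 1;
    next_output(req->p);
}

/* Park the packet with the engine and return at once, without waiting. */
int esp_output_async_model(struct crypto_req *req, struct pkt *p)
{
    assert(pending == NULL); /* the toy queue holds a single request */
    req->p = p;
    req->done = esp_done;
    pending = req;
    return 0; /* caller treats the packet as handled for now */
}

/* Simulates the engine finishing and calling us back. */
void hw_complete(void)
{
    struct crypto_req *req = pending;

    if (req == NULL)
        return;
    pending = NULL;
    req->done(req);
}
```

The point is that after esp_output_async_model() returns, the packet is
neither encrypted nor sent; both happen only when hw_complete() fires
the callback, so nothing sits in a busy loop in between.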


>> Thanks again, Dan....

> Hope this helps.



--

Dan Searle
Adelix Ltd
dan.searle@xxxxxxxxxx web: www.adelix.com
tel: 0845 230 9590 / fax: 0845 230 9591 / support: 0845 230 9592
snail: The Old Post Office, Bristol Rd, Hambrook, Bristol BS16 1RY. UK.

Any views expressed in this email communication are those
of the individual sender, except where the sender specifically states
them to be the views of a member of Adelix Ltd.  Adelix Ltd. does not
represent, warrant or guarantee that the integrity of this communication
has been maintained nor that the communication is free of errors or
interference.
