On Fri, 2005-05-20 at 09:49 +0100, Dan Searle wrote: > Hi, > > Please see my comments inline below... > > Thursday, May 19, 2005, 3:12:24 PM, you wrote: > > > On Thu, 2005-05-19 at 14:47 +0100, Dan Searle wrote: > >> Hi, > >> > >> The problem with the existing IPsec stack and using OCF is that, for > >> some reason, a scatter gather list is used to split the packet data > >> into cypher block size chunks before sending them to the crypto API. > >> This is very bad for the IXP425 hardware crypto accelerator which > >> works much faster if you send it the entire plain text in one go > >> rather than dispatching the cypher block size chunks each in a > >> separate crypto context. > >> > >> For instance, I wrote a hardware crypto module which integrated with > >> OCF and the IPsec stack, but because the chunks of plain text were > >> dispatched to the hardware crypto engine in small (cypher block size) > >> chunks, the throughput was only about 1Mbit/sec!!!! > >> > >> I have recently thrown out the part of the IPsec stack which splits > >> the payloads using the scatter gather kernel functions, so that it > >> dispatches the entire plain text payload in one chunk. I.e. I dispatch > >> the entire packet to the IXP425 in one go. Using this method I'm now > >> getting about 20Mbit/sec throughput. > > > Synchronous crypto stack splits original data to the small chunks, > > unfortunately. > > Although Herbert Xu moved away from it in the latest patches for crypto > > tree, > > but it is usefull for "synchronous" crypto devices like VIA/freescale > > processors. > > You definitely should not adopt your driver for IXP to the synchronous > > API, > > concider using OCF or acrypto directly from esp/ah output functions > > [actually I plan to release some code for it this weekend for acrypto > > and ESP]. > > I want to avoid adding the extra layer of complexity that OCF brings, > I want to call the IXP400 access libs crypto API directly from the > IPsec stack. It's your decision, although it has some problems - it can not be used with outher crypto devices and/or SW implemented algos. > We already have the ESP output functions calling the IXP400 crypto API > directly, however, the problem is that... By the time the ESP output > function returns, doesn't the IPsec stack need the SKB to be finished > with? I.e. I thought that we had to "busy wait" in a loop within the > ESP output function until the crpyto API calls us back with the cypher > text, so that, by the time we exit the ESP output function, we have a > crypted SKB. > > Perhapse I'm missing something. Can we exit from the ESP output > function before the SKB is actually crypted? How can we lock a Yes we can. > particular SKB until we are called back by the IXP400 crypto API > telling us the crypto is complete? I will describe it inline below. > >> This is still far from optimal, because of the context I'm in and the > >> way I'm performing the crypto dispatch, it means I have to sit the > >> kernel in a busy loop until the IXP425 calls me back with the cypher > >> text. > >> > >> What I'm now trying to do is split the stack somehow, so that one > >> thread puts plain text skb's into the IXP425 dispatcher, and a new > >> thread sits on the other side and pulls the encrypted skb's out when > >> they are ready. Eliminating the need for a busy loop in a single > >> thread. Although, I'm not at all sure how to do this. > > > You do not need to busywait until your crypto driver finishes the work. > > Concider link I sent before: there is dst entry tricky split > > which allows [thanks to it's stackability] asynchronous processing. > > I.e. > > skb_clone(); > > setup crypto > > return 0; // here network stack thinks that skb is processed and queued > > to be sent to devices xmit fucntion. > > > but since we cloned skb, we may return to it's processing later, > > for example from crypto finish callback. > > > I.e. > > callback() > > { > > skb = some_priv_data; > > > setup_new_dst_entry; > > dst_output(skb); > > } > > Oops, looks like I should learn to read all of an email before > replying to one! However, I've just looked at the ESP output function > again, and I don't see it calling dst_output(). Perhaps I'm being > very dim and missed something here, but I can see exactly what you > mean, thanks for the pointer! > > > It has disadvantage that there is no ability to inform original > > caller (xfrm) that skb was not processed due to some error. > > This is not likely to happen unless we run into an OOM error, in which > case a few dropped frames are not your biggest problem anyway! Here is description of asynchronous IPsec processing. This patch I sent couple of weeks ago to netdev@ with benchmark. Links were provided in previous e-mails. diff -ru ../linux-2.6-orig/net/ipv4/esp4.c ./net/ipv4/esp4.c --- ../linux-2.6-orig/net/ipv4/esp4.c 2005-04-25 15:41:39.000000000 +0400 +++ ./net/ipv4/esp4.c 2005-04-29 14:34:10.000000000 +0400 @@ -7,6 +7,7 @@ #include <linux/crypto.h> #include <linux/pfkeyv2.h> #include <linux/random.h> +#include <linux/timer.h> #include <net/icmp.h> #include <net/udp.h> @@ -17,6 +18,95 @@ __u8 proto; }; +static int esp_output(struct xfrm_state *x, struct sk_buff *skb); + +struct esp_async { + struct timer_list tm; + struct sk_buff *skb; + struct xfrm_state *x; + struct dst_entry *dst; +}; + +static void esp4_callback(unsigned long data) +{ + struct esp_async *ea = (struct esp_async *)data; + struct sk_buff *skb = ea->skb; + struct dst_entry *dst = ea->dst; + struct xfrm_state *x = ea->x; + int err; + + printk("%s: skb=%p, skb->users=%d.\n", __func__, skb, atomic_read(&skb->users)); + printk("%s: dst=%p, skb->dst=%p.\n", __func__, dst, skb->dst); + printk("%s: xfrm=%p, skb->dst->xfrm=%p.\n", __func__, x, (skb->dst)?skb->dst->xfrm:NULL); + + spin_lock_bh(&x->lock); + err = esp_output(x, skb); + spin_unlock_bh(&x->lock); + + printk("%s: Data has been processed: err=%d.\n", __func__, err); + + if (err) + goto err_out; + + skb->dst = dst_pop(dst); + printk("%s: pop has been finished: skb->dst=%p, dst=%p, skb->users=%d.\n", + __func__, skb->dst, dst, atomic_read(&skb->users)); + if (!skb->dst) + goto err_out; + + dst_output(skb); + +out: + kfree(ea); + return; + +err_out: + kfree_skb(skb); + goto out; +} This is a callback - it process' skb and sets new skb->dst entry and calls dst_output() - new dst entry is set to one pointed to ip_ouput(), and dst_output() will call it. This is exact behaviour of network stack without interruption. This callback can be called from acrypto/OCF/your driver with encrypted skb->data and you only need to setup dst entries and call dst_output(). +static int esp_output_async(struct xfrm_state *x, struct sk_buff *skb) +{ + struct esp_async *ea; + struct dst_entry *child; + + printk("%s: enter. Child list: ", __func__); + for (child = skb->dst; child; child = child->child) + printk("%p [%s] [%d] -> ", child, child->dev->name, atomic_read(&child->__refcnt)); + printk("\n"); + + ea = kmalloc(sizeof(*ea), GFP_ATOMIC); + if (!ea) + return -ENOMEM; + + memset(ea, 0, sizeof(*ea)); + + skb = skb_clone(skb, GFP_ATOMIC); + if (!skb) + return -ENOMEM; + dst_hold(skb->dst); + + ea->skb = skb; + ea->x = x; + ea->dst = skb->dst; + + printk("%s: x=%p, skb=%p, skb->dst=%p, skb->dst->xfrm=%p.\n", + __func__, x, skb, skb->dst, skb->dst->xfrm); + + init_timer(&ea->tm); + ea->tm.function = &esp4_callback; + ea->tm.data = (unsigned long)ea; + ea->tm.expires = jiffies; + + add_timer(&ea->tm); + + printk("%s: timer added: skb->users=%d.\n", __func__, atomic_read(&skb->users)); + + return 0; + +} This is an method which will be called from xfrm->output method. xfrm->output() is called when (for example)tcp_sendmsg() ends up in dst_output(). Next in dst_entry stack is ip_output(), which returns 0 - that indicate that skb is queued to be sent to NIC driver. Normal (synchronous) xfrm output method must return NET_XMIT_BYPASS, it will indicate that this dst entry has been processed, and it's time to call next one in dst stack [ip_output()]. What I do here is just faking the stack and do not allow dst_output() to call ip_output() after xfrm->output() is called. Since I clone skb, it's data will not be freed by stack. So in a timer callback [what you want here is to setup either acrypto, or OCF, or your own IXP processing engine] this skb will be reinjected into the stack. + + static int esp_output(struct xfrm_state *x, struct sk_buff *skb) { int err; @@ -465,7 +555,7 @@ .get_max_size = esp4_get_max_size, .input = esp_input, .post_input = esp_post_input, - .output = esp_output + .output = esp_output_async }; static struct net_protocol esp4_protocol = { diff -ru ../linux-2.6-orig/net/ipv4/xfrm4_output.c ./net/ipv4/xfrm4_output.c --- ../linux-2.6-orig/net/ipv4/xfrm4_output.c 2005-04-25 15:41:40.000000000 +0400 +++ ./net/ipv4/xfrm4_output.c 2005-04-29 12:13:41.000000000 +0400 @@ -124,12 +124,6 @@ x->curlft.packets++; spin_unlock_bh(&x->lock); - - if (!(skb->dst = dst_pop(dst))) { - err = -EHOSTUNREACH; - goto error_nolock; - } - err = NET_XMIT_BYPASS; out_exit: return err; Here is a trick - since we return 0 here(value from esp_output_async()) instead of NET_XMIT_BYPASS, network stack will not process next dst entry(ip_output()) and will "forget" about this skb. Later we must either free it or call remained dst entries using dst_output(). > Thanks again, Dan.... Hope this helps. -- Evgeniy Polyakov Crash is better than data corruption -- Arthur Grabowski
Attachment:
signature.asc
Description: This is a digitally signed message part