On Thu, Jan 27, 2011 at 3:03 AM, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Jan 26, 2011 at 11:20:22AM +0330, Hamid Nassiby wrote:
> >
> > Do you mean that different IP packets fit into one single block
> > cipher tfm? Would you please explain in more detail?
>
> We allocate one tfm per SA.  So as long as ordering is guaranteed
> per tfm then it's guaranteed per SA, which is all that's needed.
>
> Cheers,
> --
> Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Dear all,

Referring to my previous posts about our driver for a hardware AES
accelerator (to be used to accelerate the IPsec block cipher
operations), I would like to ask about a possibly algorithmic problem
in our solution.

As I said earlier, our driver is modeled on geode_aes, so assume that
we have defined our supported algorithm as:

static struct crypto_alg shams_cbc_alg = {
	.cra_name		= "cbc(aes)",
	.cra_driver_name	= "cbc-aes-mine",
	.cra_priority		= 400,
	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER |
				  CRYPTO_ALG_NEED_FALLBACK,
	.cra_init		= fallback_init_blk,
	.cra_exit		= fallback_exit_blk,
	.cra_blocksize		= AES_MIN_BLOCK_SIZE,
	.cra_ctxsize		= sizeof(struct my_aes_op),
	.cra_alignmask		= 0,
	.cra_type		= &crypto_blkcipher_type,
	.cra_module		= THIS_MODULE,
	.cra_list		= LIST_HEAD_INIT(shams_cbc_alg.cra_list),
	.cra_u			= {
		.blkcipher = {
			.min_keysize	= AES_MIN_KEY_SIZE,
			.max_keysize	= AES_MAX_KEY_SIZE,
			.setkey		= my_setkey_blk,
			.encrypt	= my_cbc_encrypt,
			.decrypt	= my_cbc_decrypt,
			.ivsize		= AES_IV_LENGTH,
		}
	}
};

And our encrypt function, my_cbc_encrypt, looks like:

static int my_cbc_encrypt(struct blkcipher_desc *desc,
			  struct scatterlist *dst, struct scatterlist *src,
			  unsigned int nbytes)
{
	struct my_aes_op *op = crypto_blkcipher_ctx(desc->tfm);
	struct blkcipher_walk walk;
	int err, ret;
	unsigned long flag1, c2flag;
	u32 my_req_id;

	if (unlikely(op->keylen != AES_KEYSIZE_128))
		return fallback_blk_enc(desc, dst, src, nbytes);

	/*
	 * Our request id is sent to the device and then retrieved with
	 * the response, so we can tell the device's responses apart.
	 */
	spin_lock_irqsave(&reqlock, c2flag);
	my_req_id = (global_reqid++) % 63000;
	spin_unlock_irqrestore(&reqlock, c2flag);

	blkcipher_walk_init(&walk, dst, src, nbytes);
	err = blkcipher_walk_virt(desc, &walk);
	op->iv = walk.iv;

	while ((nbytes = walk.nbytes)) {
		op->src = walk.src.virt.addr;
		op->dst = walk.dst.virt.addr;
		op->mode = AES_MODE_CBC;
		op->len = nbytes /*- (nbytes % AES_MIN_BLOCK_SIZE)*/;
		op->dir = AES_DIR_ENCRYPT;

		/* Critical PSEUDO code */
		spin_lock_irqsave(&lock1, flag1);
		write_to_device(op, 0, my_req_id);
		spin_unlock_irqrestore(&lock1, flag1);

		spin_lock_irqsave(&lock1, flag1);
		ret = read_from_device(op, 0, my_req_id);
		spin_unlock_irqrestore(&lock1, flag1);
		/* End of Critical PSEUDO code */

		nbytes -= ret;
		err = blkcipher_walk_done(desc, &walk, nbytes);
	}

	return err;
}
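As a side note, I realize the request-id allocation itself probably
does not need its own spinlock. A minimal sketch, assuming an atomic
counter is acceptable here (next_req_id() is a made-up helper, and the
63000 wrap is kept from the code above):

/* Needs <linux/atomic.h>.  atomic_inc_return() hands every caller a
 * distinct value, so reqlock and global_reqid can go away. */
static atomic_t next_id = ATOMIC_INIT(0);

static inline u32 next_req_id(void)
{
	return ((u32)atomic_inc_return(&next_id)) % 63000;
}

my_cbc_encrypt() would then simply do my_req_id = next_req_id();. But
that is a detail; the real problem follows.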
As I mentioned earlier, we have multiple AES engines in our hardware,
so to utilize the hardware as much as possible we would like to be
able to give multiple requests to the device and collect each response
as soon as it becomes ready. Now look at the section of my_cbc_encrypt
commented as "Critical PSEUDO code". This section hands requests to
the device and reads back the responses (and it is the bottleneck).

If we protect the write_to_device and read_from_device calls with one
pair of lock/unlock, as in:

	/* Critical PSEUDO code */
	spin_lock_irqsave(&lock1, flag1);
	write_to_device(op, 0, my_req_id);
	ret = read_from_device(op, 0, my_req_id);
	spin_unlock_irqrestore(&lock1, flag1);
	/* End of Critical PSEUDO code */

then we have no problem: the system works and IPsec en/decrypts
through our hardware. But ONLY one AES engine of our hardware is
utilized. The Good (the system works), the Bad (only one engine is
utilized) and the Ugly (the throughput is not awesome). So we must
change the section to:

	/* Critical PSEUDO code */
	spin_lock_irqsave(&lock1, flag1);
	write_to_device(op, 0, my_req_id);
	spin_unlock_irqrestore(&lock1, flag1);

	spin_lock_irqsave(&glock, t2flag);
	ret = read_from_device(op, 0, my_req_id);
	spin_unlock_irqrestore(&glock, t2flag);
	/* End of Critical PSEUDO code */

and preferably to:

	/* Critical PSEUDO code */
	/* distinct locks for write_to_device and read_from_device */
	spin_lock_irqsave(&lock1, flag1);
	write_to_device(op, 0, my_req_id);
	spin_unlock_irqrestore(&lock1, flag1);

	spin_lock_irqsave(&lock2, flag2);
	ret = read_from_device(op, 0, my_req_id);
	spin_unlock_irqrestore(&lock2, flag2);
	/* End of Critical PSEUDO code */

so that a second request can be written to the device while an earlier
one is still being processed. Here it seems we should have no problem,
but as soon as one TCP flow starts, the system hangs.

I would appreciate your guidance on this problem.

Thanks in advance,
Hamid
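P.S. In case it clarifies what I am after, below is a very rough
sketch of the direction I suspect we need. All of pending_slot,
submit_and_collect(), device_has_response() and read_response_id() are
names I made up, not our real register interface, and it assumes the
device reports the id of each finished request alongside its result
and that at most MAX_PENDING requests are ever in flight. The idea is
that whoever pulls the next finished response off the device matches
it by id and parks it for its real owner if it is not its own. The
waiter busy-polls rather than sleeps, since for IPsec this path can
run in softirq context.

#define MAX_PENDING	64

struct pending_slot {
	u32			id;
	bool			done;
	int			ret;
	struct my_aes_op	*op;
};

static DEFINE_SPINLOCK(pend_lock);
static struct pending_slot pending[MAX_PENDING];

static int submit_and_collect(struct my_aes_op *op, u32 id)
{
	struct pending_slot *slot = &pending[id % MAX_PENDING];
	unsigned long flags;

	/* Register ourselves, then submit; do NOT wait under the lock,
	 * so other CPUs can feed the remaining engines meanwhile. */
	spin_lock_irqsave(&pend_lock, flags);
	slot->id = id;
	slot->op = op;
	slot->done = false;
	write_to_device(op, 0, id);
	spin_unlock_irqrestore(&pend_lock, flags);

	/* Busy-poll: collect responses as they finish.  A response that
	 * belongs to another waiter is parked in its owner's slot. */
	for (;;) {
		spin_lock_irqsave(&pend_lock, flags);
		if (slot->done) {
			spin_unlock_irqrestore(&pend_lock, flags);
			return slot->ret;
		}
		if (device_has_response()) {		/* hypothetical */
			u32 rid = read_response_id();	/* hypothetical */
			struct pending_slot *owner =
				&pending[rid % MAX_PENDING];

			owner->ret = read_from_device(owner->op, 0, rid);
			owner->done = true;	/* ours or someone else's */
		}
		spin_unlock_irqrestore(&pend_lock, flags);
		cpu_relax();
	}
}

my_cbc_encrypt() would then call submit_and_collect(op, my_req_id) in
place of the locked write/read pair, and every engine could be kept
busy as long as there are requests in flight. Does this direction make
sense, or is the asynchronous ablkcipher interface the proper way to
achieve it?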