Re: Fwd: crypto accelerator driver problems

Hi all,

Following up on my previous posts to the crypto list about our hardware AES
accelerator project: I have finally deployed the device in IPsec successfully.
As I mentioned earlier, my driver registers itself with the kernel as a
blkcipher for cbc(aes) as follows:

static struct crypto_alg my_cbc_alg = {
	.cra_name		=	"cbc(aes)",
	.cra_driver_name	=	"cbc-aes-my",
	.cra_priority		=	400,
	.cra_flags			=	CRYPTO_ALG_TYPE_BLKCIPHER |
							CRYPTO_ALG_NEED_FALLBACK,
	.cra_init			=	fallback_init_blk,
	.cra_exit			=	fallback_exit_blk,
	.cra_blocksize		=	AES_MIN_BLOCK_SIZE,
	.cra_ctxsize		=	sizeof(struct my_aes_op),
	.cra_alignmask		=	15,
	.cra_type			=	&crypto_blkcipher_type,
	.cra_module			=	THIS_MODULE,
	.cra_list			=   LIST_HEAD_INIT(my_cbc_alg.cra_list),
	.cra_u				=	{
		.blkcipher	=	{
			.min_keysize	=	AES_MIN_KEY_SIZE,
			.max_keysize	=	AES_MIN_KEY_SIZE,
			.setkey			=	my_setkey_blk,
			.encrypt		=	my_cbc_encrypt,
			.decrypt		=	my_cbc_decrypt,
			.ivsize			=	AES_IV_LENGTH,
		}
	}
};

The my_cbc_encrypt function, shown here as a mix of pseudocode and real code
(simplified for presentation), is:

static int
my_cbc_encrypt(struct blkcipher_desc *desc,
	       struct scatterlist *dst, struct scatterlist *src,
	       unsigned int nbytes)
{
	SOME__common_preparation_and_initializations;

	spin_lock_irqsave(&mylock, myflags);
	send_request_to_device(&dev);	/* sends the request to the device; after
					   processing it, the device writes the
					   result to the destination */
	while (!readl(complete_flag))	/* busy-wait on a flag in the device
					   register space indicating completion */
		;
	spin_unlock_irqrestore(&mylock, myflags);

	return 0;	/* error handling omitted in this pseudocode */
}

With the above code, I can successfully test an IPsec gateway equipped with our
hardware and get 200 Mbps throughput using Iperf. Now I am facing another
problem. As I mentioned earlier, our hardware has 4 AES engines built in. With
the above code I only utilize one of them.
From this point, we want to go a step further and utilize more than one of the
AES engines in our device. The simplest solution, it appears to me, is to deploy
pcrypt/padata, written by Steffen Klassert. First I instantiate it on a
dual-core gateway:
	modprobe tcrypt alg="pcrypt(authenc(hmac(md5),cbc(aes)))" type=3
and test again. Running Iperf now gives me a very low throughput of about
20 Mbps, while dmesg shows the following:

   BUG: workqueue leaked lock or atomic: kworker/0:1/0x00000001/10
       last function: padata_parallel_worker+0x0/0x80
   Pid: 10, comm: kworker/0:1 Not tainted 2.6.37 #1
   Call Trace:
    [<c03e2d7d>] ? printk+0x18/0x1b
    [<c014a2b7>] process_one_work+0x177/0x370
    [<c0199980>] ? padata_parallel_worker+0x0/0x80
    [<c014c467>] worker_thread+0x127/0x390
    [<c014c340>] ? worker_thread+0x0/0x390
    [<c014fd74>] kthread+0x74/0x80
    [<c014fd00>] ? kthread+0x0/0x80
    [<c01033f6>] kernel_thread_helper+0x6/0x10
   BUG: scheduling while atomic: kworker/0:1/10/0x00000002
   Modules linked in: pcrypt my_aes2 binfmt_misc bridge stp
bnep sco rfcomm l2cap crc16 bluetooth rfkill ppdev acpi_cpufreq mperf
cpufreq_stats cpufreq_conservative cpufreq_ondemand cpufreq_userspace
cpufreq_powersave freq_table pci_slot sbs container video output sbshc battery
iptable_filter ip_tables x_tables decnet ctr twofish_i586 twofish_generic
twofish_common camellia serpent blowfish cast5 aes_i586 aes_generic xcbc rmd160
sha512_generic sha256_generic crypto_null af_key ac lp snd_hda_codec_realtek
snd_hda_intel snd_hda_codec snd_pcm_oss evdev snd_mixer_oss snd_pcm psmouse
serio_raw snd_seq_dummy pcspkr parport_pc parport snd_seq_oss snd_seq_midi
snd_rawmidi snd_seq_midi_event option usb_wwan snd_seq usbserial snd_timer
snd_seq_device button processor iTCO_wdt iTCO_vendor_support snd intel_agp
soundcore intel_gtt snd_page_alloc agpgart shpchp pci_hotplug ext3 jbd mbcache
sr_mod cdrom sd_mod sg ata_generic pata_jmicron ata_piix pata_acpi libata floppy
r8169 mii
  scsi_mod uhci_hcd ehci_hcd usbcore thermal fan fuse
   Pid: 10, comm: kworker/0:1 Not tainted 2.6.37 #1
   Call Trace:
    [<c012d459>] __schedule_bug+0x59/0x70
    [<c03e3757>] schedule+0x6a7/0xa70
    [<c0105bf7>] ? show_trace_log_lvl+0x47/0x60
    [<c03e2be9>] ? dump_stack+0x6e/0x75
    [<c014a308>] ? process_one_work+0x1c8/0x370
    [<c0199980>] ? padata_parallel_worker+0x0/0x80
    [<c014c51f>] worker_thread+0x1df/0x390
    [<c014c340>] ? worker_thread+0x0/0x390
    [<c014fd74>] kthread+0x74/0x80
    [<c014fd00>] ? kthread+0x0/0x80
    [<c01033f6>] kernel_thread_helper+0x6/0x10

I must emphasize again that the goal of deploying pcrypt/padata is to have more
than one request present in our hardware at a time (e.g. on a quad-CPU system we
would have 4 encryption and 4 decryption requests sent to our hardware). I also
tried using pcrypt/padata on a single-CPU system with one change in the
pcrypt_init_padata function of pcrypt.c: passing 4 as the max_active parameter
of alloc_workqueue. That is, I called alloc_workqueue as:

alloc_workqueue(name, WQ_MEM_RECLAIM | WQ_CPU_INTENSIVE, 4);
instead of:
alloc_workqueue(name, WQ_MEM_RECLAIM | WQ_CPU_INTENSIVE, 1);

But this did not give me 4 simultaneous encryption requests in the hardware.
I know that one promising solution might be to move from the blkcipher to the
ablkcipher interface, but as we are pressed for time and need a quicker
solution, I would like your comments on my problem.
Can I solve it with pcrypt/padata after all, with some change to the
encrypt/decrypt functions of my current blkcipher driver or to pcrypt itself?
Or should I take another approach?
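
The goal of keeping several engines busy at once could be sketched roughly as
follows. This is a minimal userspace illustration under assumptions, not driver
code: the names engine_pool, engine_acquire, engine_release and
engine_in_flight are hypothetical, and the sketch is single-threaded, so in a
real driver the mask would need a lock or atomic bit operations around it.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ENGINES 4	/* the device has 4 AES engines (see above) */

/* Hypothetical bookkeeping: bit i set means engine i has a request in flight. */
struct engine_pool {
	uint32_t busy_mask;
};

/* Claim the lowest-numbered idle engine; return its index, or -1 if all busy. */
static int engine_acquire(struct engine_pool *pool)
{
	for (int i = 0; i < NUM_ENGINES; i++) {
		if (!(pool->busy_mask & (1u << i))) {
			pool->busy_mask |= 1u << i;
			return i;
		}
	}
	return -1;
}

/* Mark an engine idle again, e.g. once its completion flag has been seen. */
static void engine_release(struct engine_pool *pool, int idx)
{
	assert(idx >= 0 && idx < NUM_ENGINES);
	pool->busy_mask &= ~(1u << idx);
}

/* Count the requests currently in flight. */
static int engine_in_flight(const struct engine_pool *pool)
{
	int n = 0;

	for (int i = 0; i < NUM_ENGINES; i++)
		n += (pool->busy_mask >> i) & 1u;
	return n;
}
```

With bookkeeping of this kind, a fifth request arriving while four are
outstanding would get -1 and would have to wait or be handled some other way,
whereas with the single lock in my_cbc_encrypt only one request is ever in the
device at a time.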

Please keep in mind that, because we have little time, solutions requiring only
minor changes to our current code are strongly preferred.

Thanks in advance,

Hamid.

