On 3/18/2015 12:03 AM, Kim Phillips wrote:
> On Tue, 17 Mar 2015 19:58:55 +0200
> Horia Geantă <horia.geanta@xxxxxxxxxxxxx> wrote:
>
>> On 3/17/2015 2:19 AM, Kim Phillips wrote:
>>> On Mon, 16 Mar 2015 12:02:51 +0200
>>> Horia Geantă <horia.geanta@xxxxxxxxxxxxx> wrote:
>>>
>>>> On 3/4/2015 2:23 AM, Kim Phillips wrote:
>>>>> Only potential problem is getting the crypto API to set the GFP_DMA
>>>>> flag in the allocation request, but presumably a
>>>>> CRYPTO_TFM_REQ_DMA crt_flag can be made to handle that.
>>>>
>>>> Seems there are quite a few places that do not use the
>>>> {aead,ablkcipher,ahash}_request_alloc() API to allocate crypto requests.
>>>> Among them, IPsec and dm-crypt.
>>>> I've looked at the code and I don't think it can be converted to use
>>>> the crypto API.
>>>
>>> why not?
>>
>> It would imply having 2 memory allocations, one for the crypto request and
>> the other for the rest of the data bundled with the request (for IPsec
>> that would be ESN + space for IV + sg entries for authenticated-only
>> data and sk_buff extension, if needed).
>>
>> Trying to have a single allocation by making ESN, IV etc. part of the
>> request private context requires modifying tfm.reqsize on the fly.
>> This won't work without adding some kind of locking for the tfm.
>
> can't a common minimum tfm.reqsize be co-established up front, at
> least for the fast path?

Indeed, for IPsec at tfm allocation time - esp_init_state() - tfm.reqsize
could be increased to account for what is known for a given flow: ESN, IV
and asg (S/G entries for authenticated-only data). The layout would be:

aead request (fixed part)
private ctx of backend algorithm
seq_no_hi (if ESN)
IV
asg
sg <-- S/G table for skb_to_sgvec; how many entries is the question

Do you have a suggestion for how many S/G entries to preallocate for
representing the sk_buff data to be encrypted?
An ancient esp4.c used ESP_NUM_FAST_SG, set to 4.
Btw, currently the maximum number of fragments supported by the net stack
(MAX_SKB_FRAGS) is 16 or more.
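To make the size math concrete, a rough sketch of what esp_init_state()
would add on top of the backend algorithm's reqsize; the esp_extra_size()
helper, the reuse of the old ESP_NUM_FAST_SG value and the assumption that
some yet-to-be-defined crypto API hook actually grows the reqsize are all
illustrative, and per-member alignment is left out:

#include <linux/crypto.h>
#include <linux/scatterlist.h>
#include <net/xfrm.h>

#define ESP_NUM_FAST_SG	4	/* value the ancient esp4.c used */

/*
 * Illustrative only: extra bytes to reserve on top of the backend
 * algorithm's reqsize so that a single request allocation also covers
 * the per-flow items listed above.
 */
static unsigned int esp_extra_size(struct xfrm_state *x,
				   struct crypto_aead *aead)
{
	unsigned int extra = 0;

	if (x->props.flags & XFRM_STATE_ESN)
		extra += sizeof(__be32);		/* seq_no_hi */

	extra += crypto_aead_ivsize(aead);		/* IV */
	extra += sizeof(struct scatterlist);		/* asg (1 entry assumed) */
	extra += ESP_NUM_FAST_SG * sizeof(struct scatterlist);	/* sg */

	return extra;
}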
>>>> This means that the CRYPTO_TFM_REQ_DMA would be visible to all of these
>>>> places. Some of the maintainers do not agree, as you've seen.
>>>
>>> would modifying the crypto API to either have a different
>>> *_request_alloc() API, and/or adding calls to negotiate the GFP mask
>>> between crypto users and drivers, e.g., get/set_gfp_mask, work?
>>
>> I think what DaveM asked for was the change to be transparent.
>>
>> Besides converting to *_request_alloc(), seems that all other options
>> require some extra awareness from the user.
>> Could you elaborate on the idea above?
>
> was merely suggesting communicating GFP flags anonymously across the
> API, i.e., GFP_DMA wouldn't appear in user code.

Meaning the user would have to call get_gfp_mask() before allocating a
crypto request - i.e. instead of kmalloc(..., GFP_ATOMIC), have
kmalloc(..., GFP_ATOMIC | get_gfp_mask(aead))?

>>>> An alternative would be for talitos to use the page allocator to get
>>>> 1 or 2 pages at probe time (4 channels x 32 entries/channel x
>>>> 64B/descriptor = 8 kB), dma_map_page the area and manage it internally
>>>> for talitos_desc hw descriptors.
>>>> What do you think?
>>>
>>> There's a comment in esp_alloc_tmp(): "Use spare space in skb for
>>> this where possible," which is ideally where we'd want to be (esp.
>>
>> Ok, I'll check that. But note the "where possible" - finding room in the
>> skb to avoid the allocation won't always be the case, and then we're
>> back to square one.

So the skb cb is out of the question, being too small (48B).
Any idea what was the intention of the "TODO" - maybe to use the tailroom
in the skb data area?

>>> because that memory could already be DMA-able). Your above
>>> suggestion would be in the opposite direction of that.
>>
>> The proposal:
>> -removes dma (un)mapping on the fast path
>
> sure, but at the expense of additional complexity.

Right, there's no free lunch. But it's cheaper.

>> -avoids requesting dma mappable memory for more than it's actually
>> needed (CRYPTO_TFM_REQ_DMA forces the entire request to be mappable, not
>> only its private context)
>
> compared to the payload? Plus, we have plenty of DMA space these
> days.
>
>> -for caam it has the added benefit of speeding up the below search for
>> the offending descriptor in the SW ring from O(n) to O(1):
>>
>> for (i = 0; CIRC_CNT(head, tail + i, JOBR_DEPTH) >= 1; i++) {
>> 	sw_idx = (tail + i) & (JOBR_DEPTH - 1);
>>
>> 	if (jrp->outring[hw_idx].desc ==
>> 	    jrp->entinfo[sw_idx].desc_addr_dma)
>> 		break; /* found */
>> }
>>
>> (drivers/crypto/caam/jr.c - caam_dequeue)
>
> how? The job ring h/w will still be spitting things out
> out-of-order.

The jrp->outring[hw_idx].desc bus address can be used to find the sw_idx
in O(1):

dma_addr_t desc_base = dma_map_page(dev, alloc_page(GFP_DMA), ...);
[...]
sw_idx = (jrp->outring[hw_idx].desc - desc_base) / JD_SIZE;

JD_SIZE would be 16 words (64B) - 13 words used for the h/w job descriptor,
3 words available for something else. Basically all JDs would be filled in
at a 64B-aligned offset in the memory page (a slightly fuller sketch is
appended after the signature).

> Plus, like I said, it's taking the problem in the wrong direction:
> we need to strive to merge the allocation and mapping with the upper
> layers as much as possible.

IMHO propagating the GFP_DMA flag from backend crypto implementations to
crypto API users doesn't seem feasible. It's error-prone to audit all the
places that allocate crypto requests without using the *_request_alloc()
API. And even if all these places were identified:
-in some cases there's heavy rework involved
-more places might show up in the future and there's no way to detect them

Horia
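For illustration, a minimal sketch of the O(1) lookup described above,
assuming all job descriptors of a ring are carved out of a single page
allocated with alloc_page(GFP_DMA) and DMA-mapped once at probe time; the
caam_jr_shadow bookkeeping and the jd_index() helper are made-up names,
not existing caam code:

#include <linux/dma-mapping.h>

#define JD_SIZE	64	/* 16 words: 13 for the h/w JD, 3 spare */

/* hypothetical per-ring bookkeeping, set up once at probe time */
struct caam_jr_shadow {
	void *jd_vaddr;		/* page_address() of the GFP_DMA page */
	dma_addr_t jd_base;	/* bus address returned by dma_map_page() */
};

/* O(1) replacement for the CIRC_CNT() search in caam_dequeue() */
static inline int jd_index(struct caam_jr_shadow *shdw, dma_addr_t desc_dma)
{
	return (desc_dma - shdw->jd_base) / JD_SIZE;
}

The dequeue path would then reduce to something like
sw_idx = jd_index(shdw, jrp->outring[hw_idx].desc), with the head/tail
bookkeeping left unchanged.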