On 12 November 2015 at 20:59, Jan Kara <jack@xxxxxxx> wrote:
> On Thu 12-11-15 20:51:10, Baolin Wang wrote:
>> On 12 November 2015 at 20:24, Jan Kara <jack@xxxxxxx> wrote:
>> > On Thu 12-11-15 19:46:26, Baolin Wang wrote:
>> >> On 12 November 2015 at 19:06, Jan Kara <jack@xxxxxxx> wrote:
>> >> > On Thu 12-11-15 17:40:59, Baolin Wang wrote:
>> >> >> On 12 November 2015 at 17:17, Jan Kara <jack@xxxxxxx> wrote:
>> >> >> > On Thu 12-11-15 10:15:32, Baolin Wang wrote:
>> >> >> >> On 11 November 2015 at 17:48, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>> >> >> >> > On Wed, Nov 11, 2015 at 05:31:43PM +0800, Baolin Wang wrote:
>> >> >> >> >> The dm-crypt code currently implements only the bio-based method to
>> >> >> >> >> encrypt/decrypt block data, which can handle only one bio at a time. As
>> >> >> >> >> we know, one bio must cover a sequential range of physical addresses and
>> >> >> >> >> is also limited in length. This restricts large-block encryption/
>> >> >> >> >> decryption when the hardware supports encrypting large blocks of data.
>> >> >> >> >>
>> >> >> >> >> This patch series introduces a request-based method to handle the data
>> >> >> >> >> encryption/decryption. One request can contain multiple bios, so it can
>> >> >> >> >> handle large blocks of data and improve efficiency.
>> >> >> >> >
>> >> >> >> > NAK for more request-based stacking or DM drivers. They are a major
>> >> >> >> > pain to deal with, and adding more with different requirements than
>> >> >> >> > dm-multipath is not helping in actually making that one work properly.
>> >> >> >>
>> >> >> >> But many vendors now supply hardware engines to handle the
>> >> >> >> encryption/decryption, and the hardware really needs big blocks to show
>> >> >> >> its performance, which the request-based approach provides. Also, the
>> >> >> >> request-based approach is already used by many vendors (Qualcomm,
>> >> >> >> Spreadtrum and so on) to improve their performance, and there is a real
>> >> >> >> performance requirement here (I can show the performance results later).
>> >> >> >
>> >> >> > So you've mentioned several times that hardware needs big blocks. How big
>> >> >> > do those blocks need to be? Ideally, can you give some numbers on how the
>> >> >> > throughput of the encryption hw grows with the block size?
>> >> >>
>> >> >> It depends on the hardware design. My beaglebone black board's AES
>> >> >> engine can handle 1M at a time, which is not big. As far as I know, some
>> >> >> other AES engines can handle 16M of data at a time or more.
>> >> >
>> >> > Well, one question is what it "can handle" and another question is how big
>> >> > a gain in throughput it will bring compared to, say, 1M chunks. I suppose
>> >> > there is some constant overhead in issuing a request to the crypto hw, and
>> >> > by the time it is encrypting 1M this overhead may be well amortized by the
>> >> > cost of the encryption itself, which is in principle linear in the size of
>> >> > the block. That's why I'd like to get an idea of the real numbers...
>> >>
>> >> Please correct me if I misunderstood your point. Let's suppose the AES
>> >> engine can handle 16M at a time. If the data is smaller than 16M, the
>> >> engine can handle it in one go. But if the data size is 20M (more than
>> >> 16M), the engine driver will split it into a 16M and a 4M chunk to deal
>> >> with. I cannot give exact numbers, but I think the engine prefers big
>> >> chunks to small chunks, which is the hardware engine's advantage.
>> >
>> > No, I meant something different. I meant that if HW can encrypt 1M in say
>> > 1.05 ms and it can encrypt 16M in 16.05 ms, then although using 16M blocks
>> > gives you some advantage, it becomes diminishingly small.
>>
>> But if it encrypts 16M in 1M pieces one by one, it will take much longer
>> than 16.05 ms (considering the SW submits the bios one by one).
>
> Really? In my example, it would take 16.8 ms if we encrypted 16M in 1M
> chunks and 16.05 ms if done in one chunk. That is a difference for which I
> would not be willing to bend over backwards. Now these numbers are
> completely made up, and that's why I wanted to see the real numbers...
>

Well, I did a simple test with dd reading. Since my engine's limit is 1M:

(1) The times look like this when handling 1M at a time:
1048576 bytes (1.0 MB) copied, 0.0841235 s, 12.5 MB/s
1048576 bytes (1.0 MB) copied, 0.0836294 s, 12.5 MB/s
1048576 bytes (1.0 MB) copied, 0.0836526 s, 12.5 MB/s

(2) These handle 64K at a time, 16 times:
1048576 bytes (1.0 MB) copied, 0.0937223 s, 11.2 MB/s
1048576 bytes (1.0 MB) copied, 0.097205 s, 10.8 MB/s
1048576 bytes (1.0 MB) copied, 0.0935884 s, 11.2 MB/s

That is a difference on the order of 10 ms; imagine how much bigger it gets
if the hardware engine's throughput is higher than that. But as Jens said,
we can measure it with the performance data. Thanks.
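To make the amortization arithmetic easy to replay against real
measurements, here is a minimal model in C. It is only a sketch: the
0.05 ms per-request setup cost and the 1 ms/MB encryption rate are the
made-up numbers from Jan's example above, not measured values.

#include <stdio.h>

int main(void)
{
	/* Assumed, made-up numbers from the example above, not measurements. */
	const double overhead_ms = 0.05;	/* fixed cost per request */
	const double per_mb_ms = 1.0;		/* linear encryption cost per MB */
	const double total_mb = 16.0;		/* total data to encrypt */
	const double chunk_mb[] = { 0.0625, 1.0, 16.0 };	/* 64K, 1M, 16M */
	int i;

	for (i = 0; i < 3; i++) {
		double nr_requests = total_mb / chunk_mb[i];
		double ms = nr_requests * overhead_ms + total_mb * per_mb_ms;

		printf("%8.4f MB chunks: %6.2f ms total, %7.2f MB/s\n",
		       chunk_mb[i], ms, total_mb * 1000.0 / ms);
	}
	return 0;
}

With these assumptions, 1M chunks land within about 5% of a single 16M
request (16.8 ms vs 16.05 ms), while 64K chunks take nearly 80% longer
(28.8 ms) because the per-request overhead dominates; plugging a measured
setup cost and rate into the model would settle the argument either way.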
>> >> >> > You mentioned that you use requests because of size limitations on bios - I
>> >> >> > had a look and the current struct bio can easily describe 1MB requests
>> >> >> > (that's assuming a 64-bit architecture and 4KB pages) when we have 1 page
>> >> >> > worth of struct bio_vec (4096 / 16 bytes per bio_vec = 256 entries, each
>> >> >> > mapping a 4KB page). Is that not enough?
>> >> >>
>> >> >> Usually one bio does not use the full 1M; it may carry 1k/2k/8k or other
>> >> >> small chunks. But a request can combine several sequential small bios
>> >> >> into one big block, so it is better than a bio, at least.
>> >> >
>> >> > As Christoph mentions, 4.3 should be better at submitting larger bios.
>> >> > Did you check it?
>> >>
>> >> I'm sorry, I didn't check it. What is the limit on one bio in 4.3?
>> >
>> > On 4.3 it is 1 MB (which should be enough because requests are limited to
>> > 512 KB by default anyway). Previously the maximum bio size depended on the
>> > queue parameters, such as the max number of segments.
>>
>> But that may not be enough for a HW engine which can handle maybe 10M/20M
>> at a time.
>
> Currently you would not be able to create chunks larger than 512K / 1M even
> with request-based dm-crypt, since requests have limits on the amount of
> data they can carry as well... So this is a somewhat abstract discussion.
>

OK. But in that case I think the default limit should be changed for the DM
device.

>                                                               Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR

--
Baolin.wang
Best Regards

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel