Re: RGW encrypt is implemented by qat batch and queue mode

Casey Bodley <cbodley@xxxxxxxxxx> · Wed, 21 Sep 2022 10:20:21 -0400

On Mon, Sep 19, 2022 at 4:06 AM Feng, Hualong <hualong.feng@xxxxxxxxx> wrote:
>
> Hi Mark, Casey
>
> Could you spare some time to help review these two PRs or add them to your plan?
>
> The PR link is below:
>
> https://github.com/ceph/ceph/pull/47040
>
> https://github.com/ceph/ceph/pull/47845
>
> I reimplemented the QAT encryption plugin. Because the existing RGW encryption uses 4KB as its encryption unit, performance is poor when the QAT batch interface is not used. I have now reimplemented the encryption plugin on top of the QAT batch interface, split across two PRs. With PR 47040, when the data block to encrypt is larger than 128KB, 32 pieces of 4KB data are taken out at a time and submitted as one batch. PR 47845 builds on PR 47040: when the data block to encrypt is smaller than 128KB, it is first placed in a buffer queue, and a batch is submitted once 32 pieces of 4KB data have accumulated or a timeout expires.
>
> The performance results are below. The higher the CPU usage, the more pronounced the benefit of QAT.
>
> In the flame graph, the encryption plugin implemented with QAT accounts for a smaller share of the RGWPutObj::execute function than the encryption plugin implemented with isal.
>
> Thanks
>
> -Hualong

hey Hualong et al, (cc dev list)

thanks for reaching out, this really helps me understand what those
PRs are trying to accomplish

in general i'm concerned about the need for threads, locking, and
buffering down in the crypto plugins. ideally this stuff would be
under the application's control. in radosgw, we've been trying to
eliminate any blocking waits on condition variables in our i/o path
now that requests are handled in coroutines - instead of blocking an
entire thread, we just suspend the coroutine and run others in the
meantime
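
to illustrate the suspend-instead-of-block idea, here is a rough sketch. it
uses python's asyncio rather than the c++ coroutines radosgw actually runs
on, and every name in it (wait_for_batch, other_request, run_demo) is made up
for the example, so treat it as an analogy and not as rgw code:

```python
import asyncio


async def wait_for_batch(done: asyncio.Event, log: list) -> None:
    # Hypothetical request A: it needs a crypto batch to finish.
    log.append("request A: waiting for crypto batch")
    # Awaiting suspends only this coroutine; the thread stays free
    # to run other coroutines on the same event loop.
    await done.wait()
    log.append("request A: batch finished")


async def other_request(done: asyncio.Event, log: list) -> None:
    # Hypothetical request B: runs while request A is suspended.
    log.append("request B: making progress")
    # Stand-in for "the batch completed"; wakes request A.
    done.set()


async def main() -> list:
    log: list = []
    done = asyncio.Event()
    await asyncio.gather(wait_for_batch(done, log), other_request(done, log))
    return log


def run_demo() -> list:
    return asyncio.run(main())
```

a blocking wait on a condition variable at the same point would have stalled
the whole thread, so request B could not have run on it until A was woken up.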

seeing that graph by object size, my first impression was that radosgw
should be using bigger buffers. GetObj and PutObj are both reading
data in 4MB chunks, maybe we can find a way to use the qat batch
interfaces within those chunks? that could avoid the need for
cross-thread queues and synchronization. compared to your approach in
https://github.com/ceph/ceph/pull/47845, i imagine this would show
less of a benefit for small object uploads, but more of a benefit for
the big ones. do you think this could work?
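
a minimal sketch of what i mean, in python for brevity. the 4KB unit and the
32-units-per-batch figure come from the description above; submit_batch is a
hypothetical callback standing in for whatever the qat batch submission ends
up looking like, and none of these names exist in ceph or in the qat library:

```python
from typing import Callable, List

UNIT_SIZE = 4 * 1024      # rgw encryption unit, per the thread
UNITS_PER_BATCH = 32      # batch size described for PR 47040


def split_into_batches(
    chunk: bytes,
    unit_size: int = UNIT_SIZE,
    units_per_batch: int = UNITS_PER_BATCH,
) -> List[List[bytes]]:
    # Cut the chunk into encryption units; the last unit may be short.
    units = [chunk[i:i + unit_size] for i in range(0, len(chunk), unit_size)]
    # Group consecutive units into batches; the last batch may be short.
    return [
        units[i:i + units_per_batch]
        for i in range(0, len(units), units_per_batch)
    ]


def encrypt_chunk(
    chunk: bytes,
    submit_batch: Callable[[List[bytes]], List[bytes]],
) -> bytes:
    # submit_batch is an assumed interface: it takes one batch of units
    # and returns the processed units in the same order.
    out: List[bytes] = []
    for batch in split_into_batches(chunk):
        out.extend(submit_batch(batch))
    return b"".join(out)
```

with these numbers a full 4MB chunk is 1024 units, so it yields 32 full
batches from a single caller with no cross-thread queue. an object smaller
than 128KB only ever produces one partial batch, which is why i'd expect
less benefit there.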
