Re: rgw: streaming interfaces for object read/write

thanks for the info (and context)!

the first set of use cases I would like to tackle are actually the asymmetric ones (unlike encryption and compression), where a transformation is applied on only one of the read/write paths.
for example:
* a client is reading the content of a bucket that holds images but wants to present low-resolution versions of them (e.g. thumbnails), while the original uploader of the images is a different, unrelated client app
* images may be watermarked based on which client fetches them
* etc.
it looks like most of the infrastructure is already there, and it should be straightforward to add a "lua filter" (deriving from RGWGetObj_Filter) to the list of filters.
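
to make the shape of such a filter concrete, here is a minimal standalone sketch. everything in it is an assumption for illustration: GetDataCB and GetObjFilter are simplified stand-ins for RGWGetDataCB and RGWGetObj_Filter (the real classes operate on ceph::bufferlist, not std::string), ScriptFilter, ClientSink and read_through_filter are hypothetical names, and a std::function plays the role of the Lua script.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// hypothetical stand-in for RGWGetDataCB. the real class works on
// ceph::bufferlist; std::string keeps this sketch standalone.
struct GetDataCB {
  virtual ~GetDataCB() = default;
  virtual int handle_data(const std::string& data) = 0;
};

// hypothetical stand-in for RGWGetObj_Filter: forwards each chunk downstream.
class GetObjFilter : public GetDataCB {
 protected:
  GetDataCB* next;
 public:
  explicit GetObjFilter(GetDataCB* next) : next(next) {
    assert(next != nullptr);
  }
  int handle_data(const std::string& data) override {
    return next->handle_data(data);
  }
};

// placeholder for the "lua filter": a std::function plays the role of the
// script and is applied to every chunk before it is passed on.
class ScriptFilter : public GetObjFilter {
  std::function<std::string(const std::string&)> script;
 public:
  ScriptFilter(GetDataCB* next,
               std::function<std::string(const std::string&)> script)
    : GetObjFilter(next), script(std::move(script)) {}
  int handle_data(const std::string& data) override {
    return next->handle_data(script(data));
  }
};

// terminal callback that collects what would be sent to the client.
struct ClientSink : GetDataCB {
  std::string sent;
  int handle_data(const std::string& data) override {
    sent += data;
    return 0;
  }
};

// feeds the chunks through an upper-casing ScriptFilter and returns
// everything that reached the sink (empty string on error).
std::string read_through_filter(const std::vector<std::string>& chunks) {
  ClientSink sink;
  ScriptFilter filter(&sink, [](const std::string& in) {
    std::string out = in;
    for (char& c : out) {
      c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return out;
  });
  for (const std::string& chunk : chunks) {
    int r = filter.handle_data(chunk);
    if (r < 0) {
      return std::string();
    }
  }
  return sink.sent;
}
```

note that the sketch transforms each chunk independently. something like thumbnailing would most likely need to buffer the whole object before producing any output, which this sketch does not attempt.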

asymmetric "put" use cases should probably be handled out-of-band using bucket notifications, where some external entity reads the object, does the processing, and writes the modified object back just like any other client.

the more challenging ones would be the symmetric use cases (like encryption and compression), where the user might want to use a different compression or encryption mechanism than the ones we provide in our C++ code. for that, we would probably need to handle the "put" path inline as well.

not sure this is directly related to zipper, though, since RGWGetDataCB seems to be agnostic of the backend storage?



On Wed, Oct 27, 2021 at 8:08 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
hi Yuval, i hope you don't mind a short history lesson here:

in 2016, Adam Kupczyk wrote https://github.com/ceph/ceph/pull/11494
for inline object compression, which added the initial streaming
abstractions RGWPutObjDataProcessor, RGWPutObj_Filter, and
RGWGetObj_Filter. compression was implemented with the
RGWPutObj_Compress and RGWGetObj_Decompress filters in
src/rgw/rgw_compression.h

in 2017, Adam added https://github.com/ceph/ceph/pull/11049 to support
server-side encryption with RGWGetObj_BlockDecrypt and
RGWPutObj_BlockEncrypt filters in src/rgw/rgw_crypt.h

in 2018, i cleaned up the PutObj side in
https://github.com/ceph/ceph/pull/24453, and distilled
RGWPutObjDataProcessor down to a single pure virtual in class
rgw::putobj::DataProcessor in src/rgw/rgw_putobj.h

last year, Dan finalized the zipper write interfaces in
https://github.com/ceph/ceph/pull/42550, which moved this
DataProcessor into namespace rgw::sal, and used it as a basis for the
rgw::sal::Writer interface in src/rgw/rgw_sal.h

each zipper store implements this virtual function
rgw::sal::Writer::process_data(), which sees every buffer segment that
streams in from the client. each store has the ability to add extra
filters, using something like rgw::putobj::Pipe to compose them into a
pipeline
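
a rough standalone sketch of that kind of composition, under stated assumptions: DataProcessor and Pipe below are simplified stand-ins modeled on the classes named above (std::string replaces bufferlist), while CountingFilter, XorFilter, StoreSink and write_through_pipeline are hypothetical names invented for the example, not part of rgw.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// hypothetical stand-in for the DataProcessor interface
// (bufferlist replaced by std::string).
struct DataProcessor {
  virtual ~DataProcessor() = default;
  virtual int process(std::string&& data, uint64_t offset) = 0;
};

// stand-in modeled on rgw::putobj::Pipe: forwards to the next processor.
class Pipe : public DataProcessor {
  DataProcessor* next;
 public:
  explicit Pipe(DataProcessor* next) : next(next) {
    assert(next != nullptr);
  }
  int process(std::string&& data, uint64_t offset) override {
    return next->process(std::move(data), offset);
  }
};

// hypothetical filter that only observes the stream.
class CountingFilter : public Pipe {
 public:
  uint64_t bytes_seen = 0;
  explicit CountingFilter(DataProcessor* next) : Pipe(next) {}
  int process(std::string&& data, uint64_t offset) override {
    bytes_seen += data.size();
    return Pipe::process(std::move(data), offset);
  }
};

// toy size-preserving filter (XOR with one byte), standing in for a
// real transformation such as encryption.
class XorFilter : public Pipe {
  char key;
 public:
  XorFilter(DataProcessor* next, char key) : Pipe(next), key(key) {}
  int process(std::string&& data, uint64_t offset) override {
    for (char& c : data) {
      c = static_cast<char>(c ^ key);
    }
    return Pipe::process(std::move(data), offset);
  }
};

// terminal processor standing in for the store's writer.
struct StoreSink : DataProcessor {
  std::string stored;
  int process(std::string&& data, uint64_t offset) override {
    if (offset != stored.size()) {
      return -1;  // this sketch only accepts sequential writes
    }
    stored += data;
    return 0;
  }
};

// builds the pipeline counter -> xor -> sink, streams the chunks through
// it, and returns the stored bytes plus the byte count seen at the head.
std::pair<std::string, uint64_t> write_through_pipeline(
    const std::vector<std::string>& chunks, char key) {
  StoreSink sink;
  XorFilter xor_filter(&sink, key);
  CountingFilter head(&xor_filter);
  uint64_t offset = 0;
  for (const std::string& chunk : chunks) {
    const uint64_t len = chunk.size();
    std::string copy = chunk;
    if (head.process(std::move(copy), offset) < 0) {
      return {std::string(), head.bytes_seen};
    }
    offset += len;
  }
  return {sink.stored, head.bytes_seen};
}
```

each filter only knows about the next processor in the chain, so adding or reordering filters is a matter of how the chain is constructed.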

for object reads, there's a separate class RGWGetDataCB with a similar
virtual function handle_data(), which still provides the basis for the
RGWGetObj_Filter used by compression and encryption. this is passed
into zipper through rgw::sal::Object::ReadOp::iterate(), and each
store should be able to wrap that with other filters

the interaction between zipper and inline compression/encryption
(which currently happen above zipper in rgw_op.cc) will need some more
thought. because if zipper stores modify an encrypted or compressed
stream, the filters above zipper won't be able to successfully
decrypt/decompress it
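
a toy illustration of that failure mode, as a sketch only: toy_crypt is a made-up position-dependent XOR standing in for the real encryption filters, and store_modify is a hypothetical store-side change (prepending one byte) that the layers above know nothing about.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// toy position-dependent cipher standing in for the filters above the
// store. XOR is its own inverse, so one function encrypts and decrypts.
std::string toy_crypt(std::string data, unsigned char key) {
  for (std::size_t i = 0; i < data.size(); ++i) {
    const unsigned char k = static_cast<unsigned char>(key + i);
    data[i] = static_cast<char>(static_cast<unsigned char>(data[i]) ^ k);
  }
  return data;
}

// hypothetical store-side modification: prepend a one-byte tag.
std::string store_modify(const std::string& stream) {
  return std::string(1, 'T') + stream;
}

// writes through the "encrypting" layer, optionally lets the store alter
// the stream, reads back through the "decrypting" layer, and reports
// whether the original plaintext was recovered.
bool round_trip_ok(const std::string& plain, bool store_modifies) {
  const unsigned char key = 0x5a;
  const std::string written = toy_crypt(plain, key);
  assert(toy_crypt(written, key) == plain);  // the cipher itself is symmetric
  const std::string stored =
      store_modifies ? store_modify(written) : written;
  const std::string read_back = toy_crypt(stored, key);
  return read_back == plain;
}
```

in this sketch the round trip only succeeds when the store returns exactly the bytes it was given, so a store that transforms the stream would have to undo its own change before handing data back up.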
