Re: rgw-multisite: add multipart sync for rgw zones

On Thu, May 10, 2018 at 9:13 AM, Xinying Song <songxinying.ftd@xxxxxxxxx> wrote:
> Hi, all:
>
> We made a PR that adds multipart upload support to rgw zone sync,
> modeled on cloud sync.
> Link: https://github.com/ceph/ceph/pull/21925

Great work!

> Here is the brief idea:
>
> Why multipart?
>    1. breakpoint resume.
>    2. concurrency for better performance.
>
> What changed?
> With the option rgw_sync_multipart_threshold=0, rgw will behave as
> before. If this option is set, for example, to 32MB, objects larger than
> this will be synced in multipart fashion.
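>
> A minimal configuration example (assuming the threshold takes a byte
> count, like other rgw size options; the section name is illustrative):
>
>     [client.rgw]
>     # objects larger than 32 MB sync via multipart; 0 keeps the old behavior
>     rgw_sync_multipart_threshold = 33554432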
>
> Implementation:
>    The entry point for this feature is at
> RGWDefaultDataSyncModule::sync_object(), where a new coroutine called
> RGWDefaultHandleRemoteObjCR handles the sync logic, similar to
> RGWAWSHandleRemoteObjCR. This coroutine decides whether multipart or
> atomic sync is used (a minimal sketch of that dispatch follows). Atomic
> sync calls RGWFetchRemoteObjCR, which works the same way as before.
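>
> A minimal sketch of that dispatch (illustrative C++ stubs, not the
> actual PR code):
>
>     #include <cstdint>
>     #include <cstdio>
>
>     static void sync_atomic()    { std::puts("RGWFetchRemoteObjCR (as before)"); }
>     static void sync_multipart() { std::puts("RGWFetchRemoteObjMultipartCR"); }
>
>     // threshold carries the value of rgw_sync_multipart_threshold
>     static void sync_object(uint64_t obj_size, uint64_t threshold) {
>       if (threshold > 0 && obj_size > threshold)
>         sync_multipart();  // large object: multipart path
>       else
>         sync_atomic();     // threshold 0 or small object: old behavior
>     }
>
>     int main() {
>       sync_object(64 << 20, 32 << 20);  // 64 MB obj, 32 MB threshold -> multipart
>       sync_object(16 << 20, 32 << 20);  // 16 MB obj -> atomic, as before
>     }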
>
>    For multipart sync, a coroutine called RGWFetchRemoteObjMultipartCR is
> used. This coroutine executes in 5 steps (a plain sketch of the flow
> follows the list):
>    1. compare mtime/zone_id/pg_ver between the src and dest obj.
>    2. init an upload_id or load it from the breakpoint status info obj.
>    3. do part uploads (fetch remote by range, then write to rgw)
> concurrently.
>    4. complete the multipart upload.
>    5. remove the breakpoint status info obj.
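>
> A plain, self-contained sketch of that flow (illustrative stubs standing
> in for the real coroutines, which run asynchronously):
>
>     #include <cstdio>
>     #include <string>
>
>     struct MultipartSyncSketch {
>       std::string upload_id;
>
>       // each helper stands in for one coroutine step
>       int compare_meta()      { std::puts("1. compare mtime/zone_id/pg_ver"); return 0; }
>       int init_or_resume()    { upload_id = "upload-123";  // hypothetical id
>                                 std::puts("2. init or load upload_id"); return 0; }
>       int upload_parts()      { std::puts("3. fetch ranges, upload parts concurrently"); return 0; }
>       int complete_upload()   { std::puts("4. complete the multipart upload"); return 0; }
>       int remove_status_obj() { std::puts("5. remove breakpoint status obj"); return 0; }
>
>       int run() {
>         int r;
>         if ((r = compare_meta()) < 0)    return r;  // mismatch or error: stop early
>         if ((r = init_or_resume()) < 0)  return r;  // breakpoint resume happens here
>         if ((r = upload_parts()) < 0)    return r;  // status obj survives a crash here
>         if ((r = complete_upload()) < 0) return r;
>         return remove_status_obj();
>       }
>     };
>
>     int main() { return MultipartSyncSketch{}.run(); }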
>
>    Some of the coroutines mentioned above are implemented in the
> rgw/rgw_sync_module_default.h/cc files, which parallel the
> rgw/rgw_sync_module_aws.h/cc files.
>
>    The code for steps 2 to 4 is implemented in the rgw/rgw_cr_rados.h/cc
> and rgw/rgw_rados.h/cc files. Each step has its own coroutine (abbreviated
> CR below); the CR sends an async op to rados' async thread pool, and the
> async op calls the newly added RGWRados::xxx methods to do the necessary
> work. The call stacks look like this (a generic sketch of the pattern
> follows):
>    RGWInitMultipartCR --> RGWAsyncInitMultipart -->
> RGWRados::init_multipart()
>    RGWFetchRemoteObjMultipartPartCR --> RGWAsyncFetchRemoteObjMultipartPart
> --> RGWRados::fetch_remote_obj_multipart_part()
>    RGWCompleteMultipartCR --> RGWAsyncCompleteMultipart -->
> RGWRados::complete_multipart()
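>
> A generic sketch of that pattern, with std::async standing in for rados'
> async thread pool (not the actual Ceph classes):
>
>     #include <cstdio>
>     #include <future>
>
>     // stands in for RGWRados::init_multipart(): a blocking call into rados
>     static int rados_init_multipart() {
>       std::puts("RGWRados::init_multipart()");
>       return 0;
>     }
>
>     // stands in for RGWAsyncInitMultipart: run the blocking call elsewhere
>     static std::future<int> async_init_multipart() {
>       return std::async(std::launch::async, rados_init_multipart);
>     }
>
>     int main() {
>       // stands in for RGWInitMultipartCR: issue the op, then wait for it;
>       // the real coroutine yields instead of blocking on get()
>       std::future<int> op = async_init_multipart();
>       return op.get();
>     }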
>
>    Unlike atomic sync, whose 'PUT' operation is executed in
> RGWHTTPManager's single-thread context, multipart sync executes the 'PUT'
> operation for each part in individual threads of the async thread pool.
> This way, multiple parts can be uploaded concurrently, which alleviates
> the workload on RGWHTTPManager. To achieve this, a new receive-callback
> class called RGWRadosPutObjMultipartPart is introduced. This new callback
> class copies the data received by RGWHTTPManager to another area, then
> writes that data to disk synchronously; all of this work is done within a
> thread-pool thread, so RGWHTTPManager only acts as a stream pipe, the
> same as in cloud sync.
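>
> A self-contained sketch of that idea, with std::thread standing in for
> the thread pool and a print standing in for the rados write (not the
> actual RGWRadosPutObjMultipartPart code):
>
>     #include <condition_variable>
>     #include <cstdio>
>     #include <mutex>
>     #include <queue>
>     #include <string>
>     #include <thread>
>
>     struct PartWriter {
>       std::mutex m;
>       std::condition_variable cv;
>       std::queue<std::string> chunks;
>       bool eof = false;
>
>       // called on the HTTP manager thread: copy the data and return at once
>       void handle_data(std::string data) {
>         { std::lock_guard<std::mutex> l(m); chunks.push(std::move(data)); }
>         cv.notify_one();
>       }
>       void finish() {
>         { std::lock_guard<std::mutex> l(m); eof = true; }
>         cv.notify_one();
>       }
>       // runs on a pool thread: drain the queue with synchronous writes
>       void write_loop() {
>         for (;;) {
>           std::unique_lock<std::mutex> l(m);
>           cv.wait(l, [&] { return !chunks.empty() || eof; });
>           if (chunks.empty()) return;  // eof and nothing left to write
>           std::string c = std::move(chunks.front());
>           chunks.pop();
>           l.unlock();
>           std::printf("sync write of %zu bytes\n", c.size());
>         }
>       }
>     };
>
>     int main() {
>       PartWriter w;
>       std::thread writer([&] { w.write_loop(); });
>       w.handle_data("part data");  // HTTP manager acting as a stream pipe
>       w.finish();
>       writer.join();
>     }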
>
> PS:
> 1. The newly added put processor works the same way as
> RGWPutObjProcessor_Multipart, except that it doesn't require a req_state
> to initialize.

I don't like the idea of having two separate implementations here.
Maybe the code needs to be refactored somehow to avoid duplication?

> 2. Function rgw_http_req_data::finish() is modified to add a
> client->signal() call. This modification is meant to wake up
> RGWRadosPutObjMultipartPart's wait, because there is no wake-up mechanism
> without a coroutine context.

I'm not sure why you need it. RGWFetchRemoteObjMultipartPartCR should
wake up when RGWFetchRemoteObjMultipartPart::_send_request() exits,
like any of the other rados CRs. I'd rather not add this callback; I'm
not sure it doesn't introduce lock dependency issues, and/or complicate
the rgw_http_req_data/RGWHTTPClient relationship.

> 3. Add RGWSimpleRadosRemoveCR, which removes the breakpoint status obj
> from BOTH disk and cache. The existing RGWRadosRemoveCR does not clear
> the cache, and RGWSimpleRadosReadCR reads the cache first, so this pair
> of CRs cannot cooperate well. Note that a few places use this pair of
> CRs; maybe we should fix that.
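>
> A toy illustration of the mismatch (std::map standing in for disk and
> cache; not the actual Ceph cache code):
>
>     #include <cassert>
>     #include <map>
>     #include <optional>
>     #include <string>
>
>     static std::map<std::string, std::string> disk, cache;
>
>     static void cached_write(const std::string& k, const std::string& v) {
>       disk[k] = v;
>       cache[k] = v;
>     }
>     // like RGWRadosRemoveCR: removes from disk only
>     static void remove_disk_only(const std::string& k) { disk.erase(k); }
>     // like RGWSimpleRadosReadCR: consults the cache first
>     static std::optional<std::string> cached_read(const std::string& k) {
>       if (auto it = cache.find(k); it != cache.end()) return it->second;
>       if (auto it = disk.find(k); it != disk.end()) return it->second;
>       return std::nullopt;
>     }
>
>     int main() {
>       cached_write("status-obj", "upload-123");
>       remove_disk_only("status-obj");
>       // stale hit: the cache was never invalidated
>       assert(cached_read("status-obj").has_value());
>     }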
>
Hmm, the same issue probably happens in the cloud sync module.
However, I don't really think caching this data is something that we
should be doing (we'd just be spamming the watch/notify channel). I
certainly didn't intend to go through the cache. There is a set of
raw system obj functions that should be used instead. Another option
is to avoid caching certain pools, but that's a bigger change.

> Any advice will be appreciated. Thanks.

I'll go over the PR and comment there. One thing that I think you
missed is that etags need to be preserved, and when using multipart
uploads we'll generate new etags. The multipart processor will need to
be extended with the ability to force a specific etag.
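
Something along these lines, as a hypothetical sketch of the idea (not
an existing API):

    #include <cstdio>
    #include <optional>
    #include <string>

    // hypothetical: completion takes an optional override so a synced
    // object keeps the source zone's etag rather than a new multipart etag
    static std::string complete_multipart(const std::string& computed_etag,
                                          const std::optional<std::string>& forced) {
      return forced.value_or(computed_etag);
    }

    int main() {
      // normal client upload: keep the computed multipart etag
      std::printf("%s\n", complete_multipart("0123abcd-5", std::nullopt).c_str());
      // sync path: force the source object's etag
      std::printf("%s\n",
                  complete_multipart("0123abcd-5", std::string("srcetag")).c_str());
    }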

Cheers,
Yehuda
