On Mon, Apr 16, 2018 at 6:26 AM, Xinying Song <songxinying.ftd@xxxxxxxxx> wrote:
> Hi, guys
>
> I have some questions about the implementation of rgw multisite. Hope
> anyone can give me some enlightenment.
>
> 1. RGWHTTPManager only has one thread to process libcurl read/write
> callbacks, will a multi-thread version be added in the future?

We take advantage of the fact that it is single threaded, for example by avoiding locking. That being said, my original thought was either to have a multithreaded manager, while making sure that a coroutine stack and any stacks it spawns run on the same thread, or to have multiple managers, each dealing with a different workload. The latter is much easier.

> 2. Data sync between rgw zones only use atomic put, will a multipart
> put version be considered in the future?

Possibly. A few problems with multipart put: as you said, it's not atomic, so we need to make sure all parts correspond to the same object instance (that is, that the source object hasn't been replaced since we started the operation). We also need to make sure we clean up aborted uploads, and that we can continue from where we stopped previously.

> 3. Cloud sync has a multipart put implementation, but no breakpoint
> resume, will this feature be added in the future?

It does have breakpoint resume. We keep the upload state in a separate object and continue from where we stopped last time (we also deal with cleanup and source validation). Did you test it and it didn't work?

> 4. Why not use spawn method instead of call method in
> RGWAWSStreamObjToCloudMultipartCR::operate() when do multipart put in
> the for loop for concurrency?

We can definitely improve on that. A single call is easier to manage; concurrently spawned stacks need to be throttled and cleaned up in case of failure, they make for a more complicated breakpoint state, etc.
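To illustrate the bookkeeping that concurrently spawned part uploads would need (this is a hypothetical Python sketch, not Ceph coroutine code; `upload_part`, `sync_multipart`, and the `journal` dict are made-up names): the uploads must be throttled to a maximum in-flight count, and each completed part must be recorded in persisted state so a restart resumes from where it stopped.

```python
# Hypothetical sketch (NOT Ceph code): throttled concurrent part uploads
# with a resumable journal. A real implementation would also validate the
# source object instance and clean up aborted uploads.
import concurrent.futures

def upload_part(part_no, data):
    # Stand-in for the real per-part PUT; returns the part's etag.
    return f"etag-{part_no}"

def sync_multipart(parts, journal, max_inflight=4):
    """parts: {part_no: bytes}; journal: dict persisted across restarts."""
    # Skip parts already recorded in the journal (resume after restart).
    todo = {n: d for n, d in parts.items() if n not in journal}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_inflight) as ex:
        futures = {ex.submit(upload_part, n, d): n for n, d in todo.items()}
        for fut in concurrent.futures.as_completed(futures):
            # Persist each etag as it completes -> resume point.
            journal[futures[fut]] = fut.result()
    return [journal[n] for n in sorted(parts)]

journal = {}
etags = sync_multipart({1: b"a", 2: b"b", 3: b"c"}, journal)
# etags == ["etag-1", "etag-2", "etag-3"]; rerunning with the same
# journal uploads nothing, since every part is already recorded.
```

The journal here plays the role of the separate status object mentioned above for cloud sync's breakpoint resume.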
> For question 2, I have a preliminary idea:
> When zone1 want sync data in zone2, firstly, zone1 send a rest api
> request to zone2, maybe a new api or old api with extra rgwx header.
> Then, zone2 listen on that api and do the multipart put to zone1 in
> separate threads, maybe a manager thing like RGWDataChangesLog.
> Finally, zone1 can confirm it has finished sync for a specified object
> after receiving a multipart-complete request with an extra header from
> zone2.
> How about this idea? Is that right?

You described a push process, where zone2 pushes data to zone1. The sync mechanism is pull based: zone1 needs to pull data from zone2. What we can do, instead of zone1 issuing a single GET request to zone2, is send concurrent GET requests to zone2 with different object ranges. Internally it will use the multipart object processor, so the built object will be assembled from multiple parts. It will also need to keep its status in a journal, and clean up the data if the object was replaced while we were fetching it.

Thanks,
Yehuda
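The pull-based alternative described above (concurrent ranged GETs, with validation that the source object wasn't replaced mid-fetch) could be sketched roughly like this. This is hypothetical Python, not the actual RGW implementation; `ranged_get` stands in for an HTTP GET with a `Range: bytes=start-end` header, and the etag comparison stands in for source-instance validation.

```python
# Hypothetical sketch (NOT the actual RGW code): zone1 pulls an object
# from zone2 with concurrent ranged GETs, checks that the source etag
# stayed constant across all responses, then stitches the ranges in
# order -- roughly the parts a multipart object processor would receive.
import concurrent.futures

def ranged_get(source, start, end):
    # Stand-in for an HTTP range request; returns (bytes, observed etag).
    return source["data"][start:end + 1], source["etag"]

def pull_object(source, part_size=4, max_inflight=4):
    size, etag0 = len(source["data"]), source["etag"]
    ranges = [(s, min(s + part_size, size) - 1) for s in range(0, size, part_size)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_inflight) as ex:
        chunks = list(ex.map(lambda r: ranged_get(source, *r), ranges))
    if any(etag != etag0 for _, etag in chunks):
        # Source object was replaced mid-fetch: discard and restart sync.
        raise RuntimeError("source object replaced during fetch")
    return b"".join(data for data, _ in chunks)

obj = {"data": b"hello zone2!", "etag": "v1"}
assert pull_object(obj) == b"hello zone2!"
```

A real implementation would additionally journal per-range progress (as with the upload state object above) so an interrupted fetch can resume instead of restarting from scratch.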