On Fri, Feb 24, 2017 at 6:43 AM, KIMURA Osamu <kimura.osamu@xxxxxxxxxxxxxx> wrote:
> Hi Orit,
>
> Thanks for your interest in this issue.
> I have one more question.
>
> I assumed "endpoints" of a zonegroup would be used for synchronization
> of metadata. But, to the extent that I have read the current Jewel
> code, they may be used only for redirection.
> (-ERR_PERMANENT_REDIRECT || -ERR_WEBSITE_REDIRECT)
> It seems metadata synchronization is sent to the endpoint of the
> master zone in each zonegroup (probably it has not been implemented
> for secondary zonegroups).
>
> Is it correct?

Correct, metadata sync is synchronous and only the meta master (the
master zone in the master zonegroup) handles it.

> If so, we can set the endpoints of each zonegroup as the
> client-accessible URL (i.e., the front of the proxy). On the other
> hand, the endpoints of each zone point to internal ones.

This could work.
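(For concreteness, a rough sketch of that split using the Jewel-era
radosgw-admin CLI -- the URLs are placeholders, the "jp"/"jp-east"
names reuse the example below, and this is untested:)

  # zonegroup endpoints -> client-accessible URL (front of the proxy)
  radosgw-admin zonegroup modify --rgw-zonegroup=jp \
      --endpoints=http://s3.example.com:80
  # zone endpoints -> internal URLs used for replication
  radosgw-admin zone modify --rgw-zonegroup=jp --rgw-zone=jp-east \
      --endpoints=http://node5:80
  radosgw-admin period update --commit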
> But, I still prefer to use the "hostnames" field for this purpose.

Yes, using zonegroup endpoints could be confusing to the users.
On the other hand, a new parameter can introduce backward compatibility
issues. I will look into it.

Regards,
Orit

>
> Regards,
> KIMURA
>
> On 2017/02/23 20:34, KIMURA Osamu wrote:
>>
>> Sorry to be late.
>> I opened several tracker issues...
>>
>> On 2017/02/15 16:53, Orit Wasserman wrote:
>>>
>>> On Wed, Feb 15, 2017 at 2:26 AM, KIMURA Osamu
>>> <kimura.osamu@xxxxxxxxxxxxxx> wrote:
>>>>
>>>> Comments inline...
>>>>
>>>> On 2017/02/14 23:54, Orit Wasserman wrote:
>>>>>
>>>>> On Mon, Feb 13, 2017 at 12:57 PM, KIMURA Osamu
>>>>> <kimura.osamu@xxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi Orit,
>>>>>>
>>>>>> I almost agree, with some exceptions...
>>>>>>
>>>>>> On 2017/02/13 18:42, Orit Wasserman wrote:
>>>>>>>
>>>>>>> On Mon, Feb 13, 2017 at 6:44 AM, KIMURA Osamu
>>>>>>> <kimura.osamu@xxxxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hi Orit,
>>>>>>>>
>>>>>>>> Thanks for your comments.
>>>>>>>> I believe I'm not confused, but probably my thought may not be
>>>>>>>> well described...
>>>>>>>>
>>>>>>> :)
>>>>>>>>
>>>>>>>> On 2017/02/12 19:07, Orit Wasserman wrote:
>>>>>>>>>
>>>>>>>>> On Fri, Feb 10, 2017 at 10:21 AM, KIMURA Osamu
>>>>>>>>> <kimura.osamu@xxxxxxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Cephers,
>>>>>>>>>>
>>>>>>>>>> I'm trying to configure RGWs with multiple zonegroups within
>>>>>>>>>> a single realm.
>>>>>>>>>> The intention is that some buckets are to be replicated and
>>>>>>>>>> others are to stay local.
>>>>>>>>>
>>>>>>>>> If you are not replicating then you don't need to create any
>>>>>>>>> zone configuration; a default zonegroup and zone are created
>>>>>>>>> automatically.
>>>>>>>>>
>>>>>>>>>> e.g.:
>>>>>>>>>>   realm: fj
>>>>>>>>>>     zonegroup east: zone tokyo (not replicated)
>>>>>>>>>
>>>>>>>>> no need if not replicated
>>>>>>>>>
>>>>>>>>>>     zonegroup west: zone osaka (not replicated)
>>>>>>>>>
>>>>>>>>> same here
>>>>>>>>>
>>>>>>>>>>     zonegroup jp: zone jp-east + jp-west (replicated)
>>>>>>>>
>>>>>>>> The "east" and "west" zonegroups are just renamed from
>>>>>>>> "default", as described in the RHCS document [3].
>>>>>>>
>>>>>>> Why do you need two zonegroups (or 3)?
>>>>>>>
>>>>>>> At the moment multisite v2 automatically replicates all zones in
>>>>>>> the realm except the "default" zone.
>>>>>>> The moment you add a new zone (it could be part of another
>>>>>>> zonegroup) it will be replicated to the other zones.
>>>>>>> It seems you don't want or need this.
>>>>>>> We are working on allowing more control over the replication,
>>>>>>> but that will be in the future.
>>>>>>>
>>>>>>>> We may not need to rename them, but at least api_name should be
>>>>>>>> altered.
>>>>>>>
>>>>>>> You can change the api_name for the "default" zone.
>>>>>>>
>>>>>>>> In addition, I'm not sure what happens if 2 "default"
>>>>>>>> zones/zonegroups co-exist in the same realm.
>>>>>>>
>>>>>>> A realm shares all the zones/zonegroups configuration; it means
>>>>>>> it is the same zone/zonegroup.
>>>>>>> "Default" means no zone/zonegroup is configured; we use it to
>>>>>>> run radosgw without any zone/zonegroup specified in the
>>>>>>> configuration.
>>>>>>
>>>>>> I didn't think of "default" as an exception among zonegroups. :-P
>>>>>> Actually, I must specify api_name in the default zonegroup
>>>>>> setting.
>>>>>>
>>>>>> I interpret the "default" zone/zonegroup as being out of the
>>>>>> realm. Is it correct?
>>>>>> I think it means the namespace for buckets or users is not shared
>>>>>> with "default".
>>>>>> At present, I can't make a decision to separate namespaces, but
>>>>>> it may be the best choice with the current code.
>>
>> Unfortunately, if "api_name" is changed for the "default" zonegroup,
>> the "default" zonegroup is set as a member of the realm.
>> See [19040-1]
>>
>> It means no major difference from my first provided configuration
>> (except reduction of messy error messages [15776]).
>>
>> In addition, the "api_name" can't be changed with the "radosgw-admin
>> zonegroup set" command if no realm has been defined.
>> There is no convenient way to change "api_name".
>>
>> [19040-1]: http://tracker.ceph.com/issues/19040#note-1
>> [15776]: http://tracker.ceph.com/issues/15776
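>>
>> (For reference, the export/edit/import route implied above -- a
>> rough sketch, and it only works once a realm exists:)
>>
>>   radosgw-admin zonegroup get --rgw-zonegroup=default > zg.json
>>   # hand-edit "api_name" in zg.json
>>   radosgw-admin zonegroup set --rgw-zonegroup=default < zg.json
>>   radosgw-admin period update --commit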
>>
>>>>>>>>>> To evaluate such a configuration, I tentatively built
>>>>>>>>>> multiple zonegroups (east, west) on a ceph cluster. I barely
>>>>>>>>>> succeeded in configuring it, but some concerns exist.
>>>>>>>>>>
>>>>>>>>> I think you just need one zonegroup with two zones; the others
>>>>>>>>> are not needed.
>>>>>>>>> Also, each gateway can handle only a single zone (the rgw_zone
>>>>>>>>> configuration parameter).
>>>>>>>>
>>>>>>>> This is just a tentative one to confirm the behavior of
>>>>>>>> multiple zonegroups, due to limitations of our current
>>>>>>>> equipment.
>>>>>>>> The "east" zonegroup was renamed from "default", and another
>>>>>>>> "west" zonegroup was created. Of course I specified both the
>>>>>>>> rgw_zonegroup and rgw_zone parameters for each RGW instance.
>>>>>>>> (see -FYI- section below)
>>>>>>>>
>>>>>>> Can I suggest starting with a simpler setup:
>>>>>>> two zonegroups, the first with two zones and the second with one
>>>>>>> zone.
>>>>>>> It is simpler to configure and, in case of problems, to debug.
>>>>>>
>>>>>> I would try such a configuration IF time permitted.
>>
>> I tried. But it doesn't seem simpler :P
>> Because it consists of 3 zonegroups and 4 zones.
>> I want to keep the default zone/zonegroup.
>> The target system already has a huge amount of objects.
>>
>>>>>>>>>> a) User accounts are not synced among zonegroups
>>
>> I opened 2 issues [19040] [19041]
>>
>> [19040]: http://tracker.ceph.com/issues/19040
>> [19041]: http://tracker.ceph.com/issues/19041
>>
>>>>>>>>>> I'm not sure if this is an issue, but the blueprint [1]
>>>>>>>>>> stated that a master zonegroup manages user accounts as
>>>>>>>>>> metadata, like buckets.
>>>>>>>>>>
>>>>>>>>> You have a lot of confusion with the zones and zonegroups.
>>>>>>>>> A zonegroup is just a group of zones that are sharing the same
>>>>>>>>> data (i.e. replication between them).
>>>>>>>>> A zone represents a geographical location (i.e. one ceph
>>>>>>>>> cluster).
>>>>>>>>>
>>>>>>>>> We have a meta master zone (the master zone in the master
>>>>>>>>> zonegroup); this meta master is responsible for replicating
>>>>>>>>> user and bucket meta operations.
>>>>>>>>
>>>>>>>> I know it.
>>>>>>>> But the master zone in the master zonegroup manages bucket meta
>>>>>>>> operations including buckets in other zonegroups. It means
>>>>>>>> the master zone in the master zonegroup must have permission to
>>>>>>>> handle bucket meta operations, i.e., must have the same user
>>>>>>>> accounts as other zonegroups.
>>>>>>>
>>>>>>> Again, zones not zonegroups: it needs to have an admin user with
>>>>>>> the same credentials in all the other zones.
>>>>>>>
>>>>>>>> This is related to the next issue b). If the master zone in the
>>>>>>>> master zonegroup doesn't have user accounts for other
>>>>>>>> zonegroups, all the bucket meta operations are rejected.
>>>>>>>
>>>>>>> Correct
>>>>>>>
>>>>>>>> In addition, it may be overexplanation though, user accounts
>>>>>>>> are sync'ed to other zones within the same zonegroup if the
>>>>>>>> accounts are created on the master zone of the zonegroup. On
>>>>>>>> the other hand, I found today, user accounts are not sync'ed to
>>>>>>>> the master if the accounts are created on a slave(?) zone in
>>>>>>>> the zonegroup. It seems to be asymmetric behavior.
>>>>>>>
>>>>>>> This requires investigation; can you open a tracker issue and we
>>>>>>> will look into it.
>>>>>>>
>>>>>>>> I'm not sure if the same behavior is caused by the Admin REST
>>>>>>>> API instead of radosgw-admin.
>>>>>>>
>>>>>>> It doesn't matter; both use almost the same code.
>>>>>>>
>>>>>>>>>> b) Bucket creation is rejected if the master zonegroup
>>>>>>>>>> doesn't have the account
>>>>>>>>>>
>>>>>>>>>> e.g.:
>>>>>>>>>> 1) Configure east zonegroup as master.
>>>>>>>>>
>>>>>>>>> you need a master zone
>>>>>>>>>
>>>>>>>>>> 2) Create a user "nishi" on west zonegroup (osaka zone) using
>>>>>>>>>>    radosgw-admin.
>>>>>>>>>> 3) Try to create a bucket on west zonegroup by user nishi.
>>>>>>>>>>    -> ERROR: S3 error: 404 (NoSuchKey)
>>>>>>>>>> 4) Create user nishi on east zonegroup with the same key.
>>>>>>>>>> 5) Succeed to create a bucket on west zonegroup by user
>>>>>>>>>>    nishi.
>>>>>>>>>
>>>>>>>>> You are confusing zonegroup and zone here again ...
>>>>>>>>>
>>>>>>>>> You should notice that when you are using the radosgw-admin
>>>>>>>>> command without providing zonegroup and/or zone info
>>>>>>>>> (--rgw-zonegroup=<zg> and --rgw-zone=<zone>) it will use the
>>>>>>>>> default zonegroup and zone.
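>>>>>>>>>
>>>>>>>>> (i.e., something like the following -- an illustrative sketch:
>>>>>>>>>
>>>>>>>>>   radosgw-admin user create --uid=nishi --display-name="Nishi" \
>>>>>>>>>       --rgw-zonegroup=west --rgw-zone=osaka
>>>>>>>>>
>>>>>>>>> without the last two options the user lands in the default
>>>>>>>>> zonegroup and zone.)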
>>>>>>>>>
>>>>>>>>> Users are stored per zone and you need to create an admin user
>>>>>>>>> in both zones.
>>>>>>>>> For more documentation see:
>>>>>>>>> http://docs.ceph.com/docs/master/radosgw/multisite/
>>>>>>>>
>>>>>>>> I always specify --rgw-zonegroup and --rgw-zone for the
>>>>>>>> radosgw-admin command.
>>>>>>>>
>>>>>>> That is great!
>>>>>>> You can also configure a default zone and zonegroup.
>>>>>>>
>>>>>>>> The issue is that any bucket meta operations are rejected when
>>>>>>>> the master zone in the master zonegroup doesn't have the user
>>>>>>>> account of other zonegroups.
>>>>>>>
>>>>>>> Correct
>>>>>>>
>>>>>>>> I try to describe the details again:
>>>>>>>> 1) Create fj realm as default.
>>>>>>>> 2) Rename default zonegroup/zone to east/tokyo and mark as
>>>>>>>>    default.
>>>>>>>> 3) Create west/osaka zonegroup/zone.
>>>>>>>> 4) Create system user sync-user on both tokyo and osaka zones
>>>>>>>>    with the same key.
>>>>>>>> 5) Start 2 RGW instances for tokyo and osaka zones.
>>>>>>>> 6) Create azuma user account on tokyo zone in east zonegroup.
>>>>>>>> 7) Create /bucket1 through tokyo zone endpoint with azuma
>>>>>>>>    account.
>>>>>>>>    -> No problem.
>>>>>>>> 8) Create nishi user account on osaka zone in west zonegroup.
>>>>>>>> 9) Try to create a bucket /bucket2 through osaka zone endpoint
>>>>>>>>    with azuma account.
>>>>>>>>    -> responds "ERROR: S3 error: 403 (InvalidAccessKeyId)" as
>>>>>>>>    expected.
>>>>>>>> 10) Try to create a bucket /bucket3 through osaka zone endpoint
>>>>>>>>     with nishi account.
>>>>>>>>     -> responds "ERROR: S3 error: 404 (NoSuchKey)"
>>>>>>>> A detailed log is shown in the -FYI- section below.
>>>>>>>> The RGW for osaka zone verifies the signature and forwards the
>>>>>>>> request to the tokyo zone endpoint (= the master zone in the
>>>>>>>> master zonegroup).
>>>>>>>> Then, the RGW for tokyo zone rejected the request as
>>>>>>>> unauthorized access.
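>>>>>>>>
>>>>>>>> (Roughly, steps 1-3 in command form -- an approximate sketch
>>>>>>>> based on the migration procedure in [3], not a verbatim
>>>>>>>> transcript of what was run:)
>>>>>>>>
>>>>>>>>   radosgw-admin realm create --rgw-realm=fj --default
>>>>>>>>   radosgw-admin zonegroup rename --rgw-zonegroup=default \
>>>>>>>>       --zonegroup-new-name=east
>>>>>>>>   radosgw-admin zone rename --rgw-zone=default \
>>>>>>>>       --zone-new-name=tokyo --rgw-zonegroup=east
>>>>>>>>   radosgw-admin zonegroup modify --rgw-zonegroup=east \
>>>>>>>>       --master --default
>>>>>>>>   radosgw-admin zonegroup create --rgw-zonegroup=west \
>>>>>>>>       --endpoints=http://node5:8081
>>>>>>>>   radosgw-admin zone create --rgw-zonegroup=west \
>>>>>>>>       --rgw-zone=osaka --master --endpoints=http://node5:8081
>>>>>>>>   radosgw-admin period update --commit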
>>>>>>>>
>>>>>>> This seems a bug, can you open an issue?
>>
>> I opened 2 issues [19042] [19043]
>>
>> [19042]: http://tracker.ceph.com/issues/19042
>> [19043]: http://tracker.ceph.com/issues/19043
>>
>>>>>>>>>> c) How to restrict placing buckets on specific zonegroups?
>>>>>>>>>
>>>>>>>>> You probably mean zone.
>>>>>>>>> There is ongoing work to enable/disable sync per bucket:
>>>>>>>>> https://github.com/ceph/ceph/pull/10995
>>>>>>>>> With this you can create a bucket on a specific zone and it
>>>>>>>>> won't be replicated to another zone.
>>>>>>>>
>>>>>>>> My thought means zonegroup (not zone) as described above.
>>>>>>>
>>>>>>> But it should be zone ...
>>>>>>> A zone represents a geographical location; it represents a
>>>>>>> single ceph cluster.
>>>>>>> A bucket is created in a zone (a single ceph cluster) and it
>>>>>>> stores the zone id.
>>>>>>> The zone indicates in which ceph cluster the bucket was created.
>>>>>>>
>>>>>>> A zonegroup is just a logical collection of zones; in many cases
>>>>>>> you only need a single zonegroup.
>>>>>>> You should use zonegroups if you have lots of zones and it
>>>>>>> simplifies your configuration.
>>>>>>> You can move zones between zonegroups (it is not tested or
>>>>>>> supported ...).
>>>>>>>
>>>>>>>> With the current code, buckets are sync'ed to all zones within
>>>>>>>> a zonegroup; there is no way to choose the zone on which to
>>>>>>>> place specific buckets.
>>>>>>>> But this change may help to configure our original target.
>>>>>>>>
>>>>>>>> It seems we need more discussion about the change.
>>>>>>>> I prefer that the default behavior be associated with the user
>>>>>>>> account (per SLA).
>>>>>>>> And the attribution of each bucket should be able to be changed
>>>>>>>> via the REST API depending on their permission, rather than the
>>>>>>>> radosgw-admin command.
>>>>>>>>
>>>>>>> I think that will be very helpful; we need to understand what
>>>>>>> the requirements and the usage are.
>>>>>>> Please comment on the PR or even open a feature request and we
>>>>>>> can discuss it more in detail.
>>>>>>>
>>>>>>>> Anyway, I'll examine more details.
>>>>>>>>
>>>>>>>>>> If user accounts are synced in the future, as in the
>>>>>>>>>> blueprint, all the zonegroups contain the same account
>>>>>>>>>> information. It means any user can create buckets on any
>>>>>>>>>> zonegroup. If we want to permit placing buckets on a
>>>>>>>>>> replicated zonegroup only for specific users, how do we
>>>>>>>>>> configure that?
>>>>>>>>>>
>>>>>>>>>> If user accounts are not synced, as with the current
>>>>>>>>>> behavior, we can restrict placing buckets on specific
>>>>>>>>>> zonegroups. But I cannot find the best way to configure the
>>>>>>>>>> master zonegroup.
>>>>>>>>>>
>>>>>>>>>> d) Operations for another zonegroup are not redirected
>>>>>>>>>>
>>>>>>>>>> e.g.:
>>>>>>>>>> 1) Create bucket4 on west zonegroup by nishi.
>>>>>>>>>> 2) Try to access bucket4 from an endpoint on east zonegroup.
>>>>>>>>>>    -> Responds "301 (Moved Permanently)",
>>>>>>>>>>    but no redirected Location header is returned.
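>>>>>>>>>>
>>>>>>>>>> (What I would expect, per [2] -- an illustrative response,
>>>>>>>>>> not an actual capture:
>>>>>>>>>>
>>>>>>>>>>   HTTP/1.1 301 Moved Permanently
>>>>>>>>>>   Location: http://<west-zonegroup-endpoint>/bucket4
>>>>>>>>>>
>>>>>>>>>> whereas the 301 that comes back carries no Location at all.)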
>>>>>>>>>>
>>>>>>>>> It could be a bug; please open a tracker issue for that in
>>>>>>>>> tracker.ceph.com for the RGW component with all the
>>>>>>>>> configuration information, the logs, and the version of ceph
>>>>>>>>> and radosgw you are using.
>>>>>>>>
>>>>>>>> I will open it, but it may be issued as "Feature" instead of
>>>>>>>> "Bug" depending on the following discussion.
>>
>> I opened an issue [19052] as "Feature" instead of "Bug".
>>
>> [19052]: http://tracker.ceph.com/issues/19052
>>
>> I suggested to use the "hostnames" field in the zonegroup
>> configuration for this purpose. I feel it is similar to the s3
>> website feature.
>>
>>>>>>>>>> It seems the current RGW doesn't follow the S3 specification
>>>>>>>>>> [2].
>>>>>>>>>> To implement this feature, probably we need to define another
>>>>>>>>>> endpoint on each zonegroup for a client-accessible URL. RGW
>>>>>>>>>> may be placed behind a proxy, thus the URL may be different
>>>>>>>>>> from the endpoint URLs for replication.
>>>>>>>>>>
>>>>>>>>> The zone and zonegroup endpoints are not used directly by the
>>>>>>>>> user with a proxy.
>>>>>>>>> The user gets a URL pointing to the proxy, and the proxy will
>>>>>>>>> need to be configured to point to the rgw URLs/IPs; you can
>>>>>>>>> have several radosgw instances running.
>>>>>>>>> See more:
>>>>>>>>> https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/paged/object-gateway-guide-for-red-hat-enterprise-linux/chapter-2-configuration
>>>>>>>>
>>>>>>>> Does it mean the proxy has the responsibility to alter the
>>>>>>>> "Location" header to the redirected URL?
>>>>>>>
>>>>>>> No
>>>>>>>
>>>>>>>> Basically, RGW can respond with only the endpoint described in
>>>>>>>> the zonegroup setting as the redirected URL in the Location
>>>>>>>> header. But the client may not be able to access that endpoint.
>>>>>>>> Someone must translate the Location header to a
>>>>>>>> client-accessible URL.
>>>>>>>
>>>>>>> Both locations will have a proxy. This means all communication
>>>>>>> is done through proxies.
>>>>>>> The endpoint URL should be an external URL and the proxy on the
>>>>>>> new location will translate it to the internal one.
>>>>>>
>>>>>> Our assumption is:
>>>>>>
>>>>>> End-user client --- internet --- proxy ---+--- RGW site-A
>>>>>>                                           |
>>>>>>                                           | (dedicated line or VPN)
>>>>>>                                           |
>>>>>> End-user client --- internet --- proxy ---+--- RGW site-B
>>>>>>
>>>>>> RGWs can't communicate through the front of the proxies.
>>>>>> In this case, the endpoints for replication are in the backend
>>>>>> network of the proxies.
>>>>>
>>>>> Do you have several radosgw instances in each site?
>>>>
>>>> Yes. Probably three or more instances per site.
>>>> The actual system will have the same number of physical servers as
>>>> RGW instances.
>>>> We already tested with multiple endpoints per zone within a
>>>> zonegroup.
>>>
>>> Good to hear :)
>>> As for the redirect message, in your case it should be handled by
>>> the proxy and not by the client browser, as the browser cannot
>>> access the internal VPN network. The endpoint URLs should be the
>>> URLs in the internal network.
>>
>> I don't agree.
>> It requires more network bandwidth between sites.
>> I think the "hostnames" field provides a client-accessible URL that
>> is the front of the proxy. It seems sufficient.
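>>
>> (Concretely, what I have in mind -- a sketch via the zonegroup JSON,
>> since I am not aware of a dedicated CLI flag for hostnames; the
>> hostname is a placeholder:)
>>
>>   radosgw-admin zonegroup get --rgw-zonegroup=west > zg.json
>>   # set "hostnames": ["s3-west.example.com"]  (front of the proxy)
>>   radosgw-admin zonegroup set --rgw-zonegroup=west < zg.json
>>   radosgw-admin period update --commit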
>>
>> In addition to the above, I opened 2 issues [18800] [19053] regarding
>> the Swift API, which are not related to this discussion.
>>
>> [18800]: http://tracker.ceph.com/issues/18800
>> [19053]: http://tracker.ceph.com/issues/19053
>>
>> Regards,
>> KIMURA
>>
>>> Orit
>>>>
>>>>>> What do you think?
>>>>>>
>>>>>>> Regards,
>>>>>>> Orit
>>>>>>>
>>>>>>>> If the proxy translates the Location header, it looks like a
>>>>>>>> man-in-the-middle attack.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> KIMURA
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Orit
>>>>>>>>>>
>>>>>>>>>> Any thoughts?
>>>>>>>>>>
>>>>>>>>>> [1] http://tracker.ceph.com/projects/ceph/wiki/Rgw_new_multisite_configuration
>>>>>>>>>> [2] http://docs.aws.amazon.com/AmazonS3/latest/dev/Redirects.html
>>>>>>>>
>>>>>>>> [3] https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/paged/object-gateway-guide-for-red-hat-enterprise-linux/chapter-8-multi-site#migrating_a_single_site_system_to_multi_site
>>>>>>>>
>>>>>>>>>> ------ FYI ------
>>>>>>>>>> [environments]
>>>>>>>>>> Ceph cluster: RHCS 2.0
>>>>>>>>>> RGW: RHEL 7.2 + RGW v10.2.5
>>>>>>>>>>
>>>>>>>>>> zonegroup east: master
>>>>>>>>>>   zone tokyo
>>>>>>>>>>     endpoint http://node5:80
>>>>>>>>
>>>>>>>>       rgw frontends = "civetweb port=80"
>>>>>>>>       rgw zonegroup = east
>>>>>>>>       rgw zone = tokyo
>>>>>>>>
>>>>>>>>>>     system user: sync-user
>>>>>>>>>>     user azuma (+ nishi)
>>>>>>>>>>
>>>>>>>>>> zonegroup west: (not master)
>>>>>>>>>>   zone osaka
>>>>>>>>>>     endpoint http://node5:8081
>>>>>>>>
>>>>>>>>       rgw frontends = "civetweb port=8081"
>>>>>>>>       rgw zonegroup = west
>>>>>>>>       rgw zone = osaka
>>>>>>>>
>>>>>>>>>>     system user: sync-user (created with same key as zone
>>>>>>>>>>     tokyo)
>>>>>>>>>>     user nishi
>
> --
> KIMURA Osamu / 木村 修
> Engineering Department, Storage Development Division,
> Data Center Platform Business Unit, FUJITSU LIMITED