RadosGW: troubleshooting zone / zonegroup / period

Hello,

Here is a summary of the troubleshooting story on my radosgw.

After some manipulations of the zone definition, we got stuck in a situation where we could no longer update zones and zonegroups.

This situation affected bucket operations too:

> radosgw-admin bucket list
> 2016-09-06 09:04:14.810198 7fcbb01d5900  0 Error updating periodmap, multiple master zonegroups configured 
> 2016-09-06 09:04:14.810213 7fcbb01d5900  0 master zonegroup: 4d982760-7853-4174-8c05-cec2ef148cf0 and  default
> 2016-09-06 09:04:14.810215 7fcbb01d5900  0 ERROR: updating period map: (22) Invalid argument
> 2016-09-06 09:04:14.810230 7fcbb01d5900  0 failed to add zonegroup to current_period: (22) Invalid argument
> 2016-09-06 09:04:14.810238 7fcbb01d5900 -1 failed converting region to zonegroup : ret -22 (22) Invalid argument

After multiple discussions with two Ceph developers from Red Hat, we found a bug in the period management of the RadosGW.

A bug report has been submitted: http://tracker.ceph.com/issues/17239

I have written up the different steps of the troubleshooting here:

1. Situation

First, we created one zonegroup and one zone to allow us to put data on a replicated pool or an erasure-coded pool through the RadosGW.

default_zonegroup.json
default_zone.json

At this point, if we try to modify the zone or create a new one, we are not able to commit those changes, and the radosgw ends up in an
unstable state.

The problem comes from the period update:

    http://docs.ceph.com/docs/jewel/radosgw/multisite/#update-the-period

The existing period has one zone set as master, and it conflicts with the period we have updated, which makes it impossible to fix the situation.

2. Troubleshooting #1

After many tries, the only solution we found was to start the definition of zone / zonegroup / period from scratch. To do that, we have to
empty the .rgw.root pool (rados purge removes every object in the pool but keeps the pool itself).

But first, we have to stop all radosgw daemons.

> rados purge .rgw.root --yes-i-really-really-mean-it

After purging the pool, we start the radosgw daemons.

To be able to manipulate zones and zonegroups, we must create a realm:

> radosgw-admin realm create --rgw-realm=default --default

Then we create a new zonegroup and a new zone, set them as default, and commit:

> radosgw-admin zonegroup create --rgw-zonegroup=default --master --default
> radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=default --default --master
> radosgw-admin zonegroup default --rgw-zonegroup default
> radosgw-admin zone default --rgw-zone default
> radosgw-admin period update
> radosgw-admin period update --commit

But then we found that the new zone ID and zonegroup ID do not match the ones referenced by the bucket instances. The zone ID appears in the
name of the bucket.instance, and both IDs (zone and zonegroup) are present in the metadata of the bucket.instance:

> radosgw-admin metadata list bucket.instance
> [
>     "newtest2:69a46a98-09a8-41f3-9122-ced11496513b.1095087.2",
>     "testreplicate:c9724aff-5fa0-4dd9-b494-57bdb48fab4e.34100.1",
>     "testerasure:c9724aff-5fa0-4dd9-b494-57bdb48fab4e.34103.1",
>     "newtest:69a46a98-09a8-41f3-9122-ced11496513b.1095087.1"
> ]

> radosgw-admin metadata get bucket.instance:testreplicate:c9724aff-5fa0-4dd9-b494-57bdb48fab4e.34100.1
> {
>     "key": "bucket.instance:testreplicate:c9724aff-5fa0-4dd9-b494-57bdb48fab4e.34100.1",
>     "ver": {
>         "tag": "_e5KzBElZ1V3P0ZEbQElSBRA",
>         "ver": 1
>     },
>     "mtime": "2016-08-23 13:39:10.662987Z",
>     "data": {
>         "bucket_info": {
>             "bucket": {
>                 "name": "testreplicate",
>                 "pool": "default.rgw.buckets.data",
>                 "data_extra_pool": "default.rgw.buckets.extra",
>                 "index_pool": "default.rgw.buckets.index",
>                 "marker": "c9724aff-5fa0-4dd9-b494-57bdb48fab4e.34100.1",
>                 "bucket_id": "c9724aff-5fa0-4dd9-b494-57bdb48fab4e.34100.1"
>             },
>             "creation_time": "0.000000",
>             "owner": "replicate",
>             "flags": 0,
>             "zonegroup": "4d982760-7853-4174-8c05-cec2ef148cf0",
>             "placement_rule": "default-placement",
>             "has_instance_obj": "true",
>             "quota": {
>                 "enabled": false,
>                 "max_size_kb": -1,
>                 "max_objects": -1
>             },
>             "num_shards": 0,
>             "bi_shard_hash_type": 0,
>             "requester_pays": "false",
>             "has_website": "false",
>             "swift_versioning": "false",
>             "swift_ver_location": ""
>         },
>         "attrs": [
>             {
>                 "key": "user.rgw.acl",
>                 "val":
> "AgKRAAAAAwIaAAAACQAAAHJlcGxpY2F0ZQkAAAByZXBsaWNhdGUDA2sAAAABAQAAAAkAAAByZXBsaWNhdGUPAAAAAQAAAAkAAAByZXBsaWNhdGUEAzoAAAACAgQAAAAAAAAACQAAAHJlcGxpY2F0ZQAAAAAAAAAAAgIEAAAADwAAAAkAAAByZXBsaWNhdGUAAAAAAAAAAA=="
>             },
>             {
>                 "key": "user.rgw.idtag",
>                 "val": ""
>             },
>             {
>                 "key": "user.rgw.manifest",
>                 "val": ""
>             }
>         ]
>     }
> }
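
To make the point above concrete, here is a minimal Python sketch (not something we ran during the troubleshooting) that pulls those IDs out of the radosgw-admin output. It assumes the bucket.instance keys always look like bucket:zoneid.N.N as in the listing above, and that the zonegroup sits under data.bucket_info.zonegroup as in the metadata above. The function names are made up for the example.

```python
import json
from collections import defaultdict


def buckets_by_zone_id(listing_json):
    """Group bucket names by the zone ID embedded in their instance key.

    listing_json is the output of
    `radosgw-admin metadata list bucket.instance`.
    """
    grouped = defaultdict(list)
    for key in json.loads(listing_json):
        # The instance ID is everything after the last ':'.
        bucket_name, instance_id = key.rsplit(":", 1)
        # Assumption: the instance ID is "<zone id>.<number>.<number>",
        # so dropping the last two dot-separated parts leaves the zone ID.
        zone_id = instance_id.rsplit(".", 2)[0]
        grouped[zone_id].append(bucket_name)
    return dict(grouped)


def zonegroup_of_bucket(metadata_json):
    """Return the zonegroup ID stored in one bucket.instance metadata entry.

    metadata_json is the output of
    `radosgw-admin metadata get bucket.instance:<key>`.
    """
    return json.loads(metadata_json)["data"]["bucket_info"]["zonegroup"]
```

The zone ID and zonegroup ID found this way are the ones the new zone and zonegroup must carry for the existing buckets to be reachable.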

We tried to fix this by creating a new zonegroup and zone with the right IDs, setting them as default and deleting the other ones, but we ran
into the period update bug again.

3. Troubleshooting #2

We restart the process from scratch:

We stop all the radosgw daemons, purge the .rgw.root pool, start the radosgw again, and create the realm again.

Then we decided to try to create the zonegroup and the zone from the JSON files we had saved, which have the right IDs set.

We have to be careful to replace the realm ID in the two JSON files with the new one; otherwise it won't work.

After editing the two files again

default_zonegroup.json
default_zone.json

we can create the zonegroup and zone like this:

> radosgw-admin zonegroup set --rgw-zonegroup default < default_zonegroup.json
> radosgw-admin zone set --rgw-zonegroup default --rgw-zone default < default_zone.json

At this point, the new zonegroup and zone were successfully created, but their IDs were not those in the JSON files: during the set,
radosgw-admin created new IDs for both the zonegroup and the zone.

In this situation we are still not able to access the data. We have to start again from scratch...

4. Troubleshooting #3

We decided to restart the process but leave the radosgw stopped this time. Our intuition was that a running radosgw may affect the behaviour by
creating a default zone and zonegroup by itself.

Finally, we did this:

Stop all RadosGW daemons!

Purge the .rgw.root pool

> rados purge .rgw.root --yes-i-really-really-mean-it

Create a new realm and set it as default

> radosgw-admin realm create --rgw-realm=default --default

Edit the two JSON files to replace the realm ID with the new one

> vim default_zone.json #change realm with the new one
> vim default_zonegroup.json #change realm with the new one
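
The same edit can be scripted instead of done by hand in vim. This is only a sketch: it assumes that both JSON files carry the realm in a top-level "realm_id" field (check your own files first), and set_realm_id is a made-up helper name. The new realm ID is the one printed by the realm create command.

```python
import json


def set_realm_id(path, new_realm_id):
    """Rewrite the top-level "realm_id" of a zone or zonegroup JSON file.

    Returns the previous value so it can be logged or double-checked.
    Assumption: the file is a JSON object with "realm_id" at the top level.
    """
    with open(path) as f:
        doc = json.load(f)

    old_realm_id = doc.get("realm_id")
    doc["realm_id"] = new_realm_id

    # All other fields (including the zone / zonegroup "id") are left untouched.
    with open(path, "w") as f:
        json.dump(doc, f, indent=4)
        f.write("\n")
    return old_realm_id
```

For example, set_realm_id("default_zone.json", new_id) followed by set_realm_id("default_zonegroup.json", new_id), where new_id is the ID of the freshly created realm.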

Create the zonegroup and the zone like this (the order is really important here!)

> radosgw-admin zonegroup set --rgw-zonegroup default < default_zonegroup.json
> radosgw-admin zone set --rgw-zonegroup default --rgw-zone default < default_zone.json

Set zonegroup and zone as default

> radosgw-admin zonegroup default --rgw-zonegroup default
> radosgw-admin zone default --rgw-zone default

We can check that the zone and the zonegroup are correct by doing this

> radosgw-admin zonegroup list
> radosgw-admin zonegroup get
> radosgw-admin zone list
> radosgw-admin zone get
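
Since the IDs were silently regenerated in Troubleshooting #2, it is worth comparing the output of the get commands with the saved JSON files rather than trusting the eye. A minimal sketch, assuming that both documents expose top-level "id" and "realm_id" fields; id_mismatches is a made-up helper, not a radosgw-admin feature.

```python
import json


def id_mismatches(saved_json, live_json, fields=("id", "realm_id")):
    """Compare selected top-level fields of two zone (or zonegroup) documents.

    saved_json: content of the edited default_zone.json / default_zonegroup.json
    live_json:  output of `radosgw-admin zone get` / `zonegroup get`
    Returns {field: (saved value, live value)} for every field that differs.
    """
    saved = json.loads(saved_json)
    live = json.loads(live_json)
    return {
        field: (saved.get(field), live.get(field))
        for field in fields
        if saved.get(field) != live.get(field)
    }
```

An empty result means the zone (or zonegroup) kept the ID from the file; anything else means new IDs were generated again and the buckets will not be reachable.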

We have to update the period (do not commit at first; check that the data in the update is correct)

> radosgw-admin period update
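
One thing worth checking in that output is that exactly one zonegroup is flagged as master, since the original error was "multiple master zonegroups configured". The sketch below is an assumption about the layout of the period JSON (zonegroups listed under period_map.zonegroups, each with an "is_master" flag that may be the string "true" or a boolean); adapt the field names to what your version actually prints. master_zonegroups is a made-up helper name.

```python
import json


def master_zonegroups(period_json):
    """List the IDs of the zonegroups flagged as master in a period document.

    period_json is the output of `radosgw-admin period update` (uncommitted).
    Assumed layout: {"period_map": {"zonegroups": [{"id": ..., "is_master": ...}]}}
    """
    period = json.loads(period_json)
    zonegroups = period.get("period_map", {}).get("zonegroups", [])
    # "is_master" may be serialized as "true"/"false" strings or as booleans,
    # so normalise it to a lowercase string before comparing.
    return [
        zg.get("id")
        for zg in zonegroups
        if str(zg.get("is_master")).lower() == "true"
    ]
```

If the returned list does not contain exactly one ID, it is safer not to run the commit.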

Then we can commit the period update to apply the configuration

> radosgw-admin period update --commit

We can now safely restart the radosgw!

-- 
Yoann Moulin
EPFL IC-IT


Attachment: default_zone.json
Description: application/json

Attachment: default_zonegroup.json
Description: application/json
