Hi All,

I have a cluster with 16 OSDs on 4 nodes and a standard RGW install with the standard RGW pools. Replication on those pools is set to 2 (size 2, min_size 1). We've had the situation before where one node dropped out completely (so 4 OSDs), the cluster health went to warning, and RGW as well as the other pools kept working fine.

I now had a problem after we added a test pool with replication 1 (size 1, min_size 1). The node died again, 4 OSDs dropped out, and this time the cluster went to HEALTH_ERR and RGW stopped responding entirely, which I don't understand. I realise that with a size 1 pool, losing an OSD (unrecoverably) means losing pretty much all of that pool's data, and the pool was only set up for some benchmarking. What I didn't expect is that it would affect the entire cluster.

Restarting the radosgw service would work, but it wouldn't listen to requests and showed errors like this in the logs:

2016-11-18 11:13:47.231827 7f0aaadb2a00 10 cannot find current period zonegroup using local zonegroup
2016-11-18 11:13:47.231860 7f0aaadb2a00 20 get_system_obj_state: rctx=0x7fffb14242c0 obj=.rgw.root:default.realm state=0x564c3fa99858 s->prefetch_data=0
2016-11-18 11:13:47.232754 7f0aaadb2a00 10 could not read realm id: (2) No such file or directory
2016-11-18 11:13:47.232772 7f0aaadb2a00 10 Creating default zonegroup
2016-11-18 11:13:47.233376 7f0aaadb2a00 10 couldn't find old data placement pools config, setting up new ones for the zone
...
2016-11-18 11:13:47.251629 7f0aaadb2a00 10 ERROR: name default already in use for obj id 712c74f9-baf4-4d74-956b-022c67e4a5bb
2016-11-18 11:13:47.251631 7f0aaadb2a00 10 create_default() returned -EEXIST, we raced with another zonegroup creation

Full log here: http://pastebin.com/iYpiF9wP

Once we removed the size = 1 pool via 'rados rmpool', the cluster started recovering and RGW served requests again!

Any ideas?

Cheers,
Thomas
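P.S. In case it helps, here is a rough sketch of the kind of commands involved, assuming the standard Ceph CLI. The pool name 'bench-test' and the PG count are placeholders for illustration, not the exact values we used:

    # create a throwaway benchmark pool and drop its replication to 1
    ceph osd pool create bench-test 128
    ceph osd pool set bench-test size 1
    ceph osd pool set bench-test min_size 1

    # after the node failure, check which pools/PGs are holding things up
    ceph health detail
    ceph osd pool ls detail
    ceph pg dump_stuck inactive

    # removing the size=1 pool is what let the cluster recover
    rados rmpool bench-test bench-test --yes-i-really-really-mean-it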