Re: RadosGW fault tolerance

Hi Yehuda,

Thanks for the reply; my comments are inline below.

On 25/03/2013 04:32, Yehuda Sadeh wrote:
On Sun, Mar 24, 2013 at 7:14 PM, Rustam Aliyev <rustam.lists@xxxxxxx> wrote:
Hi,

I was testing a RadosGW setup and observed strange behavior - RGW becomes
unresponsive or won't start whenever cluster health is degraded (e.g. when
restarting one of the OSDs). I'm probably doing something wrong, but I
couldn't find any information about this.

I'm running 0.56.3 on a 3-node cluster (3x MON, 3x OSD). I increased the
replication factor for the rgw-related pools so that the cluster can survive
a single node failure (quorum).

pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 256 pgp_num 256 last_change 1 owner 0
pool 3 'pbench' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 150 pgp_num 150 last_change 11 owner 0
pool 4 '.rgw' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 90 pgp_num 8 last_change 111 owner 0
pool 5 '.rgw.gc' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 112 owner 0
pool 6 '.rgw.control' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 114 owner 0
pool 7 '.users.uid' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 117 owner 0
pool 8 '.users.email' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 118 owner 0
pool 9 '.users' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 115 owner 0
pool 11 '.rgw.buckets' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 108 owner 0
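
For reference, raising the rep size on the rgw-related pools can be done with commands along these lines (just a sketch - pool names are taken from the dump above):

    ceph osd pool set .rgw size 3
    ceph osd pool set .rgw.gc size 3
    ceph osd pool set .rgw.control size 3
    ceph osd pool set .users.uid size 3
    ceph osd pool set .users.email size 3
    ceph osd pool set .users size 3
    ceph osd pool set .rgw.buckets size 3
    ceph osd dump | grep 'rep size'    # verify the new replica counts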

Any idea how to fix this?

We'll need some more specific info about the actual scenario in order to
determine what exactly you're seeing. What is the exact scenario you're
testing (an osd goes down?)?
I'm just doing "service ceph stop osd" on one of the nodes.
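More concretely, the test looks roughly like this (the <rgw-host> below is just a placeholder):

    service ceph stop osd            # stops the osd daemon(s) on this node
    ceph health detail               # cluster reports degraded pgs shortly after
    ceph osd tree                    # shows which osd is marked down
    curl -sv http://<rgw-host>/      # any request through the gateway; this is what hangs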
However, there are a
few things to note:
  - you have only 3 osds, which means that a single osd going down
affects a large portion of your data. How and what exactly happens
really depends on your configuration.
The configuration is quite simple, 3 osds and 3 monitors with default params: http://pastebin.com/LP3X7cf9
Note that it is not unlikely
that it takes some time to determine that an osd went down.
I tested that scenario and it seems that you are right - it does take some time, but I'm not sure whether that's expected. When I shut down the osd, rgw becomes unresponsive for about 2 minutes; then it works even though health is degraded. After some time I brought the osd back up and rgw became unresponsive again, this time for about 5 minutes, before it started functioning again while pgs were recovering in the background.
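For what it's worth, the delay before the cluster even notices a dead osd is governed by settings along these lines in ceph.conf (the values shown are roughly the defaults and only meant as illustration - they differ between releases):

    [osd]
        osd heartbeat grace = 20          ; missed-heartbeat window before peers report the osd down
    [mon]
        mon osd down out interval = 300   ; how long a down osd stays "in" before data is re-replicated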
It is expected that this osd gets 1/3 of the traffic, which means
that until there's a map change, the gateway will still try to contact
it.
Does this mean that rgw/rados waits for all replicas to acknowledge success? Is it possible to configure it in a way where a quorum is enough - i.e. 2 out of 3 replicas written successfully and rgw returns OK?
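(A possibly related knob is the per-pool min_size, which - as far as I understand - controls how many replicas must be up for a pg to keep accepting I/O; not quite the same as acking after 2 of 3 writes, but for reference:

    ceph osd pool get .rgw.buckets min_size    # check the current value
    ceph osd pool set .rgw.buckets min_size 2  # keep serving I/O with only 2 of 3 replicas up
)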
  - some of your pools contain a very small number of pgs (8). Probably
not related to your issue, but you'd want to change that.
Yes, I'm aware of that - I just kept the default values for now.
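For reference, the defaults for pools that rgw auto-creates can be raised in ceph.conf, and existing pools can (release permitting) be grown - roughly:

    ; ceph.conf - applies to pools created after the change
    [global]
        osd pool default pg num = 128
        osd pool default pgp num = 128

    # existing pools (pg splitting support depends on the release):
    ceph osd pool set .rgw.gc pg_num 64
    ceph osd pool set .rgw.gc pgp_num 64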

Yehuda

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



