Re: Federated gateways

OK, I believe I’ve made some progress here. I have everything syncing *except* data. Data sync requests are getting 500s when they hit the backup zone. Here is a log from the radosgw with debug cranked up to 20:

2014-11-11 14:37:06.688331 7f54447f0700  1 ====== starting new request req=0x7f546800f3b0 =====
2014-11-11 14:37:06.688978 7f54447f0700  0 WARNING: couldn't find acl header for bucket, generating default
2014-11-11 14:37:06.689358 7f54447f0700  1 -- 172.16.10.103:0/1007381 --> 172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 -- ?+0 0x7f534800d770 con 0x7f53f00053f0
2014-11-11 14:37:06.689396 7f54447f0700 20 -- 172.16.10.103:0/1007381 submit_message osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 remote, 172.16.10.103:6934/14875, have pipe.
2014-11-11 14:37:06.689481 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689592 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer encoding 48 features 17592186044415 0x7f534800d770 osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4
2014-11-11 14:37:06.689756 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer signed seq # 48): sig = 206599450695048354
2014-11-11 14:37:06.689804 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sending 48 0x7f534800d770
2014-11-11 14:37:06.689884 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689915 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.694968 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ACK
2014-11-11 14:37:06.695053 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ack seq 48
2014-11-11 14:37:06.695067 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.695079 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got MSG
2014-11-11 14:37:06.695093 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got envelope type=43 src osd.25 front=190 data=0 off 0
2014-11-11 14:37:06.695108 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader wants 190 from dispatch throttler 0/104857600
2014-11-11 14:37:06.695135 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got front 190
2014-11-11 14:37:06.695150 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).aborted = 0
2014-11-11 14:37:06.695158 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got 190 + 0 + 0 byte message
2014-11-11 14:37:06.695284 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got message 48 0x7f51b4001950 osd_op_reply(1783 statelog.obj_opstate.97 [call] v47531'13 uv13 _ondisk_ = 0) v6
2014-11-11 14:37:06.695313 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 queue 0x7f51b4001950 prio 127
2014-11-11 14:37:06.695374 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.695384 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.695426 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).write_ack 48
2014-11-11 14:37:06.695421 7f54ebfff700  1 -- 172.16.10.103:0/1007381 <== osd.25 172.16.10.103:6934/14875 48 ==== osd_op_reply(1783 statelog.obj_opstate.97 [call] v47531'13 uv13 _ondisk_ = 0) v6 ==== 190+0+0 (4092879147 0 0) 0x7f51b4001950 con 0x7f53f00053f0
2014-11-11 14:37:06.695458 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.695476 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.695495 7f54ebfff700 10 -- 172.16.10.103:0/1007381 dispatch_throttle_release 190 to dispatch throttler 190/104857600
2014-11-11 14:37:06.695506 7f54ebfff700 20 -- 172.16.10.103:0/1007381 done calling dispatch on 0x7f51b4001950
2014-11-11 14:37:06.695616 7f54447f0700  0 > HTTP_DATE -> Tue Nov 11 14:37:06 2014
2014-11-11 14:37:06.695636 7f54447f0700  0 > HTTP_X_AMZ_COPY_SOURCE -> test/upload
2014-11-11 14:37:06.696823 7f54447f0700  1 -- 172.16.10.103:0/1007381 --> 172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1784 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 -- ?+0 0x7f534800fbb0 con 0x7f53f00053f0
2014-11-11 14:37:06.696866 7f54447f0700 20 -- 172.16.10.103:0/1007381 submit_message osd_op(client.5673295.0:1784 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 remote, 172.16.10.103:6934/14875, have pipe.
2014-11-11 14:37:06.696935 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.696972 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer encoding 49 features 17592186044415 0x7f534800fbb0 osd_op(client.5673295.0:1784 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4
2014-11-11 14:37:06.697120 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer signed seq # 49): sig = 6092508395557517420
2014-11-11 14:37:06.697161 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sending 49 0x7f534800fbb0
2014-11-11 14:37:06.697223 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.697257 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.701315 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ACK
2014-11-11 14:37:06.701364 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ack seq 49
2014-11-11 14:37:06.701376 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.701389 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got MSG
2014-11-11 14:37:06.701402 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got envelope type=43 src osd.25 front=190 data=0 off 0
2014-11-11 14:37:06.701415 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader wants 190 from dispatch throttler 0/104857600
2014-11-11 14:37:06.701435 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got front 190
2014-11-11 14:37:06.701449 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).aborted = 0
2014-11-11 14:37:06.701458 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got 190 + 0 + 0 byte message
2014-11-11 14:37:06.701569 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got message 49 0x7f51b4001460 osd_op_reply(1784 statelog.obj_opstate.97 [call] v47531'14 uv14 _ondisk_ = 0) v6
2014-11-11 14:37:06.701597 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 queue 0x7f51b4001460 prio 127
2014-11-11 14:37:06.701627 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.701636 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.701678 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).write_ack 49
2014-11-11 14:37:06.701684 7f54ebfff700  1 -- 172.16.10.103:0/1007381 <== osd.25 172.16.10.103:6934/14875 49 ==== osd_op_reply(1784 statelog.obj_opstate.97 [call] v47531'14 uv14 _ondisk_ = 0) v6 ==== 190+0+0 (1714651716 0 0) 0x7f51b4001460 con 0x7f53f00053f0
2014-11-11 14:37:06.701710 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.701728 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.701751 7f54ebfff700 10 -- 172.16.10.103:0/1007381 dispatch_throttle_release 190 to dispatch throttler 190/104857600
2014-11-11 14:37:06.701762 7f54ebfff700 20 -- 172.16.10.103:0/1007381 done calling dispatch on 0x7f51b4001460
2014-11-11 14:37:06.701815 7f54447f0700  0 WARNING: set_req_state_err err_no=5 resorting to 500
2014-11-11 14:37:06.701894 7f54447f0700  1 ====== req done req=0x7f546800f3b0 http_status=500 ======


Any information you could give me would be wonderful, as I’ve been banging my head against this for a few days.

Thanks, Aaron 

On Nov 5, 2014, at 3:02 PM, Aaron Bassett <aaron@xxxxxxxxxxxxxxxxx> wrote:

Ah, so I need both users in both clusters? I think I missed that bit; let me see if that does the trick.

Aaron 
On Nov 5, 2014, at 2:59 PM, Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx> wrote:

One region with two zones is the standard setup, so that should be fine.

Is metadata (users and buckets) being replicated, but not data (objects)? 
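If it helps, a quick way to check is to ask the secondary zone's gateway directly. Something like the following (the --name value here is a placeholder for your actual gateway instance name) should show whether bucket and user metadata made it across, even though the objects didn't:

    # run against the secondary (backup) zone's cluster
    radosgw-admin bucket list --name client.radosgw.us-west-1
    radosgw-admin metadata list user --name client.radosgw.us-west-1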


Let's go through a quick checklist:
  • Verify that you enabled log_meta and log_data in the region.json for the master zone.
  • Verify that RadosGW is using your region map with radosgw-admin regionmap get --name client.radosgw.<name>
  • Verify that RadosGW is using your zone configuration with radosgw-admin zone get --name client.radosgw.<name>
  • Verify that all the pools in your zone exist (RadosGW only auto-creates the basic ones).
  • Verify that your system users exist in both zones with the same access and secret (example commands below).
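For reference, those checks boil down to something like the following. The zone names, uid, and --name values are placeholders for whatever your setup actually uses:

    # confirm the region map the gateway is actually using, and that
    # log_meta and log_data are "true" for the master zone
    radosgw-admin regionmap get --name client.radosgw.us-east-1

    # confirm the zone configuration (pool names, system user key) on each side
    radosgw-admin zone get --name client.radosgw.us-east-1
    radosgw-admin zone get --name client.radosgw.us-west-1

    # confirm every pool named in the zone configs actually exists
    # (run against each cluster)
    rados lspools

    # confirm the system user exists in *both* clusters with identical keys;
    # if it's missing on one side, it can be created there with the same
    # access/secret, e.g.:
    radosgw-admin user info --uid=us-east --name client.radosgw.us-east-1
    radosgw-admin user info --uid=us-east --name client.radosgw.us-west-1
    # radosgw-admin user create --uid=us-east --display-name="us-east system user" \
    #     --access-key=<key> --secret=<secret> --system --name client.radosgw.us-west-1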
Hopefully that gives you an idea of what's not working correctly.

If it doesn't, crank up the logging on the radosgw daemon on both sides and check the logs. Add debug rgw = 20 to ceph.conf on both clusters (in the client.radosgw.<name> section) and restart. Hopefully those logs will tell you what's wrong.
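In case it's useful, that's just a couple of lines per gateway instance in ceph.conf (the section name below is a placeholder for your actual instance name); then restart the radosgw daemon however your distro manages it:

    [client.radosgw.us-east-1]
        ...
        debug rgw = 20
        debug ms = 1    # optional, messenger-level debug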


On Wed, Nov 5, 2014 at 11:39 AM, Aaron Bassett <aaron@xxxxxxxxxxxxxxxxx> wrote:
Hello everyone, 
I am attempting to set up a two-cluster configuration for object storage disaster recovery. I have two physically separate sites, so using one big cluster isn’t an option. I’m following the guide at http://ceph.com/docs/v0.80.5/radosgw/federated-config/ . After a couple of days of flailing, I’ve settled on using one region with two zones, where each cluster is a zone. I’m now attempting to set up a sync agent as per the “Multi-Site Data Replication” section. The agent kicks off OK and starts making all sorts of connections, but no objects are being copied to the non-master zone. I re-ran the agent with the -v flag and saw a lot of:

DEBUG:urllib3.connectionpool:"GET /admin/opstate?client-id=radosgw-agent&object=test%2F_shadow_.JjVixjWmebQTrRed36FL6D0vy2gDVZ__39&op-id=phx-r1-head1%3A2451615%3A1 HTTP/1.1" 200 None                                                                                                         
DEBUG:radosgw_agent.worker:op state is []                                                                                                      
DEBUG:radosgw_agent.worker:error geting op state: list index out of range                                                                      

So it appears something is still wrong with my agent, though I have no idea what. I can’t seem to find any errors in any other logs. Does anyone have any insight here?
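For reference, the agent setup follows the federated-config doc, roughly like this (the endpoint, keys, and log path below are placeholders for my real values):

    # /etc/ceph/cluster-data-sync.conf
    src_access_key: <system user access key>
    src_secret_key: <system user secret key>
    destination: http://us-west.example.com:80
    dest_access_key: <system user access key>
    dest_secret_key: <system user secret key>
    log_file: /var/log/radosgw/radosgw-sync.log

    radosgw-agent -c /etc/ceph/cluster-data-sync.conf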

I’m also wondering whether what I’m attempting, with two clusters in the same region as separate zones, even makes sense.

Thanks, Aaron 




