RGW Federated Gateways and Apache 2.4 problems

I'm having a problem getting RadosGW replication to work after upgrading Apache to 2.4 on my primary test cluster. Upgrading the secondary cluster to Apache 2.4 doesn't cause any problems, and both Ceph's Apache packages and Ubuntu's stock packages trigger the same failure.

I'm pretty sure I'm missing something obvious, but I'm not seeing it.

Has anybody else upgraded their federated gateways to apache 2.4?



My setup:
  • 2 VMs, each running its own Ceph cluster with replication=1
  • test0-ceph.cdlocal is the primary zone, named us-west
  • test1-ceph.cdlocal is the secondary zone, named us-central

Before the upgrade, replication works, and I'm running:
  • Ubuntu 14.04 LTS
  • Emperor (0.72.2-1precise, retained using apt-hold)
  • Apache 2.2 (2.2.22-2precise.ceph, retained using apt-hold)

As soon as I upgrade Apache to 2.4 on the primary cluster, replication starts failing with permission errors. From radosgw-agent.log:
2014-10-23T15:13:43.022 31106:ERROR:radosgw_agent.worker:failed to sync object bucket3/test6.jpg: state is error

The access logs from the primary say (using vhost_combined log format):
test0-ceph.cdlocal:80 172.16.205.1 - - [23/Oct/2014:15:16:51 -0700] "PUT /test6.jpg HTTP/1.1" 200 209 "-" "-"
- - - [23/Oct/2014:13:24:18 -0700] "GET /?delimiter=/ HTTP/1.1" 200 1254 "-" "-" "bucket3.test0-ceph.cdlocal"
<snip>
test0-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:17:34 -0700] "GET /admin/log?marker=00000000089.89.3&type=bucket-index&bucket-instance=bucket3%3Aus-west.5697.2&max-entries=1000 HTTP/1.1" 200 398 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
test0-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:17:34 -0700] "GET /bucket3/test6.jpg?rgwx-uid=us-central&rgwx-region=us&rgwx-prepend-metadata=us HTTP/1.1" 403 249 "-" "-"

172.16.205.143 is the primary cluster, .144 is the secondary cluster, and .1 is my workstation.


The access logs on the secondary show:
test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "GET /admin/replica_log?bounds&type=bucket-index&bucket-instance=bucket3%3Aus-west.5697.2 HTTP/1.1" 200 643 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "PUT /bucket3/test6.jpg?rgwx-op-id=test1-ceph0.cdlocal%3A6484%3A3&rgwx-source-zone=us-west&rgwx-client-id=radosgw-agent HTTP/1.1" 403 286 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "GET /admin/opstate?client-id=radosgw-agent&object=bucket3%2Ftest6.jpg&op-id=test1-ceph0.cdlocal%3A6484%3A3 HTTP/1.1" 200 355 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
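(Aside, in case it helps anyone reproducing this: a quick way I've been pulling the failing requests out of these access logs. A minimal sketch; the regex only handles the combined/vhost_combined request and status fields shown above.)

```python
import re

# Matches the quoted request and the status code of an Apache
# combined / vhost_combined access-log line.
LOG_RE = re.compile(r'"(?P<verb>[A-Z]+) (?P<path>\S+) HTTP/[0-9.]+" (?P<status>\d{3})')

def failing_requests(lines, status="403"):
    """Return (verb, path) for every logged request with the given status."""
    hits = []
    for line in lines:
        m = LOG_RE.search(line)
        if m and m.group("status") == status:
            hits.append((m.group("verb"), m.group("path")))
    return hits
```

Run over the secondary's log it flags only the replicated object PUT; all the /admin/* requests come back 200.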

If I crank up radosgw debugging, it shows that the calculated digest matches the request's signature for the /admin/* requests, but not for the object GET:
/admin/log
2014-10-23 15:44:29.257688 7fa6fcfb9700 15 calculated digest=6Tt13P6naWJEc0mJmYyDj6NzBS8=
2014-10-23 15:44:29.257690 7fa6fcfb9700 15 auth_sign=6Tt13P6naWJEc0mJmYyDj6NzBS8=
2014-10-23 15:44:29.257691 7fa6fcfb9700 15 compare=0
2014-10-23 15:44:29.257693 7fa6fcfb9700 20 system request
<snip>
/bucket3/test6.jpg
2014-10-23 15:44:29.411572 7fa6fc7b8700 15 calculated digest=pYWIOwRxCh4/bZ/D7b9RnS7RT1U=
2014-10-23 15:44:29.411573 7fa6fc7b8700 15 auth_sign=Gv398QNc6gLig9/0QbdO+1UZUq0=
2014-10-23 15:44:29.411574 7fa6fc7b8700 15 compare=-41
2014-10-23 15:44:29.411577 7fa6fc7b8700 10 failed to authorize request

That explains the 403 responses.
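For anyone following along: as I understand it, that digest is the standard S3 v2 signature, an HMAC-SHA1 over a canonical string built from the request. A minimal sketch of the calculation (my own reconstruction for illustration, not radosgw's actual code):

```python
import base64
import hashlib
import hmac

def sign_s3_v2(secret_key, verb, content_md5, content_type, date,
               canonical_resource, amz_headers=None):
    """Compute an S3 v2-style signature (HMAC-SHA1 over the StringToSign).

    amz_headers: optional dict of lowercased x-amz-* header names -> values.
    Reconstruction for illustration only, not radosgw's code.
    """
    amz_headers = amz_headers or {}
    # x-amz-* headers are sorted and folded into the string to sign.
    canonical_amz = "".join("%s:%s\n" % (k, amz_headers[k])
                            for k in sorted(amz_headers))
    string_to_sign = ("%s\n%s\n%s\n%s\n" % (verb, content_md5, content_type, date)
                      + canonical_amz + canonical_resource)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()
```

So if anything between the agent and radosgw rewrites a signed field (a date or x-amz-* header, or the resource path/query string), the calculated digest and the request's auth_sign diverge exactly the way the log above shows.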

So metadata replication works, but data replication fails with permission errors. I verified that I can create users and buckets in the primary and have them replicate to the secondary.


A similar situation was posted to the list before; that time, the problem was that the system users hadn't been correctly deployed to both the primary and secondary clusters. I verified that both users exist in both clusters, with the same access and secret keys.

Just to test, I used s3cmd: I can read and write to both clusters using both system users' credentials.


Anybody have any ideas?

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
