Re: RGW Federated Gateways and Apache 2.4 problems

Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx> · Fri, 24 Oct 2014 10:15:50 -0700

Thanks!  I'll continue with Apache 2.2 until the next release.

On Fri, Oct 24, 2014 at 8:58 AM, Yehuda Sadeh <yehuda@xxxxxxxxxx> wrote:
On Thu, Oct 23, 2014 at 3:51 PM, Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx> wrote:

> I'm having a problem getting RadosGW replication to work after upgrading to

> Apache 2.4 on my primary test cluster.  Upgrading the secondary cluster to

> Apache 2.4 doesn't cause any problems. Both Ceph's apache packages and

> Ubuntu's packages cause the same problem.

>

> I'm pretty sure I'm missing something obvious, but I'm not seeing it.

>

> Has anybody else upgraded their federated gateways to apache 2.4?

>

>

>

> My setup

> 2 VMs, each running their own ceph cluster with replication=1

> test0-ceph.cdlocal is the primary zone, named us-west

> test1-ceph.cdlocal is the secondary zone, named us-central

> Before the upgrade, replication works, and I'm running:

>

> Ubuntu 14.04 LTS

> Emperor (0.72.2-1precise, retained using apt-hold)

> Apache 2.2 (2.2.22-2precise.ceph, retained using apt-hold)

>

>

> As soon as I upgrade Apache to 2.4 in the primary cluster, replication gets

> permission errors.  radosgw-agent.log:

> 2014-10-23T15:13:43.022 31106:ERROR:radosgw_agent.worker:failed to sync

> object bucket3/test6.jpg: state is error

>

> The access logs from the primary say (using vhost_combined log format):

> test0-ceph.cdlocal:80 172.16.205.1 - - [23/Oct/2014:15:16:51 -0700] "PUT

> /test6.jpg HTTP/1.1" 200 209 "-" "-"- - - [23/Oct/2014:13:24:18 -0700] "GET

> /?delimiter=/ HTTP/1.1" 200 1254 "-" "-" "bucket3.test0-ceph.cdlocal"

> <snip>

> test0-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:17:34 -0700] "GET

> /admin/log?marker=00000000089.89.3&type=bucket-index&bucket-instance=bucket3%3Aus-west.5697.2&max-entries=1000

> HTTP/1.1" 200 398 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"

> test0-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:17:34 -0700] "GET

> /bucket3/test6.jpg?rgwx-uid=us-central&rgwx-region=us&rgwx-prepend-metadata=us

> HTTP/1.1" 403 249 "-" "-"

>

> 172.16.205.143 is the primary cluster, .144 is the secondary cluster, and .1

> is my workstation.

>

>

> The access logs on the secondary show:

> test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "GET

> /admin/replica_log?bounds&type=bucket-index&bucket-instance=bucket3%3Aus-west.5697.2

> HTTP/1.1" 200 643 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"

> test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "PUT

> /bucket3/test6.jpg?rgwx-op-id=test1-ceph0.cdlocal%3A6484%3A3&rgwx-source-zone=us-west&rgwx-client-id=radosgw-agent

> HTTP/1.1" 403 286 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"

> test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "GET

> /admin/opstate?client-id=radosgw-agent&object=bucket3%2Ftest6.jpg&op-id=test1-ceph0.cdlocal%3A6484%3A3

> HTTP/1.1" 200 355 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"

>

> If I crank up radosgw debugging, it tells me that the calculated digest is

> correct for the /admin/* requests, but fails for the object GET:

> /admin/log

> 2014-10-23 15:44:29.257688 7fa6fcfb9700 15 calculated

> digest=6Tt13P6naWJEc0mJmYyDj6NzBS8=

> 2014-10-23 15:44:29.257690 7fa6fcfb9700 15

> auth_sign=6Tt13P6naWJEc0mJmYyDj6NzBS8=

> 2014-10-23 15:44:29.257691 7fa6fcfb9700 15 compare=0

> 2014-10-23 15:44:29.257693 7fa6fcfb9700 20 system request

> <snip>

> /bucket3/test6.jpg

> 2014-10-23 15:44:29.411572 7fa6fc7b8700 15 calculated

> digest=pYWIOwRxCh4/bZ/D7b9RnS7RT1U=

> 2014-10-23 15:44:29.411573 7fa6fc7b8700 15

> auth_sign=Gv398QNc6gLig9/0QbdO+1UZUq0=

> 2014-10-23 15:44:29.411574 7fa6fc7b8700 15 compare=-41

> 2014-10-23 15:44:29.411577 7fa6fc7b8700 10 failed to authorize request

>

> That explains the 403 responses.

>

> So I have metadata replication working, but the data replication is failing

> with permission problems.  I verified that I can create users and buckets in

> the primary, and have them replicate to the secondary.

>

>

> A similar situation was posted to the list before.  That time, the problem

> was that the system users weren't correctly deployed to both the primary and

> secondary clusters.  I verified that both users exist in both clusters, with

> the same access and secret.

>

> Just to test, I used s3cmd.  I can read and write to both clusters using

> both system users' credentials.

>

>

> Anybody have any ideas?

>

You're hitting issue #9206. Apache 2.4 filters out certain HTTP
headers because their names use underscores instead of dashes. There's
a fix for that in firefly, although it hasn't made it into an
officially released version yet.

Yehuda
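
The header filtering Yehuda describes would account for the digest mismatch in the debug log above: if a signed header never reaches radosgw, the gateway computes its digest over a different string than the client signed. The following is only a minimal sketch of that mechanism, assuming an S3 signature v2 style digest (Base64 of HMAC-SHA1). The header name `X_AMZ_META_OWNER`, the secret, and the helper functions are hypothetical and are not taken from the radosgw source or from issue #9206.

```python
import base64
import hashlib
import hmac


def normalize_header_name(name: str) -> str:
    """Replace underscores with dashes so a strict front end keeps the header."""
    return name.replace("_", "-")


def drop_underscore_headers(headers: dict) -> dict:
    """Model a front end that silently discards header names containing '_'."""
    return {k: v for k, v in headers.items() if "_" not in k}


def sign_v2(secret: str, method: str, resource: str, headers: dict) -> str:
    """Simplified S3 v2 style digest: Base64(HMAC-SHA1(secret, string_to_sign))."""
    lowered = {normalize_header_name(k).lower(): v for k, v in headers.items()}
    # Canonicalized x-amz-* headers: lowercase names, sorted, one per line.
    amz = "".join(
        f"{k}:{v}\n" for k, v in sorted(lowered.items()) if k.startswith("x-amz-")
    )
    string_to_sign = (
        "\n".join(
            [
                method,
                lowered.get("content-md5", ""),
                lowered.get("content-type", ""),
                lowered.get("date", ""),
            ]
        )
        + "\n"
        + amz
        + resource
    )
    mac = hmac.new(secret.encode(), string_to_sign.encode(), hashlib.sha1)
    return base64.b64encode(mac.digest()).decode()


# Hypothetical values, for illustration only.
SECRET = "hypothetical-secret"
RESOURCE = "/bucket3/test6.jpg"
sent = {
    "Date": "Thu, 23 Oct 2014 22:17:34 GMT",
    "X_AMZ_META_OWNER": "us-central",  # underscore-style name (hypothetical)
}

# The client signs everything it sends.
auth_sign = sign_v2(SECRET, "GET", RESOURCE, sent)

# The front end drops the underscore header before the gateway sees the request.
calculated = sign_v2(SECRET, "GET", RESOURCE, drop_underscore_headers(sent))
mismatch = not hmac.compare_digest(auth_sign, calculated)

# With dashes in the names, nothing is dropped and the digests agree.
fixed = {normalize_header_name(k): v for k, v in sent.items()}
calculated_fixed = sign_v2(SECRET, "GET", RESOURCE, drop_underscore_headers(fixed))
match_after_fix = hmac.compare_digest(auth_sign, calculated_fixed)
```

In this sketch `mismatch` corresponds to the failed digest comparison and "failed to authorize request" seen for the object GET, while the /admin/* requests would be unaffected if they carry no underscore-named headers in the signed string. That last point is an inference from the logs, not something the thread confirms.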
