On Thu, Oct 23, 2014 at 3:51 PM, Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx> wrote:
> I'm having a problem getting RadosGW replication to work after upgrading to
> Apache 2.4 on my primary test cluster. Upgrading the secondary cluster to
> Apache 2.4 doesn't cause any problems. Both Ceph's Apache packages and
> Ubuntu's packages cause the same problem.
>
> I'm pretty sure I'm missing something obvious, but I'm not seeing it.
>
> Has anybody else upgraded their federated gateways to Apache 2.4?
>
>
> My setup:
> 2 VMs, each running its own Ceph cluster with replication=1.
> test0-ceph.cdlocal is the primary zone, named us-west.
> test1-ceph.cdlocal is the secondary zone, named us-central.
> Before I start, replication works, and I'm running:
>
> Ubuntu 14.04 LTS
> Emperor (0.72.2-1precise, held back with apt-mark hold)
> Apache 2.2 (2.2.22-2precise.ceph, held back with apt-mark hold)
>
>
> As soon as I upgrade Apache to 2.4 in the primary cluster, replication gets
> permission errors. radosgw-agent.log:
>
> 2014-10-23T15:13:43.022 31106:ERROR:radosgw_agent.worker:failed to sync object bucket3/test6.jpg: state is error
>
> The access logs from the primary say (using the vhost_combined log format):
>
> test0-ceph.cdlocal:80 172.16.205.1 - - [23/Oct/2014:15:16:51 -0700] "PUT /test6.jpg HTTP/1.1" 200 209 "-" "-"
> - - - [23/Oct/2014:13:24:18 -0700] "GET /?delimiter=/ HTTP/1.1" 200 1254 "-" "-" "bucket3.test0-ceph.cdlocal"
> <snip>
> test0-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:17:34 -0700] "GET /admin/log?marker=00000000089.89.3&type=bucket-index&bucket-instance=bucket3%3Aus-west.5697.2&max-entries=1000 HTTP/1.1" 200 398 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
> test0-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:17:34 -0700] "GET /bucket3/test6.jpg?rgwx-uid=us-central&rgwx-region=us&rgwx-prepend-metadata=us HTTP/1.1" 403 249 "-" "-"
>
> 172.16.205.143 is the primary cluster, .144 is the secondary cluster, and
> .1 is my workstation.
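For anyone sifting through similar access logs, here is a small sketch (a hypothetical helper, not part of any Ceph tooling; it assumes the stock vhost_combined layout shown above) that pulls out the failing requests:

```python
import re

# Matches the request line and status code of an Apache access-log entry.
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')

def failing_requests(lines):
    """Return (method, path, status) for every non-2xx log entry."""
    out = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and not m.group("status").startswith("2"):
            out.append((m.group("method"), m.group("path"), int(m.group("status"))))
    return out

log = [
    'test0-ceph.cdlocal:80 172.16.205.1 - - [23/Oct/2014:15:16:51 -0700] '
    '"PUT /test6.jpg HTTP/1.1" 200 209 "-" "-"',
    'test0-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:17:34 -0700] '
    '"GET /bucket3/test6.jpg?rgwx-uid=us-central&rgwx-region=us&rgwx-prepend-metadata=us '
    'HTTP/1.1" 403 249 "-" "-"',
]
print(failing_requests(log))  # only the 403 object fetch is reported
```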
>
> The access logs on the secondary show:
>
> test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "GET /admin/replica_log?bounds&type=bucket-index&bucket-instance=bucket3%3Aus-west.5697.2 HTTP/1.1" 200 643 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
> test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "PUT /bucket3/test6.jpg?rgwx-op-id=test1-ceph0.cdlocal%3A6484%3A3&rgwx-source-zone=us-west&rgwx-client-id=radosgw-agent HTTP/1.1" 403 286 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
> test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "GET /admin/opstate?client-id=radosgw-agent&object=bucket3%2Ftest6.jpg&op-id=test1-ceph0.cdlocal%3A6484%3A3 HTTP/1.1" 200 355 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
>
> If I crank up radosgw debugging, it tells me that the calculated digest
> matches for the /admin/* requests, but not for the object GET:
>
> /admin/log
> 2014-10-23 15:44:29.257688 7fa6fcfb9700 15 calculated digest=6Tt13P6naWJEc0mJmYyDj6NzBS8=
> 2014-10-23 15:44:29.257690 7fa6fcfb9700 15 auth_sign=6Tt13P6naWJEc0mJmYyDj6NzBS8=
> 2014-10-23 15:44:29.257691 7fa6fcfb9700 15 compare=0
> 2014-10-23 15:44:29.257693 7fa6fcfb9700 20 system request
> <snip>
> /bucket3/test6.jpg
> 2014-10-23 15:44:29.411572 7fa6fc7b8700 15 calculated digest=pYWIOwRxCh4/bZ/D7b9RnS7RT1U=
> 2014-10-23 15:44:29.411573 7fa6fc7b8700 15 auth_sign=Gv398QNc6gLig9/0QbdO+1UZUq0=
> 2014-10-23 15:44:29.411574 7fa6fc7b8700 15 compare=-41
> 2014-10-23 15:44:29.411577 7fa6fc7b8700 10 failed to authorize request
>
> That explains the 403 responses.
>
> So metadata replication works, but data replication is failing with
> permission problems. I verified that I can create users and buckets in the
> primary, and have them replicate to the secondary.
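The compare=-41 above is just the result of the computed and supplied signatures diverging. A minimal sketch (generic AWS-signature-v2 style, not radosgw's actual code; the secret and header name here are illustrative) of why a header stripped in transit changes the calculated digest:

```python
import base64, hashlib, hmac

SECRET = b"not-a-real-secret"  # stand-in for the system user's secret key

def sign(method, resource, date, amz_headers):
    """Base64 HMAC-SHA1 over an AWS-v2-style string-to-sign
    (content-md5/content-type omitted for brevity)."""
    canon = "".join("%s:%s\n" % (k.lower(), v) for k, v in sorted(amz_headers.items()))
    string_to_sign = "%s\n%s\n%s%s" % (method, date, canon, resource)
    mac = hmac.new(SECRET, string_to_sign.encode(), hashlib.sha1)
    return base64.b64encode(mac.digest()).decode()

date = "Thu, 23 Oct 2014 22:44:29 GMT"
headers = {"x-amz-copy-source": "bucket3/test6.jpg"}  # hypothetical signed header

client_sig = sign("GET", "/bucket3/test6.jpg", date, headers)  # sender signs with the header
server_sig = sign("GET", "/bucket3/test6.jpg", date, {})       # server never saw the header

print(client_sig == server_sig)  # False -> digests differ -> 403
```

The client and server build the same string-to-sign only if every signed header survives the trip; drop one and the two digests can no longer match.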
>
> A similar situation was posted to the list before. That time, the problem
> was that the system users weren't correctly deployed to both the primary
> and secondary clusters. I verified that both users exist in both clusters,
> with the same access and secret keys.
>
> Just to test, I used s3cmd. I can read and write to both clusters using
> both system users' credentials.
>
>
> Anybody have any ideas?

You're hitting issue #9206. Apache 2.4 filters out certain HTTP headers
because they use underscores instead of dashes. There's a fix for that in
firefly, although it hasn't made it into an officially released version yet.

Yehuda
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
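For context on why underscore headers get treated as suspect: under the CGI meta-variable mapping (RFC 3875), dashes in header names become underscores, so a dashed and an underscored name collapse to the same variable, and servers that filter underscores avoid the ambiguity. A tiny illustration (the header names are made up, not the exact ones radosgw sent):

```python
def cgi_env_name(header):
    """RFC 3875 meta-variable mapping: uppercase, '-' -> '_', HTTP_ prefix."""
    return "HTTP_" + header.upper().replace("-", "_")

print(cgi_env_name("X-Object-Meta"))  # HTTP_X_OBJECT_META
print(cgi_env_name("X_Object_Meta"))  # HTTP_X_OBJECT_META -- same variable
```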