Hi,
I'm working on an RGW setup where I use Varnish[0] to cache objects.
The problem with this is that many requests are served from the cache
and never reach RGW itself, so RGW's traffic accounting isn't correct.
To overcome this I've been sending all the Varnish logs to Logstash[1]
and on into ElasticSearch, and then analyzing them there to find out
how much traffic each bucket generated.
This method works, but it isn't reliable enough. I'm currently parsing
the "Host" header to find out which bucket a request was for, and that
breaks down when users point their own CNAME at a bucket.
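
To illustrate why Host parsing falls short, here is a minimal Python
sketch. It assumes virtual-hosted-style bucket names under a single RGW
DNS name. The function name and the example hostnames are hypothetical
and are not taken from my actual Logstash setup.

```python
from typing import Optional


def bucket_from_host(host: str, rgw_dns_name: str) -> Optional[str]:
    """Guess the bucket from a virtual-hosted-style Host header.

    Returns None when the host is not a subdomain of rgw_dns_name,
    which is exactly what happens when a user CNAMEs their own domain.
    """
    # Drop an optional port and trailing dot, and normalise case.
    host = host.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + rgw_dns_name.lower()
    if host.endswith(suffix):
        return host[: -len(suffix)]
    return None
```

A request for "photos.rgw.example.com" resolves to the bucket "photos",
but a request arriving as "static.customer.example" gives no usable
bucket name at all.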
So I've been playing with the idea of adding an "Rgwx-bucket" header to
each response, telling you which bucket the request was made to.
In Varnish I can catch this response header and send it to Logstash, so
I have a more reliable way of telling which request went to which
bucket.
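
As a rough sketch of the accounting step, assuming each log record
carries the value of the response header and the number of bytes sent:
the field names "rgwx_bucket" and "bytes" are hypothetical and depend on
how the proxy log format and Logstash are configured.

```python
from collections import Counter
from typing import Iterable, Mapping


def traffic_per_bucket(records: Iterable[Mapping[str, object]]) -> Counter:
    """Sum the bytes sent per bucket from parsed proxy log records."""
    totals: Counter = Counter()
    for rec in records:
        # Records without the header are kept apart instead of guessed.
        bucket = rec.get("rgwx_bucket") or "(unknown)"
        totals[bucket] += int(rec.get("bytes", 0))
    return totals
```

Cache hits and misses are counted the same way here, since the proxy
logs both and a cached response still carries the header from the
original RGW response.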
I'm using Varnish, but you could do the same with nginx or any HTTP
caching proxy.
Would it be an idea to add this to RGW? I have it running on my system
and it works fine, but it's currently a bit hacky.
A config variable like "rgw expose bucket" could be false by default,
but when set to true RGW would send the response header with the bucket
name.
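
The intended behaviour of such an option could be modelled like this.
It is only a Python illustration of the semantics, not RGW code, and
the function and parameter names are made up for the example.

```python
from typing import Dict


def add_bucket_header(
    headers: Dict[str, str], bucket: str, expose_bucket: bool = False
) -> Dict[str, str]:
    """Return response headers, adding Rgwx-bucket only when enabled."""
    out = dict(headers)  # leave the caller's dict untouched
    if expose_bucket and bucket:
        out["Rgwx-bucket"] = bucket
    return out
```

With the default of False nothing changes for existing deployments; only
setups that opt in would see the extra header.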
How does this sound?
P.S.: When this is all up and running I'm planning to make a cool
presentation about this for the next Ceph day.
[0]: http://www.varnish-cache.org/
[1]: http://www.logstash.net/
--
Wido den Hollander
42on B.V.
Phone: +31 (0)20 700 9902
Skype: contact42on