Re: Expose rgw using consul or service discovery

In PetaSAN we use Consul to provide a service mesh for running services active/active over Ceph.

For rgw, we use nginx to load balance the rgw gateways. The nginx instances themselves run in an active/active HA setup so they do not become a bottleneck, as you pointed out with the haproxy setup.


How do you manage rgw upgrades? Do you use cephadm or another automation tool?

How is nginx configured to talk to rgw? Using an upstream and a proxy_pass?


PetaSAN is a Ceph storage appliance based on Ubuntu with the SUSE kernel. We rely on the Consul service mesh to scale the service/gateway layer in a scale-out active/active fashion; this covers iSCSI, NFS, SMB and S3. Upgrades are done live via apt upgrade. We do not use cephadm; we provide a web-based deployment UI (wizard-like steps) as well as a UI for cluster management. For nginx, we use the upstream method to configure load balancing of the rgws. The nginx config file is dynamically created/updated by a Python script which receives notifications from Consul (nodes added, nodes down, IP changes, etc.).
You can read more on our website
http://www.petasan.org
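For readers unfamiliar with the upstream method mentioned above, a minimal nginx sketch might look like the following. The addresses, ports and names are illustrative, not PetaSAN's actual configuration; in their setup a script rewrites the upstream block from Consul data.

```nginx
# Illustrative only: load-balance several rgw gateways behind nginx.
upstream rgw_backends {
    # These entries would be rewritten dynamically from Consul data.
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    server 10.0.0.13:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://rgw_backends;
        proxy_set_header Host $host;
    }
}
```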



/Maged

On 22/10/2021 16:41, Pierre GINDRAUD wrote:

On 20/10/2021 10:17, Sebastian Wagner wrote:
Am 20.10.21 um 09:12 schrieb Pierre GINDRAUD:
Hello,

I'm migrating from puppet to cephadm to deploy a ceph cluster, and I'm
using consul to expose radosgateway. Before, with puppet, we were
deploying radosgw with "apt install radosgw" and applying upgrades with
"apt upgrade radosgw". In our consul service, a simple healthcheck on the
url "/swift/healthcheck" worked fine, because we were able to put the
consul agent in maintenance mode before operations.
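The kind of Consul service definition described above, with an HTTP check on /swift/healthcheck, might look like this sketch (service name, port and intervals are illustrative assumptions):

```json
{
  "service": {
    "name": "radosgw",
    "port": 8080,
    "check": {
      "http": "http://127.0.0.1:8080/swift/healthcheck",
      "interval": "5s",
      "timeout": "2s"
    }
  }
}
```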
I've seen this thread
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/32JZAIU45KDTOWEW6LKRGJGXOFCTJKSS/#N7EGVSDHMMIXHCTPEYBA4CYJBWLD3LLP
that proves consul is a possible way.

So, with cephadm, the upgrade process decides by itself when to stop,
upgrade and start each radosgw instance.
Right

It's an issue because the consul healthcheck must detect the instance
going down "as fast as possible", to minimize the number of application
requests that can hit the down instance's IP.

In some applications like traefik
https://doc.traefik.io/traefik/reference/static-configuration/cli/ we
have an option "requestacceptgracetimeout" that allows the "http server" to keep handling requests for some time after a stop signal has been received, while the healthcheck endpoint immediately starts responding with an error. This allows the load balancer (consul here) to mark the instance as down and stop
routing traffic to it before it effectively goes down.

In https://docs.ceph.com/en/latest/radosgw/config-ref/ I haven't seen any
option like that. And in cephadm I haven't seen "pre-task" and "post-task"
hooks to, for example, touch a file somewhere that consul would be able to
test, or put a host into maintenance.

How do you expose radosgw service over your application ?
cephadm nowadays ships an ingress service using haproxy for this use case:

https://docs.ceph.com/en/latest/cephadm/services/rgw/#high-availability-service-for-rgw
Thanks for the link. I've analysed the high-availability pattern but I've found the following cons with the ceph proposal:
* the current active haproxy node can be considered a bottleneck because it handles all TCP connections. In addition it adds significant overhead because it requires 2 TCP connections in total to talk to rgw
* the keepalived failover mechanism "breaks" TCP connections at the moment of failover
* does the cephadm module properly "drain" a node before interacting with it (stop/restart...)? Because if not, haproxy does not bring anything better than my consul service setup.

I think that haproxy+keepalived adds a bit of complexity; a service-discovery-oriented approach is simpler and provides "zero downtime" during all types of "planned maintenance" ( https://www.consul.io/use-cases/service-discovery-and-health-checking )

What do you think ?

Is anyone already using this "high-availability-service-for-rgw" in a production environment?



Do you have any ideas to work around my issue?
Plenty, actually. cephadm itself does not provide a notification
mechanism, but other components in the deployment stack might.

On the highest level we have the config-key store of the MONs. You
should be able to get notifications for config-key changes.
Unfortunately this would involve some coding.

On the systemd level we have systemd-notify. I haven't looked into it,
but maybe you can get events about the rgw unit deployed by cephadm.

On the container level we have "podman events" that prints state changes
of containers.

A script that calls podman events on one side and pushes updates to
consul on the other sounds like the most promising solution to me.
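A minimal sketch of that idea, assuming a container name, a Consul service id and the local Consul agent address (all placeholders to adapt): watch `podman events --format json` and flip Consul's per-service maintenance mode when the rgw container stops or starts.

```python
# Sketch only: map podman container events to Consul maintenance mode.
# RGW_CONTAINER, SERVICE_ID and CONSUL are hypothetical placeholders.
import json
import subprocess
import urllib.request

RGW_CONTAINER = "ceph-rgw"        # hypothetical container name
SERVICE_ID = "rgw"                # hypothetical Consul service id
CONSUL = "http://127.0.0.1:8500"  # local Consul agent

def maintenance_action(event):
    """Map a podman event (dict) to a maintenance flag:
    True = enable maintenance (drain traffic), False = disable, None = ignore."""
    if event.get("Name") != RGW_CONTAINER:
        return None
    status = event.get("Status")
    if status in ("died", "stop", "pause"):
        return True
    if status in ("start", "unpause"):
        return False
    return None

def set_maintenance(enable):
    # Consul agent HTTP API: PUT /v1/agent/service/maintenance/<id>?enable=<bool>
    url = (f"{CONSUL}/v1/agent/service/maintenance/"
           f"{SERVICE_ID}?enable={str(enable).lower()}")
    urllib.request.urlopen(urllib.request.Request(url, method="PUT"))

def watch():
    # `podman events --format json` emits one JSON object per line
    proc = subprocess.Popen(["podman", "events", "--format", "json"],
                            stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        action = maintenance_action(json.loads(line))
        if action is not None:
            set_maintenance(action)
```

Calling watch() would then drain the Consul service as soon as podman reports the container died, and re-enable it on restart, without needing any hook inside cephadm itself.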

In case you get this setup working properly, I'd love to read a blog
post about it.

Regards
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



