Adding another radosgw node

Comments inline.


On Mon, Sep 22, 2014 at 9:38 AM, Jon Kåre Hellan <jon.kare.hellan at uninett.no> wrote:

> Hi
>
> We've got a three node ceph cluster, and radosgw on a fourth machine. We
> would like to add another radosgw machine for high availability. Here are a
> few questions I have:
>
> - We aren't expecting to deploy to multiple regions and zones anytime
>   soon. So presumably, we do not have to worry about federated
>   deployment. Would it be hard to move to a federated deployment later?
>

No... but it might be less work overall if you deploy the first zone now.

By default, Ceph creates 3 .rgw.* pools to use.  Once you start loading
data in those pools, you're going to want to keep them.

In a standard federation setup, you create several extra pools.  By
convention, the pools are named after the region, zone, and function.  For
example, in my primary zone, I have:

   - .us.rgw.root
   - .us-west-1.rgw.root
   - .us-west-1.rgw.control
   - .rgw.buckets.index
   - .rgw.buckets
   - .us-west-1.rgw.gc
   - .us-west-1.log
   - .us-west-1.intent-log
   - .us-west-1.usage
   - .us-west-1.users
   - .us-west-1.users.email
   - .us-west-1.users.swift
   - .us-west-1.users.uid
   - .rgw.root
   - .rgw

It is possible to overlay federation on top of your existing pools.  It
involves more thinking on your part.  It also means that zone-1 will be
set up differently from all the other zones.
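
To make the mapping concrete, here is roughly what the zone configuration
behind that pool list could look like.  This is only a sketch: the JSON keys
are from the Firefly-era federated config format, and the exact mapping of
my pool names onto those keys is approximate.

   { "domain_root": ".us-west-1.rgw.root",
     "control_pool": ".us-west-1.rgw.control",
     "gc_pool": ".us-west-1.rgw.gc",
     "log_pool": ".us-west-1.log",
     "intent_log_pool": ".us-west-1.intent-log",
     "usage_log_pool": ".us-west-1.usage",
     "user_keys_pool": ".us-west-1.users",
     "user_email_pool": ".us-west-1.users.email",
     "user_swift_pool": ".us-west-1.users.swift",
     "user_uid_pool": ".us-west-1.users.uid",
     "system_key": { "access_key": "", "secret_key": "" },
     "placement_pools": [
       { "key": "default-placement",
         "val": { "index_pool": ".rgw.buckets.index",
                  "data_pool": ".rgw.buckets" } }
     ] }

You would load it with something like:

   radosgw-admin zone set --rgw-zone=us-west-1 --infile us-west-1.json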

There are also some gotchas around loading up the primary with a lot of
data, then enabling replication.  Replication uses the metadata and data
logs.  If you upload objects with those logs disabled, replication won't
ever pick up that data.  If you're going to want to replicate it at some
point, it's easier to set up the zone with logging enabled now.

Then there's the logistics of copying terabytes of data between two
clusters.  We'll leave that for later. :-)
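
To give you an idea, those logs are enabled per zone in the region
configuration.  A sketch, again assuming the Firefly-era format (the zone
name and endpoint here are just examples):

   { "name": "us",
     "api_name": "us",
     "is_master": "true",
     "master_zone": "us-west-1",
     "endpoints": ["http://rgw1.example.com:80/"],
     "zones": [
       { "name": "us-west-1",
         "endpoints": ["http://rgw1.example.com:80/"],
         "log_meta": "true",
         "log_data": "true" }
     ],
     "placement_targets": [
       { "name": "default-placement", "tags": [] }
     ],
     "default_placement": "default-placement" }

followed by something like:

   radosgw-admin region set --infile us.json
   radosgw-admin regionmap update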



> - What is a radosgw instance? I was guessing that it was a machine
>   running radosgw. If not, is it a separate gateway with a separate
>   set of users and pools, possibly running on the same machine?
>

It's a radosgw daemon, plus a web server.  The web server could be a
separate apache instance, or it could be the built-in civetweb server.

It can be running on a dedicated machine, or it can run on existing nodes.
I currently have a 5 node cluster, and I'm running radosgw+apache on all 5
nodes.  I plan to move to dedicated radosgw nodes, probably when I switch
to dedicated monitor nodes.
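
A minimal stanza for one such instance could look like this (the client
name, hostname and paths are placeholders; the civetweb frontend needs
Firefly or later):

   [client.radosgw.gw1]
   host = gw1
   keyring = /etc/ceph/ceph.client.radosgw.keyring
   rgw frontends = civetweb port=7480
   log file = /var/log/ceph/radosgw.gw1.log

With Apache you would drop the rgw frontends line and point the FastCGI
wrapper at the daemon's socket (rgw socket path) instead.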



> - Can I simply deploy another radosgw machine with the same
>   configuration as the first one? If the second interpretation is true,
>   I guess I could.
>

You can, with a similar configuration.  The Radosgw + Apache setup involves
using a client name, which is unique per radosgw node.  Other than changing
the names in ceph.conf and the FastCGI wrapper, the nodes are identical.
Chef takes care of this for me.
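
Roughly like this, with two nodes (the names are just examples):

   [client.radosgw.gw1]
   host = gw1
   rgw socket path = /var/run/ceph/ceph.radosgw.gw1.fastcgi.sock
   log file = /var/log/ceph/radosgw.gw1.log

   [client.radosgw.gw2]
   host = gw2
   rgw socket path = /var/run/ceph/ceph.radosgw.gw2.fastcgi.sock
   log file = /var/log/ceph/radosgw.gw2.log

and each node's s3gw.fcgi wrapper starting the daemon under its own name:

   exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gw1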



> - Am I right that all gateway users go in the same keyring, which is
>   copied to all the gateway nodes and all the monitor nodes?
>

I'm not sure I understand, possibly because I have CephX disabled.  RadosGW
users are separate and distinct from Ceph nodes and their authentication.
RadosGW users are created using the radosgw-admin command (or Admin REST
API).  RadosGW users have no access to your cluster outside of HTTP.
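
For example (the uid, display name and subuser are placeholders):

   radosgw-admin user create --uid=johndoe --display-name="John Doe"
   radosgw-admin subuser create --uid=johndoe --subuser=johndoe:swift --access=full

That gives you S3 keys, plus a Swift subuser if you need one; none of it
touches the cluster's CephX keyrings.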


> - The gateway nodes obviously need a
>   [client.radosgw.{instance-name}] stanza in /etc/ceph.conf. Do the
>   monitor nodes also need a copy of the stanza?
>

Nope, that's just for RadosGW.  A machine only needs the pieces of the
ceph.conf that the daemons running on that node need.  Nodes that aren't
monitors don't need [mon].


> - Do the gateway nodes need all of the monitors' [global] stanza in
>   their /etc/ceph.conf? Presumably, they at least need mon_host to know
>   who to talk to. What else?
>

You probably want to keep the [global] sections identical across all
nodes.  It's not strictly required, but letting them diverge would lead to
confusion and human error.
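
As a rough sketch, a gateway node can get by with something like this
(fsid, monitor addresses and client name are placeholders):

   [global]
   fsid = <your cluster fsid>
   mon host = 10.0.0.1, 10.0.0.2, 10.0.0.3
   auth cluster required = cephx
   auth service required = cephx
   auth client required = cephx

   [client.radosgw.gw1]
   host = gw1
   keyring = /etc/ceph/ceph.client.radosgw.keyring
   rgw frontends = civetweb port=7480
   log file = /var/log/ceph/radosgw.gw1.log

The auth lines only matter if you run with CephX enabled (I don't).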