Re: How exactly does rgw work?

Hi Gerald,

for the S3 and Swift case, the clients are not accessing the Ceph cluster directly. They are S3 and Swift clients and only talk to the RGW over HTTP. The RGW is the Ceph client that does all the interaction with the Ceph cluster.
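
For illustration, a plain S3 client only ever sees the RGW's HTTP endpoint; it never needs ceph.conf, the MON addresses, or the OSD map. A rough boto3 sketch (the endpoint URL and credentials below are placeholders, not anything from a real setup):

import boto3

# A plain S3 client: it only speaks HTTP(S) to the RGW endpoint and never
# talks to the MONs or OSDs directly.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',  # hypothetical RGW endpoint
    aws_access_key_id='ACCESS_KEY',              # placeholder credentials
    aws_secret_access_key='SECRET_KEY',
)

s3.create_bucket(Bucket='poc-bucket')
s3.put_object(Bucket='poc-bucket', Key='hello.txt', Body=b'hello ceph')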

Best
JC

On Dec 21, 2016, at 07:27, Gerald Spencer <ger.spencer3@xxxxxxxxx> wrote:

I was under the impression that when a client talks to the cluster, it grabs the OSD map and runs the CRUSH algorithm to determine where to store the object. Does the RGW server do this for clients? If I had 12 clients all talking through one gateway, would that server have to pass all of the objects from the clients to the cluster?
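
For reference, this is the model I had in mind for a native client (which I assume is what the RGW itself does under the hood), roughly sketched with python-rados; the conffile path and pool name are just placeholders:

import rados

# What a native RADOS client does: pull the cluster maps from the MONs,
# compute placement with CRUSH locally, then talk to the OSDs directly.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # assumed config path
cluster.connect()

ioctx = cluster.open_ioctx('mypool')    # placeholder pool name
ioctx.write_full('greeting', b'hello')  # goes straight to the primary OSD
ioctx.close()
cluster.shutdown()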


And 48 OSD nodes, each with 12 x 6TB drives and a PCIe write journal. That would be 576 OSDs in the cluster, with about 3.4PB raw...
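
The back-of-the-envelope I'm working from, assuming 3x replication (which is where the ~1.2PB usable figure in my original mail came from):

# Capacity sketch for the full build-out; the 3x replication factor is an
# assumption on my part.
nodes = 48
drives_per_node = 12
drive_tb = 6

osds = nodes * drives_per_node   # 576 OSDs
raw_tb = osds * drive_tb         # 3456 TB, i.e. ~3.4 PB raw
usable_tb = raw_tb / 3           # ~1152 TB, i.e. ~1.2 PB usable with 3x replication

print(osds, raw_tb, usable_tb)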


On Tue, Dec 20, 2016 at 1:12 AM Wido den Hollander <wido@xxxxxxxx> wrote:


> On 20 December 2016 at 03:24, Gerald Spencer <ger.spencer3@xxxxxxxxx> wrote:
>
> Hello all,
>
> We're currently waiting on a delivery of equipment for a small 50TB proof
> of concept cluster, and I've been lurking/learning a ton from you. Thanks
> for how active everyone is.
>
> Question(s):
> How does the rados gateway work exactly?

The RGW doesn't do any RAID. It chunks up larger objects into smaller RADOS objects. The first chunk is always 512k (IIRC) and after that it stripes the rest into 4MB RADOS objects.
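
Roughly, for object counts, something like this (a sketch based on the 512k/4MB figures above; the exact layout depends on the RGW chunk and stripe settings):

import math

# Sketch of how one S3 object maps onto RADOS objects: one head object of up
# to 512 KiB, then 4 MiB tail objects for the rest.
KIB = 1024
MIB = 1024 * KIB

def rados_object_count(size_bytes, head=512 * KIB, stripe=4 * MIB):
    if size_bytes <= head:
        return 1
    return 1 + math.ceil((size_bytes - head) / stripe)

# e.g. a 100 MiB upload -> 1 head object + 25 tail objects = 26 RADOS objects
print(rados_object_count(100 * MIB))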



> Does it introduce a single point of failure?



It does if you deploy only one RGW. Always deploy multiple RGWs with load balancing in front.



> Does all of the traffic go through the host running the RGW server?



Yes it does.



>
> I just don't fully understand that side of things. As for architecture our
> poc will have:
> - 1 monitor
> - 4 OSD nodes with 12 x 6TB drives, 1 x 800 PCIe journal
>



Those machines are underscaled; go for fewer disks per machine but more machines. More, smaller machines work a lot better with Ceph than a few big machines.



> If all goes as planned, this will scale up to:
> - 3 monitors



Always run with 3 MONs. Otherwise it is a serious SPOF.



> - 48 OSD nodes
>
> This should give us enough storage (~1.2PB) with enough throughput to handle
> the data requirements of our machines to saturate our 100Gb link...
>



That won't happen with just 4 machines, with the 3x replication taken into account as well. You will need a lot more machines to get the 100Gb link fully utilized.
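
Quick numbers to show why (a sketch; the 3x replication and ~150MB/s sustained per 6TB spinner are assumptions, not measurements):

# Rough write-path numbers for saturating a 100Gb client link.
link_gbps = 100
client_gb_per_s = link_gbps / 8.0              # 12.5 GB/s of client writes
replicas = 3
backend_gb_per_s = client_gb_per_s * replicas  # ~37.5 GB/s must land on the OSDs

nodes = 4
drives_per_node = 12
per_drive_gb_per_s = 0.15                      # ~150 MB/s per HDD (assumed)
cluster_gb_per_s = nodes * drives_per_node * per_drive_gb_per_s  # ~7.2 GB/s best case

print(backend_gb_per_s, cluster_gb_per_s)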



Wido



>
> Cheers,
> G

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
