Re: Are there any statistics available on how most production ceph clusters are being used?

On Sun, Apr 21, 2019 at 03:11:44PM +0200, Marc Roos wrote:
> Double thanks for the on-topic reply. The other two responses were 
> making me doubt if my Chinese (which I didn't study) is better than my English.
They were almost on topic, but not that useful. Please don't imply
language failings on this list. English may be the lingua franca, but it
is far from being the first language for most list members. Not being
useful to you doesn't mean they weren't useful overall.

>  >> I am a bit curious on how production ceph clusters are being used. I am
>  >> reading here that the block storage is used a lot with openstack and
>  >> proxmox, and via iscsi with vmware.
>  >Have you looked at the Ceph User Surveys/Census?
>  >https://ceph.com/ceph-blog/ceph-user-survey-2018-results/
>  >https://ceph.com/geen-categorie/results-from-the-ceph-census/
> 
> Sort of what I was looking for, so 42% use rgw, of which 74% s3.
> I guess this main archive usage, is mostly done by providers
Not just archive, but also API-driven for web services, usually hidden
behind hostnames/CDNs. Image/video upload sites are a big part of this,
esp. things like Instagram clones in emerging markets.

>  >As the quantity of data by a single user increases, the odds that GUI
>  >tools are used for it decreases, as it's MUCH more likely to be driven
>  >by automation & tooling around the API.
> Hmm, interesting. I am having more soho clients. And was thinking of
> getting them such gui client.
That's great, but orthogonal to the overall issue. Some of the cloud
providers DO offer setup docs for GUI clients as well. Off the top of my
head I know of DreamHost's & DigitalOcean's, because I contributed to
their docs:
https://help.dreamhost.com/hc/en-us/sections/115000059232-DreamObjects-clients
https://www.digitalocean.com/docs/spaces/resources/

> I think if you take the perspective of some end user that associates s3,
> with something like an audi and nothing else. It is quite necessary 
> to have a client that is easy and secure to use, where you just enter
>  preferably only two things, your access key and your secret.
There's a bare minimum of three things you'd need in a generic client:
- endpoint(s)
- access key
- secret

The endpoint could be partially pre-provisioned (e.g. you could give
your clients an INI file that points them to your private Ceph RGW
deployment). If it's a deployment with multiple regions, endpoints &
region-specifics become more important (e.g. AWS S3 has differing
signature requirements in different regions).
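
As a rough sketch of that pre-provisioning idea: the provider ships a
file with the endpoint (and region), and the end user only types in the
access key and secret. The file layout, the "rgw" section name, the
field names, and the rgw.example.com hostname below are all illustrative
assumptions, not a format that any particular client reads.

```python
import configparser
from dataclasses import dataclass


@dataclass
class S3ClientConfig:
    endpoint: str
    access_key: str
    secret_key: str
    region: str = ""


# Hypothetical provider-supplied file: endpoint details only, no credentials.
PROVISIONED_INI = """\
[rgw]
endpoint = https://rgw.example.com
region = default
"""


def load_client_config(ini_text, access_key, secret_key, section="rgw"):
    """Merge the pre-provisioned endpoint with user-entered credentials."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    provisioned = parser[section]
    return S3ClientConfig(
        endpoint=provisioned["endpoint"],
        access_key=access_key,
        secret_key=secret_key,
        region=provisioned.get("region", fallback=""),
    )


config = load_client_config(PROVISIONED_INI, "EXAMPLEACCESSKEY", "examplesecret")
print(config.endpoint)
```

With something like this, the user-facing form is reduced to the two
fields asked for above, while the third required item still exists in
the shipped file.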

> The advantage of having a more rgw specific gui client, is that you
> - do not have the default amazon 'advertisements' (think of storage 
> classes etc.)
> - less configuration options, everything ceph does not support we do not
>   need to configure. 
> - no ftp, no what ever else, just this s3
> - you do not have configuration options that ceph doesn't offer 
>   (eg. this life cycle, bucket access logging?)
- Storage Classes: supported
- Bucket Lifecycle: supported
- Bucket Access Logging: not quite supported, PR exists, some debate
  about better designs. https://github.com/ceph/ceph/pull/14841

>   I can imagine if you have quite a few clients, you could get quite 
> some questions to answer, about things not working.
> - you have better support for specific things like multi tenant account, 
> etc.
Tenancy in RGW is effectively parallel S3 scopes, with different
endpoints.

> - for once the https urls are correctly advertised
What issue do you have with HTTPS URLs? The main gotcha that most people
hit is that S3's ssl hostname validation rule is NOT the same as the
general SSL hostname validation rule, and trips up browser access.
Specifically in a wildcard SSL cert, '*.myrgwendpoint.com', the general
rule is that '*' should only match one DNS fragment [e.g. no '.'], while
S3's validation says it can match one or more DNS fragments.
The AWS S3 docs are even horrible about this, with the text:
"To work around this, use HTTP or write your own certificate
verification logic."
https://github.com/awsdocs/amazon-s3-developer-guide/blame/f498926b68f4f1b11c7f708ac0fbd52ee2a0aa19/doc_source/BucketRestrictions.md#L35
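
To make the difference concrete, here is a minimal sketch of the two
matching rules exactly as described above. It is not how any real TLS
library or S3 client validates certificates, and the function names are
made up for illustration. It only handles patterns of the form
'*.something'.

```python
def covered_by_wildcard(pattern, hostname):
    """Return the part of hostname that '*' would have to match, or None."""
    pattern = pattern.lower()
    hostname = hostname.lower()
    if not pattern.startswith("*."):
        raise ValueError("this sketch only handles '*.example' patterns")
    suffix = pattern[1:]  # e.g. '.myrgwendpoint.com'
    if not hostname.endswith(suffix):
        return None
    covered = hostname[: -len(suffix)]
    return covered or None  # '*' must match at least something


def matches_general_rule(pattern, hostname):
    """General rule: '*' matches exactly one DNS fragment (no '.')."""
    covered = covered_by_wildcard(pattern, hostname)
    return covered is not None and "." not in covered


def matches_s3_style_rule(pattern, hostname):
    """S3-style rule as described above: '*' matches one or more fragments."""
    return covered_by_wildcard(pattern, hostname) is not None


cert = "*.myrgwendpoint.com"
for host in ("bucket.myrgwendpoint.com", "my.bucket.myrgwendpoint.com"):
    print(host, matches_general_rule(cert, host), matches_s3_style_rule(cert, host))
```

The second hostname is the case that trips up browsers: a bucket name
containing a '.' produces a hostname with an extra fragment, which only
the S3-style rule accepts.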

> Whether one likes it or not ceph is afaik not fully s3 compatible
No, Ceph isn't fully AWS-S3 compatible, and I did specifically include in my
talk at Cephalocon last year that we should explicitly be returning 501
NotImplemented in more cases. AWS-S3 in itself is a moving target, and
some of the operations ARE best offloaded to something other than Ceph.

Even if Ceph/RGW does support a given set of operations, does the
deployment want to consider those operations supported? This thinking
led to the torrent ops being behind a configuration option in Ceph, and
other ops can be & are blocked by providers in the reverse proxy.
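
As an illustration of that kind of blocking, here is a minimal sketch of
the decision a filter in front of RGW might make. The blocked set, the
function name, and the rgw.example.com hostname are hypothetical, and
real deployments would do this in their proxy's own configuration rather
than in Python. It assumes the op is identified by a query subresource
such as '?torrent', and answers with the 501 NotImplemented response
suggested above.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical deployment policy: subresources this provider chooses not to offer.
BLOCKED_SUBRESOURCES = {"torrent"}


def check_request(url):
    """Return (status, error_code) if the request is blocked, else None."""
    # keep_blank_values=True so a bare '?torrent' (no '=value') is still seen.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if BLOCKED_SUBRESOURCES.intersection(query):
        return 501, "NotImplemented"
    return None  # pass the request through to RGW


print(check_request("https://rgw.example.com/bucket/key?torrent"))
print(check_request("https://rgw.example.com/bucket/key"))
```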

There ARE RGW-specific features that would be valuable to have in more
clients:
- RGW Admin operations [the list of them is much longer than the docs
  suggest]
- RGW Metadata search
- RGW layout op (similar to, but better than, 'radosgw-admin object stat';
  it has some data that's NOT in 'object stat')

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2@xxxxxxxxxx
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
