Some time ago I sent a description of the RGW metadata search feature that was implemented prior to Kraken. The feature itself was functional, but there were quite a few open questions and we didn't regard it as complete. Here's a reformatted version of that email:

http://ceph.com/geen-categorie/rgw-metadata-search/

The gist of it is as follows. As part of the RGW multisite system, we introduced a way to create new tier types. Originally a single RGW zonegroup would include multiple zones that mirror each other. With the new sync modules system, a copy of the data can now be sent to a different data tier. Enter Elasticsearch, which can be used to index the metadata of objects in a zonegroup. So now we can have multiple zones in a single zonegroup where one (or more) of the zones indexes the objects' metadata instead of storing the data in rados.

For example, we can create a zonegroup that has three zones: zone A, zone B, and zone M. Zones A and B are data zones: users create buckets there and upload objects to them. Zone M is a metadata search zone: the data written to A and B is indexed, and users can query information about it when accessing zone M.

One of the main questions I had at the time was whether we should involve RGW in the search queries, or leave that for users to deal with by accessing Elasticsearch directly. We came to the conclusion that it would be much better in terms of user experience if we served as a proxy between the users and Elasticsearch and managed the queries ourselves. This allows us to provide a better experience, and while at it also solves the authentication and authorization problems: end users do not have access to Elasticsearch, and we make sure that the queries sent to Elasticsearch request only data that the users are permitted to read.
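To illustrate the proxy idea, here is a minimal sketch (hypothetical, not the actual RGW code; the "permissions" field name and document layout are assumptions for illustration only): before forwarding a user's search, the gateway can wrap the user's filter in an Elasticsearch bool query that restricts matches to documents whose ACL list includes that user.

```python
# Hypothetical sketch of constraining an Elasticsearch query so that only
# objects the requesting user may read can match. The "permissions" field
# and the document schema are illustrative assumptions, not RGW's actual
# index layout.

def authorize_query(user_filter, user_id):
    """Wrap the user's filter so only readable objects can match."""
    return {
        "query": {
            "bool": {
                # The user's own search condition.
                "must": [user_filter],
                # Only match documents whose ACL list includes this user.
                "filter": [{"term": {"permissions": user_id}}],
            }
        }
    }

q = authorize_query({"match": {"name": "foo"}}, "yehsad")
```

The user never talks to Elasticsearch directly; RGW composes the final query, so a user cannot simply omit the permission filter.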
I've been working on implementing the new RGW capabilities that allow it to be used for querying Elasticsearch. There are other changes that were added, which I will describe as well. The code is still pending review and testing, and can be found here:

https://github.com/ceph/ceph/pull/14351

- What's new and how to configure?

What follows is a list of the new APIs and new configurables; a configuration example is below.

1. New RESTful APIs were added to RGW, in order to use and control metadata search:

* Query metadata

The request needs to be sent to an RGW that is located on the Elasticsearch tier zone.

Input:

GET /[<bucket>]?query=<expression>

request params:
 - max-keys: max number of entries to return
 - marker: pagination marker

expression := [(]<arg> <op> <value> [)][<and|or> ...]

op is one of the following: <, <=, ==, >=, >

For example:

GET /?query=name==foo

This will return all the indexed keys that the user has read permission to and that are named 'foo'. The output will be a list of keys in XML, similar to the S3 list buckets response.

* Configure custom metadata fields

Define which custom metadata entries should be indexed (under the specified bucket), and what the types of these keys are. If explicit custom metadata indexing is configured, this is needed so that rgw will index the specified custom metadata values. Otherwise it is needed in cases where the indexed metadata keys are of a type other than string.

Note: Currently this request should be sent to the metadata master zone.

Input:

PUT /<bucket>?mdsearch

HTTP headers:

X-Amz-Meta-Search: <key [; type]> [, ...]

Where key is x-amz-meta-<name>, and type is one of the following: string, integer, date.

* Delete custom metadata configuration

Delete the custom metadata bucket configuration.

Note: Currently this request should be sent to the metadata master zone.

Input:

DELETE /<bucket>?mdsearch

* Get custom metadata configuration

Retrieve the custom metadata bucket configuration.

Input:

GET /<bucket>?mdsearch

2. Elasticsearch tier zone configurables

The following configurables are now defined:

* endpoint

Specifies the Elasticsearch server endpoint to access.

* num_shards (integer)

The number of shards that Elasticsearch will be configured with on data sync initialization. Note that this cannot be changed after init: any change here requires a rebuild of the Elasticsearch index and a reinit of the data sync process.

* num_replicas (integer)

The number of replicas that Elasticsearch will be configured with on data sync initialization.

* explicit_custom_meta (true | false)

Specifies whether all user custom metadata will be indexed, or whether the user will need to configure (at the bucket level) which custom metadata entries should be indexed. This is false by default.

* index_buckets_list (comma separated list of strings)

If empty, all buckets will be indexed. Otherwise, only the buckets specified here will be indexed. It is possible to provide bucket prefixes (e.g., foo*) or bucket suffixes (e.g., *bar).

* approved_owners_list (comma separated list of strings)

If empty, buckets of all owners will be indexed (subject to the other restrictions); otherwise, only buckets owned by the specified owners will be indexed. Suffixes and prefixes can also be provided.

* override_index_path (string)

If not empty, this string will be used as the Elasticsearch index path. Otherwise the index path will be determined and generated on sync initialization.

3. Configuration example

(The following instructions are based on the multi-site configuration document.)

We'll have a simple configuration in which we create a new realm with a single zonegroup, and have two zones in that zonegroup: a data zone and a metadata search zone. Both zones will run on the same ceph cluster.
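As an aside, the query expression grammar above is simple enough to sketch a toy parser for. The following Python is purely illustrative (it is not the parser RGW uses), and for brevity it ignores the optional parentheses, handling only comparisons joined by and/or:

```python
import re

# Toy parser for the metadata query grammar:
#   expression := <arg> <op> <value> [<and|or> ...]
# Illustrative sketch only -- not RGW's actual parser; parentheses are
# not handled.

# Note: '<=' and '>=' must come before '<' and '>' in the alternation.
COMPARISON = re.compile(r'^\s*([\w.-]+)\s*(<=|>=|==|<|>)\s*(\S+)\s*$')

def parse_query(expr):
    """Return a flat list of (arg, op, value) tuples and 'and'/'or' tokens."""
    result = []
    for tok in re.split(r'\s+(and|or)\s+', expr):
        if tok in ('and', 'or'):
            result.append(tok)
            continue
        m = COMPARISON.match(tok)
        if not m:
            raise ValueError('bad comparison: %r' % tok)
        result.append((m.group(1), m.group(2), m.group(3)))
    return result
```

For example, parse_query('x-amz-meta-foo==aaa or x-amz-meta-bar < 12') yields the two comparisons with an 'or' token between them, which a server could then translate into an Elasticsearch bool query.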
* Naming

realm: gold
zonegroup: us
data zone: us-east-1
metadata search zone: us-east-es

* Prerequisites

- ceph cluster
- elasticsearch configured; we'll assume it runs on the same machine as radosgw, listening on the default port 9200

* System Keys

Similar to a regular multisite configuration, we'll need to define system keys for cross-radosgw communication:

$ SYSTEM_ACCESS_KEY=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 20 | head -n 1)
$ SYSTEM_SECRET_KEY=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 40 | head -n 1)
$ RGW_HOST=<host>

* Create a realm

$ radosgw-admin realm create --rgw-realm=gold --default

* Remove the default zonegroup (not necessarily needed; only if a default zonegroup was generated)

$ radosgw-admin zonegroup delete --rgw-zonegroup=default

* Create a zonegroup

$ radosgw-admin zonegroup create --rgw-zonegroup=us --endpoints=http://${RGW_HOST}:8000 --master --default
{
  "id": "db23c836-9184-4090-a6dc-8bb0489c72ba",
  "name": "us",
  "api_name": "us",
  "is_master": "true",
  "endpoints": [
    "http:\/\/<RGW_HOST>:8000"
  ],
  "hostnames": [],
  "hostnames_s3website": [],
  "master_zone": "",
  "zones": [],
  "placement_targets": [],
  "default_placement": "",
  "realm_id": "0fea4ced-14fb-436d-8a4d-3d362adcf4e1"
}

* Create a zone

$ radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-1 --endpoints=http://${RGW_HOST}:8000 --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY --default --master
{
  "id": "a9b9e45a-4fa6-49e8-9236-db31e84169b8",
  "name": "us-east-1",
  "domain_root": "us-east-1.rgw.meta:root",
  "control_pool": "us-east-1.rgw.control",
  "gc_pool": "us-east-1.rgw.log:gc",
  "lc_pool": "us-east-1.rgw.log:lc",
  "log_pool": "us-east-1.rgw.log",
  "intent_log_pool": "us-east-1.rgw.log:intent",
  "usage_log_pool": "us-east-1.rgw.log:usage",
  "user_keys_pool": "us-east-1.rgw.meta:users.keys",
  "user_email_pool": "us-east-1.rgw.meta:users.email",
  "user_swift_pool": "us-east-1.rgw.meta:users.swift",
  "user_uid_pool": "us-east-1.rgw.meta:users.uid",
  "system_key": {
    "access_key": "NgKnw4Q9ocFUJUykxHiu",
    "secret_key": "QahZhmhRg12oiKOq1bVsO6qO43Yqd8OMu8jrwVSq"
  },
  "placement_pools": [
    {
      "key": "default-placement",
      "val": {
        "index_pool": "us-east-1.rgw.buckets.index",
        "data_pool": "us-east-1.rgw.buckets.data",
        "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
        "index_type": 0,
        "compression": ""
      }
    }
  ],
  "metadata_heap": "",
  "tier_config": [],
  "realm_id": "0fea4ced-14fb-436d-8a4d-3d362adcf4e1"
}

* Create a system user

$ radosgw-admin user create --uid=zone.user --display-name="Zone User" --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY --system
{
  "user_id": "zone.user",
  "display_name": "Zone User",
  "email": "",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [],
  "keys": [
    {
      "user": "zone.user",
      "access_key": "NgKnw4Q9ocFUJUykxHiu",
      "secret_key": "QahZhmhRg12oiKOq1bVsO6qO43Yqd8OMu8jrwVSq"
    }
  ],
  "swift_keys": [],
  "caps": [],
  "op_mask": "read, write, delete",
  "system": "true",
  "default_placement": "",
  "placement_tags": [],
  "bucket_quota": {
    "enabled": false,
    "check_on_raw": false,
    "max_size": -1,
    "max_size_kb": 0,
    "max_objects": -1
  },
  "user_quota": {
    "enabled": false,
    "check_on_raw": false,
    "max_size": -1,
    "max_size_kb": 0,
    "max_objects": -1
  },
  "temp_url_keys": [],
  "type": "rgw"
}

* Update the period

$ radosgw-admin period update --commit
{
  "id": "96535dc9-cb15-4c3d-96a1-d661a2f6e71f",
  "epoch": 1,
  "predecessor_uuid": "691ebbf4-7104-4c78-aa42-7d20061e31ff",
  "sync_status": [
...
"realm_id": "0fea4ced-14fb-436d-8a4d-3d362adcf4e1", "realm_name": "gold", "realm_epoch": 2 } * Start radosgw <this step varies, depending on the specific OS and env> One way to do it: $ radosgw --rgw-frontends="civetweb port=8000" --log-file=/var/log/ceph/radosgw-us-east-1.log * Configure second zone in the same cluster, used for metadata indexing $ radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-es --access-key=$SYSTEM_ACCESS_KEY --secret=$SYSTEM_SECRET_KEY --endpoints=http://${RGW_HOST}:8002 { "id": "24b0a61c-8a99-4f30-9bce-a99900dba818", "name": "us-east-es", "domain_root": "us-east-es.rgw.meta:root", "control_pool": "us-east-es.rgw.control", "gc_pool": "us-east-es.rgw.log:gc", "lc_pool": "us-east-es.rgw.log:lc", "log_pool": "us-east-es.rgw.log", "intent_log_pool": "us-east-es.rgw.log:intent", "usage_log_pool": "us-east-es.rgw.log:usage", "user_keys_pool": "us-east-es.rgw.meta:users.keys", "user_email_pool": "us-east-es.rgw.meta:users.email", "user_swift_pool": "us-east-es.rgw.meta:users.swift", "user_uid_pool": "us-east-es.rgw.meta:users.uid", "system_key": { "access_key": "NgKnw4Q9ocFUJUykxHiu", "secret_key": "QahZhmhRg12oiKOq1bVsO6qO43Yqd8OMu8jrwVSq" }, "placement_pools": [ { "key": "default-placement", "val": { "index_pool": "us-east-es.rgw.buckets.index", "data_pool": "us-east-es.rgw.buckets.data", "data_extra_pool": "us-east-es.rgw.buckets.non-ec", "index_type": 0, "compression": "" } } ], "metadata_heap": "", "tier_config": [], "realm_id": "0fea4ced-14fb-436d-8a4d-3d362adcf4e1" } * Elasticsearch related zone configuration $ radosgw-admin zone modify --rgw-zone=us-east-es --tier-type=elasticsearch --tier-config=endpoint=http://localhost:9200,num_shards=10,num_replicas=1 { "id": "24b0a61c-8a99-4f30-9bce-a99900dba818", "name": "us-east-es" ... 
"tier_config": [ { "key": "endpoint", "val": "http:\/\/localhost:9200" }, { "key": "num_replicas", "val": "1" }, { "key": "num_shards", "val": "10" } ], "realm_id": "0fea4ced-14fb-436d-8a4d-3d362adcf4e1" } * Update period $ radosgw-admin period update --commit ... * Start second radosgw <this step varies, as with the first radosgw> One way to do it: $ radosgw --rgw-zone=us-east-es --rgw-frontends="civetweb port=8002" --log-file=/var/log/ceph/radosgw-us-east-es.log * Create a user, upload stuff $ radosgw-admin user create --uid=yehsad --display-name=yehuda ... I'm using the obo tool (can be found here: https://github.com/yehudasa/obo) to create buckets and upload some data: $ export S3_ACCESS_KEY_ID=... $ export S3_SECRET_ACCESS_KEY=... $ export S3_HOSTNAME=$RGW_HOST:8000 $ obo create buck $ obo put buck/foo --in-file=foo $ obo put buck/foo1 --in-file=foo * Query metadata I implemented a metadata search operation in obo, and it can be used as follows: First, make sure we point obo at the correct radosgw: $ export S3_HOSTNAME=$RGW_HOST:8002 $ obo mdsearch buck --query='name>=foo1' { "SearchMetadataResponse": { "Marker": {}, "IsTruncated": "false", "Contents": [ { "Bucket": "buck", "Key": "foo2", "Instance": "null", "LastModified": "2017-04-06T23:18:39.053Z", "ETag": "\"7748956db0bddb51a2bb81a26395ff98\"", "Owner": { "ID": "yehsad", "DisplayName": "yehuda" }, "CustomMetadata": {} }, { "Bucket": "buck", "Key": "foo1", "Instance": "null", "LastModified": "2017-04-06T23:18:15.029Z", "ETag": "\"7748956db0bddb51a2bb81a26395ff98\"", "Owner": { "ID": "yehsad", "DisplayName": "yehuda" }, "CustomMetadata": {} } ] } } $ Configure custom metadata By default we don't index any custom metadata. 
We can turn on custom metadata indexing on a bucket with the following obo command:

$ obo mdsearch buck --config='x-amz-meta-foo; string, x-amz-meta-bar; integer'

Note that this will only apply to new data (indexing old data will require re-initializing the sync process on the specific bucket).

* Query metadata again

Upload a few more objects, this time with custom metadata:

$ obo put buck/foo3 --in-file=LICENSE --x-amz-meta foo=abc bar=12
$ obo put buck/foo4 --in-file=LICENSE --x-amz-meta foo=bbb bar=8
$ obo put buck/foo2 --in-file=LICENSE --x-amz-meta foo=aaa

and we can run the following query:

$ obo mdsearch buck --query='x-amz-meta-foo==aaa or x-amz-meta-bar < 12'
{
  "SearchMetadataResponse": {
    "Marker": {},
    "IsTruncated": "false",
    "Contents": [
      {
        "Bucket": "buck",
        "Key": "foo4",
        "Instance": "null",
        "LastModified": "2017-04-07T00:04:15.584Z",
        "ETag": "\"7748956db0bddb51a2bb81a26395ff98\"",
        "Owner": {
          "ID": "yehsad",
          "DisplayName": "yehuda"
        },
        "CustomMetadata": {
          "Entry": [
            {
              "Name": "foo",
              "Value": "bbb"
            },
            {
              "Name": "bar",
              "Value": "8"
            }
          ]
        }
      },
      {
        "Bucket": "buck",
        "Key": "foo2",
        "Instance": "null",
        "LastModified": "2017-04-07T00:05:00.666Z",
        "ETag": "\"7748956db0bddb51a2bb81a26395ff98\"",
        "Owner": {
          "ID": "yehsad",
          "DisplayName": "yehuda"
        },
        "CustomMetadata": {
          "Entry": {
            "Name": "foo",
            "Value": "aaa"
          }
        }
      }
    ]
  }
}

That's pretty much it. I'll probably edit this email and put it where it needs to be under ceph/doc. I identified a few issues while working on this document, and I'm sure there are many more. My next planned task is to create a testing tool for it.

Please let me know if you have any questions or comments. We're planning to get this merged in for Luminous.

Yehuda