Hi,

As an extension of earlier work done by Yehuda [1], I've gotten the
great majority of the work done to support static website hosting in
RGW, just like AmazonS3 [2].

I need to clean up the code prior to major review for submission,
solve one thorny problem first, and have a few discussions about the
best courses of action; then I'll be submitting this for more reviews
before merging.

ceph [3]
s3-tests, unit tests [4]
s3-tests, fuzzer tests [5]

The thorny problem:
-------------------
One of the pieces of functionality in S3Website is the ability to serve
any public object in the bucket as the content of a custom error page
(think shiny 404 error).

In some cases, like trivial 403/404 errors, we can determine this quite
early, before we fetch the object, and redirect the request to the
error object instead (provided that we also redo the ACL check on the
error object).

In more complex cases (e.g. 416 Range Not Satisfiable, 412 Precondition
Failed), it happens very late in the RGW request processing, and the
req_state struct seems to have been mangled/pre-filled with a lot of
decisions that can't easily be undone.

Either I have to repeat a lot of code for it, which I'm not happy
about, or I have to refactor RGWGetObj* to more safely make the second
GET request for the error object (and make sure range headers etc. are
NOT used for the GET of the error object). I'm leaning toward the
latter.

Oh, and for added fun, if an error object is configured but is missing
or private, you get a similar but different error than you would
without any error object configured, and sometimes the error codes are
in the headers, but not always.

Discussion pieces:
------------------
RGWRegion
- Presently has both "endpoints" and "hostnames", but doesn't make
  clear which APIs (S3, Swift, S3Website) might be available at each,
  or allow combinations to dedicate a specific FQDN to a given API.
  I'd like to replace both structures with a map structure [6].

Bucket existence privacy:
- In general I agree with the goal that we should be closely
  compatible with AmazonS3, but with an eye to security, I'd like to
  consider a specific deviation:
- In AmazonS3, you can enumerate buckets for existence, simply by
  looking for 404 NoSuchBucket vs 403 AccessDenied. I'd like to offer
  a configuration option that returns 403 Forbidden or 401
  Unauthorized on anonymous requests to non-existent buckets.
- Testing some of the functionality against AmazonS3 has been somewhat
  painful, as AmazonS3 only provides eventual consistency of the
  website configuration (with the longest delay I've seen so far being
  about 30 seconds).

New configuration options/changes:
----------------------------------
rgw_enable_apis: gains 's3website' mode
rgw_dns_s3website_name: similar to rgw_dns_name, but for s3website endpoint
RGWRegion having per-rgw-api hostnames

Patch series breakdown plans:
-----------------------------
Here's the breakdown of patch series I'm considering for the changes
(net 2kLOC in ceph, 1kLOC in testcases).
(TODO marks pieces not in these sets of commits yet, but will be soon.)

ceph
- split Formatter.cc
  - JSON/XML/Table formatters are separate now
- add header & footer support for formatters
- add knowledge of status
- add HTML formatter
- Add optional error handler hooks to RGWOp and RGWHandler for
  abort_early
- Add optional retarget handler hooks
- Add more flexible redirect handling
- S3website code
- x-amz-website-redirect-location handling
  (TODO: needs a bit more polish and testing)
- TODO: Add more input validations to match S3, on stuff that's NOT
  documented but was discovered when I applied weirder testcases to
  AmazonS3:
  - 'Hostname' field has non-trivial validation (maybe borrow the
    outcome of wip-bucket_name_restrictions)
  - The 'Protocol' field for a redirect must be http/https, cannot be
    gopher or anything else.
  - The HttpRedirectCode field must contain one of: 301-305, 307, 308.
    The docs don't say this, and the error message says 'Any 3XX value
    except 300'.
  - First-match in RoutingRules wins; watch out with rules that match
    4XX error codes.
- Documentation
  - TODO: esp. the parts missing from the S3 docs above

s3-tests, unit tests
- refactor for more requests
- add new utilities
- add website tests

s3-tests, fuzzer tests [5]

Links for all the bits above
----------------------------
[1] https://github.com/ceph/ceph/tree/wip-static-website
[2] http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html
[3] https://github.com/ceph/ceph/compare/master...robbat2:wip-static-website-robbat2-master
[4] https://github.com/ceph/s3-tests/compare/master...robbat2:wip-static-website
[5] https://github.com/ceph/s3-tests/compare/master...robbat2:wip-website-fuzzy
[6] https://github.com/ceph/ceph/compare/master...robbat2:wip-static-website-robbat2-master#diff-ee7891a35944697538486c9269e0d65bR909

--
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail     : robbat2@xxxxxxxxxx
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
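
For reference, the undocumented redirect validations listed under the
S3website TODO above (Protocol restricted to http/https,
HttpRedirectCode restricted to 301-305, 307, 308, and first-match-wins
in RoutingRules) could be sketched roughly as below. This is only an
illustration in Python, not the RGW code: the function names and the
dict keys "key_prefix" / "error_code" are hypothetical, and it assumes
that Protocol is compared exactly and that a rule with both conditions
set requires both to match.

```python
# Illustrative sketch only; names here are hypothetical, not RGW or S3 APIs.
ALLOWED_PROTOCOLS = {"http", "https"}
ALLOWED_REDIRECT_CODES = {301, 302, 303, 304, 305, 307, 308}


def validate_redirect(protocol=None, http_redirect_code=None):
    """Return a list of problems with a redirect; an empty list means OK."""
    problems = []
    # Assumption: the protocol string is compared exactly (case-sensitive).
    if protocol is not None and protocol not in ALLOWED_PROTOCOLS:
        problems.append("Protocol must be http or https, got %r" % (protocol,))
    if http_redirect_code is not None:
        try:
            code = int(http_redirect_code)
        except (TypeError, ValueError):
            code = None
        # 300 and 306 are rejected, as is anything outside 3XX.
        if code not in ALLOWED_REDIRECT_CODES:
            problems.append(
                "HttpRedirectCode must be one of 301-305, 307, 308, got %r"
                % (http_redirect_code,)
            )
    return problems


def first_matching_rule(rules, key, error_code=None):
    """Return the first rule whose conditions match, or None.

    Each rule is a dict with optional "key_prefix" and "error_code"
    conditions. Rules are checked in order and the first match wins,
    so an early rule matching a 4XX error code shadows later ones.
    """
    for rule in rules:
        prefix = rule.get("key_prefix")
        wanted_error = rule.get("error_code")
        if prefix is not None and not key.startswith(prefix):
            continue
        if wanted_error is not None and wanted_error != error_code:
            continue
        return rule
    return None
```

With a rule list such as [{"error_code": 404}, {"key_prefix": "docs/"}],
a 404 on "docs/x" is handled by the first rule and the "docs/" rule is
never reached, which is the ordering pitfall noted in the list above.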