----- Original Message -----
> From: "Robin H. Johnson" <robbat2@xxxxxxxxxx>
> To: ceph-devel@xxxxxxxxxxxxxxx
> Cc: "Jonathan LaCour" <jonathan.lacour@xxxxxxxxxxxxx>
> Sent: Tuesday, June 23, 2015 2:33:25 AM
> Subject: RGW S3 Website hosting, non-clean code for early review
>
> Hi,
>
> As an extension of earlier work done by Yehuda [1], I've gotten the
> great majority of the work done to support static website hosting in
> RGW, just like AmazonS3 [2].
>
> I need to do some cleanups of the code prior to major review for
> submission, and solve one thorny problem first, have a few discussions
> about best courses of action, and then I'll be submitting this for more
> reviews before merging.
>
> ceph [3]
> s3-tests, unit tests [4]
> s3-tests, fuzzer tests [5]
>
> The thorny problem:
> -------------------
> One of the pieces of functionality in S3Website is the ability to serve
> any public object in the bucket as the content on a custom error page
> (think shiny 404 error). In some cases, like trivial 403/404 errors, we
> can determine this quite early, before we fetch the object, and redirect
> the request to the error object instead (provided that we also redo the
> ACL check on the error object).
>
> In more complex cases (e.g. 416 Range Not Satisfiable, 412 Precondition
> Failed), it happens very late in the RGW request processing, and the
> req_state struct seems to have been mangled/pre-filled with a lot of
> decisions that aren't solvable.
>
> Either I have to repeat a lot of code for it, which I'm not happy about,
> or I have to refactor RGWGetObj* to more safely make the second GET
> request for the error object (and make sure range headers etc are NOT
> used for the get of the error object). I'm leaning to the latter.

Is generating a new req_state a possibility? E.g., you catch the error at
the top level, and restart most of the request processing with a newly
created req_state?
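A rough sketch of that restart idea, in Python for brevity rather than RGW's C++. Everything here is hypothetical and only illustrates the control flow: `ReqState` is a stand-in for req_state, `STORE` is a toy object store, and `serve` plays the role of the top-level handler. The point is that the error object is fetched with a freshly built state, so Range and conditional headers from the failed request cannot leak into the second GET, and the public/ACL check is redone on the error object.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ReqState:
    """Hypothetical stand-in for RGW's req_state (per-request scratch data)."""
    bucket: str
    key: str
    headers: Dict[str, str] = field(default_factory=dict)


class RequestError(Exception):
    """Carries the HTTP status that the failed request would have returned."""
    def __init__(self, status: int):
        super().__init__(status)
        self.status = status


# Toy object store: (bucket, key) -> (body, is_public). Purely illustrative.
STORE = {
    ("site", "index.html"): (b"<h1>home</h1>", True),
    ("site", "error.html"): (b"<h1>oops</h1>", True),
}

# Headers that must NOT be carried over to the GET of the error object.
# Header names are assumed to be lowercase already.
STRIPPED_HEADERS = {"range", "if-match", "if-none-match",
                    "if-modified-since", "if-unmodified-since"}


def get_object(s: ReqState) -> Tuple[int, bytes]:
    entry = STORE.get((s.bucket, s.key))
    if entry is None:
        raise RequestError(404)
    body, is_public = entry
    if not is_public:  # stands in for the ACL check
        raise RequestError(403)
    rng = s.headers.get("range")
    if rng is not None:
        # Only the "bytes=N-" form is handled in this sketch.
        start = int(rng[len("bytes="):].rstrip("-"))
        if start >= len(body):
            raise RequestError(416)
        return 206, body[start:]
    return 200, body


def serve(bucket: str, key: str, headers: Dict[str, str],
          error_key: Optional[str] = None) -> Tuple[int, bytes]:
    try:
        return get_object(ReqState(bucket, key, dict(headers)))
    except RequestError as err:
        if error_key is None:
            return err.status, b""
        # Restart with a brand-new state instead of reusing the mangled one.
        clean = {k: v for k, v in headers.items() if k not in STRIPPED_HEADERS}
        fresh = ReqState(bucket, error_key, clean)
        try:
            _, body = get_object(fresh)
            return err.status, body  # original status, error object as body
        except RequestError:
            # Error object missing or private: fall back to a bare error.
            return err.status, b""
```

For example, `serve("site", "index.html", {"range": "bytes=9999-"}, error_key="error.html")` keeps the 416 status but returns the body of error.html, because the Range header never reaches the second lookup.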
>
> Oh, and for added fun, if an error object is configured, but is missing
> or private, you get a similar but different response than without any
> error object configured, and sometimes the error codes are in the
> headers, but not always.
>
> Discussion pieces:
> ------------------
> RGWRegion
> - presently has both "endpoints" and "hostnames", but doesn't make clear
>   which APIs (S3, Swift, S3Website) might be available at each; or allow
>   combinations to dedicate a specific FQDN to a given API.
>   I'd like to replace both structures with a map structure [6]

Makes sense.

> Bucket existence privacy:
> - In general I agree with the goal that we should be closely compatible
>   with AmazonS3, but with an eye to security, I'd like to consider a
>   specific deviation:
> - In AmazonS3, you can enumerate buckets for existence, simply looking
>   for 404 NoSuchBucket vs 403 AccessDenied. I'd like to offer a
>   configuration option that returns 403 Forbidden or 401 Unauthorized on
>   anonymous requests to non-existent buckets.

As long as it's configurable.

> - Testing some of the functionality against AmazonS3 has been somewhat
>   painful, as AmazonS3 only provides eventual consistency of the website
>   configuration (with the highest time I've seen so far being about 30
>   seconds).

Yup.

>
> New configuration options/changes:
> ----------------------------------
> rgw_enable_apis: gains 's3website' mode
> rgw_dns_s3website_name: similar to rgw_dns_name, but for s3website endpoint
> RGWRegion having per-rgw-api hostnames
>
> Patch series breakdown plans:
> -----------------------------
> Here's the breakdown of patch series I'm considering for the changes
> (net 2kLOC in ceph, 1kLOC in testcases).
> [TODO marks pieces not in these sets of commits yet, but will be soon.]
>
> ceph
> - split Formatter.cc
>   - JSON/XML/Table formatters are separate now
> - add header & footer support for formatters
> - add knowledge of status
> - add HTML formatter
> - Add optional error handler hooks to RGWOp and RGWHandler for abort_early
> - Add optional retarget handler hooks
> - Add more flexible redirect handling
> - S3website code
> - x-amz-website-redirect-location handling (TODO: needs a bit more polish
>   and testing)
> - TODO: Add more input validations to match S3, on stuff that's NOT
>   documented but was discovered when I applied weirder testcases to
>   AmazonS3:
>   - 'Hostname' field has non-trivial validation (maybe borrow the
>     outcome of wip-bucket_name_restrictions)
>   - The 'Protocol' field for a redirect must be http/https, cannot be
>     gopher or anything else.
>   - The HttpRedirectCode field must contain one of: 301-305, 307, 308.
>     The docs don't say this, and the error message says 'Any 3XX value
>     except 300'.
>   - First-match in RoutingRules wins; watch out with rules that match
>     4XX error codes.
> - Documentation
>   - TODO: esp the parts missing from the S3 docs above
>
> s3-tests, unit tests
> - refactor for more requests
> - add new utilities
> - add website tests
> s3-tests, fuzzer tests [5]
>
> Links for all the bits above
> ----------------------------
> [1] https://github.com/ceph/ceph/tree/wip-static-website
> [2] http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html
> [3] https://github.com/ceph/ceph/compare/master...robbat2:wip-static-website-robbat2-master
> [4] https://github.com/ceph/s3-tests/compare/master...robbat2:wip-static-website
> [5] https://github.com/ceph/s3-tests/compare/master...robbat2:wip-website-fuzzy
> [6] https://github.com/ceph/ceph/compare/master...robbat2:wip-static-website-robbat2-master#diff-ee7891a35944697538486c9269e0d65bR909
>

Great! I'll wait for the cleaned up pull request.
Yehuda

> --
> Robin Hugh Johnson
> Gentoo Linux: Developer, Infrastructure Lead
> E-Mail : robbat2@xxxxxxxxxx
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
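
For reference, the undocumented redirect rules quoted in the TODO list above (Protocol limited to http/https, HttpRedirectCode limited to 301-305, 307 and 308, first matching RoutingRule wins) could be captured roughly as follows. This is a minimal Python sketch, not the RGW code: the function names and the dict keys `key_prefix`, `error_code` and `target` are made up for illustration, and treating the protocol as case-sensitive is an assumption.

```python
from typing import Dict, List, Optional

# Values taken from the observations quoted above.
ALLOWED_PROTOCOLS = {"http", "https"}
ALLOWED_REDIRECT_CODES = {301, 302, 303, 304, 305, 307, 308}


def validate_redirect(protocol: Optional[str], code: Optional[int]) -> List[str]:
    """Return a list of problems; an empty list means the redirect looks valid."""
    problems = []
    # Assumption: comparison is case-sensitive; the thread does not say.
    if protocol is not None and protocol not in ALLOWED_PROTOCOLS:
        problems.append("Protocol must be http or https")
    if code is not None and code not in ALLOWED_REDIRECT_CODES:
        problems.append("HttpRedirectCode must be one of 301-305, 307, 308")
    return problems


def first_matching_rule(rules: List[Dict], key: str,
                        error_code: Optional[int] = None) -> Optional[Dict]:
    """Return the first rule whose conditions all hold, or None.

    Each rule is a dict with optional 'key_prefix' and 'error_code'
    conditions (hypothetical names); a missing condition matches anything.
    """
    for rule in rules:
        prefix = rule.get("key_prefix")
        wanted = rule.get("error_code")
        if prefix is not None and not key.startswith(prefix):
            continue
        if wanted is not None and wanted != error_code:
            continue
        return rule  # first match wins; later rules are never consulted
    return None
```

The ordering hazard is easy to reproduce: with a catch-all 404 rule listed first, a more specific rule for the "docs/" prefix placed after it is never reached.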