On Fri, Oct 31, 2014 at 3:59 AM, Dane Elwell <dane.elwell@xxxxxxxxx> wrote:
> Hello list,
>
> When we upload a large multipart upload to RGW and it fails, we want
> to abort the upload. On large multipart uploads, with say 1000+ parts,
> it will consistently return 500 errors when trying to abort the
> upload. If you persist and ignore the 500s it will eventually abort
> the upload.
>
> For example, I've uploaded a 4GB test file using Python boto in 2MB
> chunks but fail before it's complete. Then, trying to abort this via
> s3cmd:
>
> [dane@host ~]% s3cmd abortmp s3://mptest-dane/testfile.bin
> 2/eOAoUOh0H4bUY5HVi7ff9WD8VZQPk9o
> ERROR: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>500 Internal Server Error</title>
> </head><body>
> <h1>Internal Server Error</h1>
> <p>The server encountered an internal error or
> misconfiguration and was unable to complete
> your request.</p>
> <p>Please contact the server administrator,
> support@xxxxxxxxxxxxx and inform them of the time the error occurred,
> and anything you might have done that may have
> caused the error.</p>
> <p>More information about this error may be available
> in the server error log.</p>
> </body></html>
>
> WARNING: Retrying failed request:
> /testfile.bin?uploadId=2/eOAoUOh0H4bUY5HVi7ff9WD8VZQPk9o
> WARNING: 500 (Internal Server Error):
> WARNING: Waiting 3 sec...
> ERROR: syntax error: line 1, column 49
> ERROR: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>500 Internal Server Error</title>
> </head><body>
> <h1>Internal Server Error</h1>
> <p>The server encountered an internal error or
> misconfiguration and was unable to complete
> your request.</p>
> <p>Please contact the server administrator,
> support@xxxxxxxxxxxxx and inform them of the time the error occurred,
> and anything you might have done that may have
> caused the error.</p>
> <p>More information about this error may be available
> in the server error log.</p>
> </body></html>
>
> WARNING: Retrying failed request:
> /testfile.bin?uploadId=2/eOAoUOh0H4bUY5HVi7ff9WD8VZQPk9o
> WARNING: 500 (Internal Server Error):
> WARNING: Waiting 6 sec...
> ERROR: S3 error: 404 (NoSuchKey):
>
> At this point the multipart upload no longer exists in the list, so
> I'm assuming it's been deleted successfully.
>
> (One of my colleagues is getting the same error when trying to delete
> via boto, so I don't think this is something related to s3cmd
> specifically).
>
> I can't see anything in the logs relating to these 500s at all,
> neither in the RADOSGW logs (which are next to useless anyway on my
> system for some reason), nor in the Apache logs themselves.

This sounds like apache is timing out, as the operation takes too long
for radosgw to complete. A multipart upload abort needs to iterate
through all the parts, and at the moment it does each synchronously,
so the gateway takes too long which triggers an apache timeout.

Yehuda
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
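
Until the abort path is faster, a client can work around the timeout by doing what the s3cmd log above shows: retry the abort on 500 and stop once the gateway answers 404. The following is only a rough sketch of that idea, not anything taken from radosgw, s3cmd or boto. `S3Error` and `abort_until_gone` are hypothetical names, and `abort_once` stands for whatever call your client library uses to send the abort request. Treating 404 as "done" relies on the observation above that the upload was gone from the list afterwards; a 404 on the very first attempt may just mean a wrong key or upload ID.

```python
import time


class S3Error(Exception):
    """Hypothetical stand-in for a client library's HTTP error type."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def abort_until_gone(abort_once, max_attempts=10, base_delay=3.0, sleep=time.sleep):
    """Call abort_once() until the multipart upload is gone.

    abort_once is assumed to raise S3Error on any non-2xx response.
    Returns the number of attempts that were made.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            abort_once()
            return attempt  # the gateway answered in time
        except S3Error as err:
            if err.status == 404:
                # Upload no longer exists; assume an earlier attempt
                # finished server-side after the front end timed out.
                return attempt
            if err.status < 500 or attempt == max_attempts:
                # Client errors are not retried; give up after the last try.
                raise
            # Wait 3s, 6s, 9s, ... like the s3cmd output above.
            sleep(base_delay * attempt)
    raise ValueError("max_attempts must be at least 1")
```

With boto 2, `abort_once` could be a small wrapper around the bucket's multipart cancel call that converts the library's response error into `S3Error`; that wrapper is left out here because its details depend on the client in use.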