500 Internal Server Error when aborting large multipart upload through object storage

Hello list,

When a large multipart upload to RGW fails, we want to abort it. For
uploads with many parts (say 1000+), the abort request consistently
returns 500 errors. If you persist and ignore the 500s, the upload is
eventually aborted.
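To put a number on "large", here is a quick sketch of the part count for
a given object size and chunk size. The helper names are made up for
illustration, and it assumes binary units (1MB = 1024 * 1024 bytes); with
decimal units the figure comes out slightly lower.

```python
MB = 1024 * 1024
GB = 1024 * MB


def part_count(total_bytes, chunk_bytes):
    """Number of parts needed to send total_bytes in chunk_bytes pieces."""
    if total_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and chunk_bytes > 0")
    # Ceiling division without floating point.
    return -(-total_bytes // chunk_bytes)


def part_ranges(total_bytes, chunk_bytes):
    """Yield (part_number, offset, length); S3 part numbers start at 1."""
    for index in range(part_count(total_bytes, chunk_bytes)):
        offset = index * chunk_bytes
        yield index + 1, offset, min(chunk_bytes, total_bytes - offset)


if __name__ == "__main__":
    # The 4GB / 2MB figures are the ones from the test in this message.
    print(part_count(4 * GB, 2 * MB))
```

Under those assumptions a complete 4GB upload in 2MB chunks would be 2048
parts. An upload that fails partway has fewer; the exact count for the
test in this message is not known.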

For example, I uploaded a 4GB test file using Python boto in 2MB
chunks, but had the upload fail before it completed. I then tried to
abort it via s3cmd:

[dane@host ~]% s3cmd abortmp s3://mptest-dane/testfile.bin 2/eOAoUOh0H4bUY5HVi7ff9WD8VZQPk9o
ERROR: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator,
 support@xxxxxxxxxxxxx and inform them of the time the error occurred,
and anything you might have done that may have
caused the error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>

WARNING: Retrying failed request: /testfile.bin?uploadId=2/eOAoUOh0H4bUY5HVi7ff9WD8VZQPk9o
WARNING: 500 (Internal Server Error):
WARNING: Waiting 3 sec...
ERROR: syntax error: line 1, column 49
ERROR: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator,
 support@xxxxxxxxxxxxx and inform them of the time the error occurred,
and anything you might have done that may have
caused the error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>

WARNING: Retrying failed request: /testfile.bin?uploadId=2/eOAoUOh0H4bUY5HVi7ff9WD8VZQPk9o
WARNING: 500 (Internal Server Error):
WARNING: Waiting 6 sec...
ERROR: S3 error: 404 (NoSuchKey):

At this point the multipart upload no longer shows up in the list of
in-progress uploads, so I'm assuming it has been deleted successfully.
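The "persist and ignore the 500s" workaround can be sketched roughly as
below. This is only an illustration, not a fix: `abort_until_gone` and
`abort_once` are hypothetical names, and `abort_once` stands for whatever
your client does to send one abort request and return the HTTP status.
It treats 204 (the normal S3 response to a successful abort) and 404 as
"gone". A 404 can also mean a wrong key or upload ID, so listing the
remaining uploads afterwards is still worthwhile.

```python
import time


def abort_until_gone(abort_once, max_attempts=10, delay=3.0, sleep=time.sleep):
    """Retry an abort request until the upload is gone.

    abort_once: caller-supplied function that sends one abort request and
                returns the HTTP status code (hypothetical interface).
    Returns the list of status codes seen, ending in 204 or 404.
    """
    statuses = []
    for attempt in range(1, max_attempts + 1):
        status = abort_once()
        statuses.append(status)
        if status in (204, 404):
            # 204: aborted now; 404: already gone (or never existed).
            return statuses
        if status < 500:
            # Anything other than a server error is not worth retrying.
            raise RuntimeError("unexpected status %d" % status)
        if attempt < max_attempts:
            # Linear backoff: 3s, 6s, 9s, ... like the s3cmd output above.
            sleep(delay * attempt)
    raise RuntimeError(
        "still failing after %d attempts: %r" % (max_attempts, statuses)
    )
```

With boto 2, `abort_once` could wrap
`bucket.cancel_multipart_upload(key_name, upload_id)` and return the
`status` of any `S3ResponseError` it catches; that wiring is an
assumption and is not shown here.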

(One of my colleagues is getting the same error when trying to delete
via boto, so I don't think this is specific to s3cmd.)

I can't see anything in the logs relating to these 500s at all,
neither in the RADOSGW logs (which are next to useless anyway on my
system for some reason), nor in the Apache logs themselves.
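One way to double-check the Apache side is to scan the access log for
abort requests that returned a 5xx. The sketch below assumes the log is
in Apache's standard "combined" format (as set by the CustomLog line in
the vhost config further down) and that an abort shows up as a DELETE
with an `uploadId` query parameter, which is how the S3 API expresses it.
The function name is made up for illustration.

```python
import re

# Apache "combined" format:
# host ident user [time] "request line" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def failed_aborts(lines):
    """Return (time, path, status) for 5xx DELETE requests with an uploadId."""
    hits = []
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # not a combined-format line; skip it
        if (
            match.group("method") == "DELETE"
            and "uploadId=" in match.group("path")
            and match.group("status").startswith("5")
        ):
            hits.append(
                (match.group("time"), match.group("path"), int(match.group("status")))
            )
    return hits
```

Usage would be something like
`failed_aborts(open("/var/log/apache2/rgw.access"))`. If that returns
nothing while the client still sees 500s, the error page is presumably
being produced somewhere the access log does not record.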

ceph.conf:

[global]
fsid = [redacted]
mon_initial_members = ceph-mon-a, ceph-mon-b, ceph-mon-c
mon_host = 172.28.196.11,172.28.196.12,172.28.196.13
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
debug_paxos = 0
public_network = 172.28.196.0/25
cluster_network = 172.28.196.128/25
osd_crush_update_on_start = false

rgw enable ops log = true
rgw ops log rados = true

[client.radosgw.gateway]
    host = ceph-2-radosgw-a
    keyring = /etc/ceph/keyring.radosgw.gateway
    rgw_socket_path = /var/run/radosgw.sock
    log_file = /var/log/ceph/radosgw.log
    auto_start = true
    rgw dns name = [redacted]

    rgw enable usage log = true
    rgw usage log tick interval = 30
    rgw usage log flush threshold = 1024
    rgw usage max shards = 32
    rgw usage max user shards = 1

    rgw resolve cname = true

    rgw keystone url = https://[redacted]:5000
    rgw keystone admin token = [redacted]
    rgw keystone accepted roles = admin,member,_member_,Member
    rgw keystone token cache size = 1024
    rgw keystone revocation interval = 8400
    rgw s3 auth use keystone = true
    nss db path = /var/ceph/nss

    key = [redacted]
    caps mon = "allow rw"
    caps osd = "allow rwx"

root@ceph-2-radosgw-a:~# cat /etc/apache2/sites-enabled/rgw.conf
FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/radosgw.sock

<VirtualHost *:80>
  ServerName removed
  ServerAlias removed
  ServerAdmin support@xxxxxxxxxxxxx
  DocumentRoot /var/www

  RewriteEngine On
  RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /s3gw.fcgi?page=$1&params=$2&%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

  <IfModule mod_fastcgi.c>
    <Directory /var/www>
      Options +ExecCGI
      AllowOverride All
      SetHandler fastcgi-script
      Order allow,deny
      Allow from all
      AuthBasicAuthoritative Off
    </Directory>
  </IfModule>

  AllowEncodedSlashes On
  ErrorLog /var/log/apache2/rgw.error
  CustomLog /var/log/apache2/rgw.access combined
  ServerSignature Off
</VirtualHost>

root@ceph-2-radosgw-a:~# ceph --version
ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

Thanks
Dane
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



