We use radosgw heavily for most of our work, and we have seen a weird
truncation issue with radosgw/S3 requests.
If more than about 90 seconds pass between the initial request (the
"ticket" that opens the object key) and actually reading the data, the
returned object is truncated. It contains only whatever RGW fetched/cached
after the initial connection, which seems to be around 512 KB.
Here is a PoC. It reproduces the issue on most objects I have tested,
mostly 1 GB to 5 GB keys in RGW::
------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
#!/usr/bin/env python
import sys
import json
import time
import argparse

import boto
import boto.s3.connection

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Delayed download.')
    parser.add_argument('credentials', type=argparse.FileType('r'),
                        help='Credentials file.')
    parser.add_argument('endpoint')
    parser.add_argument('bucket')
    parser.add_argument('key')
    args = parser.parse_args()

    # Pick the entry for the requested endpoint from the JSON credentials file.
    credentials = json.load(args.credentials)[args.endpoint]
    conn = boto.connect_s3(
        aws_access_key_id=credentials.get('access_key'),
        aws_secret_access_key=credentials.get('secret_key'),
        host=credentials.get('host'),
        port=credentials.get('port'),
        is_secure=credentials.get('is_secure', False),
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    key = conn.get_bucket(args.bucket).get_key(args.key)
    key.BufferSize = 1048576
    # Send the GET now, but do not read the response body yet.
    key.open_read(headers={})
    # Wait past the ~90 second threshold before consuming the response.
    time.sleep(120)
    # Reads the body from the already-open response; this is what comes back truncated.
    key.get_contents_to_file(sys.stdout)
------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
The credentials file is standard JSON, keyed by the endpoint name passed on the command line::
=============================================
=============================================
{
    "cluster": {
        "access_key": "blahblahblah",
        "secret_key": "blahblahblah",
        "host": "blahblahblah",
        "port": "443",
        "is_secure": true
    }
}
=============================================
=============================================
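For reference, here is a small sketch of how one entry of that file maps onto the keyword arguments the PoC passes to boto.connect_s3. The helper name connect_kwargs is hypothetical (it just pulls out what the PoC does inline), and coercing the port to an int is my own assumption about what is safest, since the example file stores it as the string "443".

```python
import json


def connect_kwargs(config, endpoint):
    """Hypothetical helper: turn one endpoint entry of the parsed
    credentials file into keyword arguments for boto.connect_s3."""
    entry = config[endpoint]
    port = entry.get('port')
    return {
        'aws_access_key_id': entry.get('access_key'),
        'aws_secret_access_key': entry.get('secret_key'),
        'host': entry.get('host'),
        # The file stores the port as a string ("443"); coerce it to an int.
        # None leaves boto to pick its default port.
        'port': int(port) if port is not None else None,
        'is_secure': entry.get('is_secure', False),
    }


if __name__ == '__main__':
    sample = json.loads('{"cluster": {"access_key": "a", "secret_key": "s",'
                        ' "host": "rgw.example", "port": "443",'
                        ' "is_secure": true}}')
    print(connect_kwargs(sample, 'cluster'))
```

With boto installed you would then connect with something like `boto.connect_s3(calling_format=boto.s3.connection.OrdinaryCallingFormat(), **connect_kwargs(json.load(fp), 'cluster'))`.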
At that point the object will almost always be truncated to whatever the
gateway cached in the time after the initial key request.
This can be a serious problem: when radosgw or the cluster is heavily
loaded, some requests can take minutes. You can retrieve the rest of the
object with a range request against the gateway, so I know the data is
intact. Still, I don't think radosgw should behave as if the download
completed successfully; it should instead return an error of some kind
if it can no longer service the request.
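Until the gateway reports an error, a client can at least detect the short read and fetch the missing tail with range requests. Below is a minimal sketch, assuming the expected size is known up front (for example from `key.size` after `get_key`). `fetch_complete` and its `fetch_from(offset)` callable are hypothetical names: the callable should return whatever the gateway sends for the range starting at that offset, which may be fewer bytes than requested.

```python
def fetch_complete(fetch_from, expected_size, max_attempts=5):
    """Reassemble an object of expected_size bytes, re-requesting the
    missing tail whenever a response comes back short (truncated)."""
    parts = []
    received = 0
    attempts = 0
    while received < expected_size:
        if attempts == max_attempts:
            raise IOError('object still short after %d attempts: got %d of %d bytes'
                          % (attempts, received, expected_size))
        attempts += 1
        # Ask only for the bytes we do not have yet (a "bytes=<offset>-" range).
        chunk = fetch_from(received)
        parts.append(chunk)
        received += len(chunk)
    if received != expected_size:
        # More data than advertised is also an error, not a success.
        raise IOError('received %d bytes, expected %d' % (received, expected_size))
    return b''.join(parts)
```

With boto, `fetch_from` could be `lambda offset: key.get_contents_as_string(headers={'Range': 'bytes=%d-' % offset})` and `expected_size` could be `key.size`. This holds the whole object in memory, so for 1 GB to 5 GB keys you would want to write each chunk to a file instead. It is only a client-side workaround and does not change the gateway behavior described above.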
We are running Hammer (ceph version 0.94.2
(5fb85614ca8f354284c713a2f9c610860720bbf3)) with civetweb as our
gateway frontend.
This is on a 3-node test cluster, but I see the same behavior on our
larger cluster. If I can provide any other information, please let me
know.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com