radosgw only delivers what's cached if latency between key request and actual download is above 90s

We use radosgw heavily here for most of our work, and we have seen a weird truncation issue with radosgw/S3 requests.

We have noticed that if the time between the initial "ticket" to grab the object key and grabbing the data is greater than 90 seconds, the returned object is truncated to whatever RGW has grabbed/cached after the initial connection, which seems to be around 512k.

Here is a PoC. It reproduces the problem on most objects I have tested, mostly 1G to 5G keys in RGW::

------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
#!/usr/bin/env python
# PoC: open an S3 GET against radosgw, stall for longer than 90 seconds,
# then read the body. The output ends up truncated.

import argparse
import json
import sys
import time

import boto
import boto.s3.connection

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Delayed download.')

    parser.add_argument('credentials', type=argparse.FileType('r'),
        help='Credentials file.')

    parser.add_argument('endpoint')
    parser.add_argument('bucket')
    parser.add_argument('key')

    args = parser.parse_args()

    # The credentials file maps an endpoint name to its connection settings.
    credentials = json.load(args.credentials)[args.endpoint]

    conn = boto.connect_s3(
        aws_access_key_id     = credentials.get('access_key'),
        aws_secret_access_key = credentials.get('secret_key'),
        host                  = credentials.get('host'),
        port                  = credentials.get('port'),
        is_secure             = credentials.get('is_secure', False),
        calling_format        = boto.s3.connection.OrdinaryCallingFormat(),
    )

    key = conn.get_bucket(args.bucket).get_key(args.key)

    # Read in 1 MiB chunks.
    key.BufferSize = 1048576
    # Send the GET request now, but do not read the body yet.
    key.open_read(headers={})
    # Stall for longer than the ~90 second threshold.
    time.sleep(120)

    # Read the body; the output stops short of the full object.
    key.get_contents_to_file(sys.stdout)
------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------

The credentials file is plain JSON, keyed by the endpoint name passed on the command line::

=============================================
=============================================
{
 "cluster": {
        "access_key": "blahblahblah",
        "secret_key": "blahblahblah",
        "host": "blahblahblah",
        "port": "443",
        "is_secure": true
        }
}

=============================================
=============================================


With this, the object is almost always truncated to whatever the gateway has cached in the time after the initial key request.
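
One way to spot the truncation on the client side is to compare the number of bytes actually received against the size the gateway reported for the object. Below is a minimal sketch using only the standard library. `copy_and_count` and `is_truncated` are hypothetical helper names for illustration, not part of boto:

```python
import io


def copy_and_count(src, dst, chunk_size=1048576):
    """Copy src to dst in chunks and return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def is_truncated(bytes_received, expected_size):
    """True if fewer bytes arrived than the reported object size."""
    return bytes_received < expected_size


if __name__ == '__main__':
    # Stand-in for a response body; a real check would read from the S3 key.
    body = io.BytesIO(b'x' * 10)
    sink = io.BytesIO()
    received = copy_and_count(body, sink, chunk_size=4)
    print(is_truncated(received, 10))
```

With boto, the expected size should be available as `key.size` after `get_key()`, assuming the gateway reports the correct Content-Length for the object.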

This can be a huge issue: if the radosgw or the cluster is under load, some requests can take minutes. The rest of the object can be fetched with a range request against the gateway, so I know the data is intact. Still, I don't think radosgw should act as if the download completed successfully; it should instead return an error of some kind if it can no longer service the request.

We are using hammer (ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)) with civetweb as the gateway frontend.

This is on a 3 node test cluster, but I have seen the same behavior on our larger cluster. If I can provide any other information, please let me know.
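
The range-request workaround can be sketched as follows: given how many bytes already arrived and the full object size, build an HTTP `Range` header that asks for the remainder. `resume_range_header` is a hypothetical helper name, and this is only a sketch of the idea, not a tested recovery tool:

```python
def resume_range_header(bytes_received, total_size):
    """Return headers requesting the missing tail of an object.

    Returns None when nothing is missing.
    """
    if bytes_received < 0 or bytes_received > total_size:
        raise ValueError('bytes_received must be between 0 and total_size')
    if bytes_received == total_size:
        return None
    # An open-ended range: from the first missing byte to the end.
    return {'Range': 'bytes=%d-' % bytes_received}


if __name__ == '__main__':
    # Example: 512k arrived out of a 1G object.
    print(resume_range_header(524288, 1073741824))
```

The resulting dict could then be passed as the `headers` argument of a second `key.get_contents_to_file()` call on a fresh key object, appending to the partial file.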
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


