rgw : inefficient bucket listing with max-keys URL parameter

Hi,

I came across an issue in rgw where bucket listing becomes quite
slow when a low value is passed in the <max-keys> URL parameter.
For example, the s3a Hadoop connector sets <max-keys> to 1 for a
certain operation [1]. The client expects a result set containing
a single entry. Radosgw, in turn, passes this value down to rados
and fetches keys one at a time in the
RGWRados::Bucket::List::list_objects function [2] before checking
for the delimiter etc. This is highly inefficient, and the client
ends up timing out.
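
As a rough illustration of why this hurts, here is a toy C++ model (not
the rgw code). FakeIndex, list_entries and demo_keys are hypothetical
names invented for this sketch. It assumes the lister collapses keys that
share a delimiter prefix into one entry and keeps scanning until it knows
whether the listing is truncated; that is only an assumption about where
the cost comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the bucket index (NOT the rgw/rados API): sorted keys
// plus a counter of how many fetch calls ("round trips") were made.
struct FakeIndex {
    std::vector<std::string> keys;  // must be sorted ascending
    std::size_t round_trips = 0;

    // Return up to `batch` keys strictly greater than `marker`.
    std::vector<std::string> fetch(const std::string& marker, std::size_t batch) {
        assert(batch > 0);
        ++round_trips;
        auto it = std::upper_bound(keys.begin(), keys.end(), marker);
        std::vector<std::string> out;
        for (; it != keys.end() && out.size() < batch; ++it) out.push_back(*it);
        return out;
    }
};

struct Listing {
    std::vector<std::string> entries;  // object keys or common prefixes
    bool truncated = false;            // true if more entries exist
};

// List up to `max_keys` entries, collapsing keys at `delim` into common
// prefixes, and fetching `batch` raw keys per round trip.
Listing list_entries(FakeIndex& idx, char delim, std::size_t max_keys,
                     std::size_t batch) {
    Listing res;
    std::string marker;
    for (;;) {
        std::vector<std::string> chunk = idx.fetch(marker, batch);
        if (chunk.empty()) return res;  // index exhausted
        for (const std::string& key : chunk) {
            marker = key;
            std::size_t pos = key.find(delim);
            std::string entry =
                (pos == std::string::npos) ? key : key.substr(0, pos + 1);
            // Keys under an already-reported common prefix add nothing.
            if (!res.entries.empty() && res.entries.back() == entry) continue;
            // One entry beyond max_keys proves the listing is truncated.
            if (res.entries.size() == max_keys) {
                res.truncated = true;
                return res;
            }
            res.entries.push_back(entry);
        }
    }
}

// Build n keys under "dir/" followed by one top-level key "z".
std::vector<std::string> demo_keys(std::size_t n) {
    std::vector<std::string> keys;
    for (std::size_t i = 0; i < n; ++i) keys.push_back("dir/" + std::to_string(i));
    keys.push_back("z");
    std::sort(keys.begin(), keys.end());
    return keys;
}
```

In this model, listing demo_keys(1000) with delimiter '/', max_keys = 1
and batch = 1 takes 1001 fetches, while batch = 1000 returns the same
listing in 2.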

It would be better to provide a config option to avoid such issues,
for example a minimum readahead value for listing objects from rados.
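
A minimal sketch of the idea in C++, assuming the option acts as a floor
on the number of entries requested from rados per call. The names
kMinReadahead, entries_to_fetch and round_trips, and the value 1000, are
placeholders for illustration and are not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical option value; the name and the default are assumptions.
constexpr std::size_t kMinReadahead = 1000;

// Number of raw index entries to request per call: never fewer than the
// readahead floor, even if the client asked for a single key.
std::size_t entries_to_fetch(std::size_t max_keys,
                             std::size_t min_readahead = kMinReadahead) {
    return std::max(max_keys, min_readahead);
}

// Ceiling division: calls needed to scan `raw_entries` index entries when
// each call returns at most `per_fetch` of them.
std::size_t round_trips(std::size_t raw_entries, std::size_t per_fetch) {
    assert(per_fetch > 0);
    return (raw_entries + per_fetch - 1) / per_fetch;
}
```

With these placeholder numbers, scanning 1001 raw entries for a
max-keys=1 request drops from 1001 calls to 2, while a request for 5000
keys is unaffected because the floor only raises small values.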

I have raised a PR with the above-mentioned fix [3].

[1] https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1012
[2] https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L4729
[3] https://github.com/ceph/ceph/pull/8756

Thanks
Abhishek