Hello. I have been kicking the tires with Ceph using the librados API and observed some peculiar object access patterns when reading a portion of an object (as opposed to the whole object). First, some background. My use case requires erasure coded pools and large objects (100 MiB - 1 GiB) that are written and read sequentially for optimal performance. However, I also want to be able to efficiently read smaller ranges of these large objects (say 5-10 MiB at a time), i.e. ranged reads.

I am trying to figure out the erasure coded pool settings that give the maximum possible read throughput for the cluster (as opposed to the lowest latency for any single ranged read). I hope to achieve this by minimizing the number of distinct OSDs that each ranged read has to touch, which should minimize the number of disk seeks needed on those OSDs to return the data; that way I should get better tail latencies for reads under high contention. Intuitively, I should be able to achieve this by using large chunk sizes.

Concretely, if my EC pool has k=10, m=3 and the stripe_unit (chunk size) is set to 1 MiB, then reading the first 5 MiB of a large object should only need to read from the first 5 OSDs that hold the object's chunks. However, I have observed (using blktrace on all the OSDs backing the pool) that reads are issued to all k=10 OSDs, and the amount of data read on each OSD is equal to the chunk size. This seems odd: even though I only care about the first 5 MiB, which could be served from the first 5 OSDs, RADOS appears to issue reads for the entire 10 MiB stripe. This can be wasteful under load.

So my question is whether this is by design. Specifically, is it a requirement that RADOS read an entire stripe even when only a portion of it is requested? Is this behavior configurable?
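
For reference, this is roughly how I am issuing the ranged reads from my test client. It is a minimal sketch using the librados C API; the pool name, object name, and client name are just placeholders for my test setup:

/*
 * Minimal sketch of how the ranged reads are issued (librados C API).
 * Pool/object/client names below are placeholders for my test setup.
 */
#include <rados/librados.h>
#include <stdio.h>
#include <stdlib.h>

#define RANGE_LEN (5 * 1024 * 1024UL)   /* 5 MiB ranged read */

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    char *buf = malloc(RANGE_LEN);
    int ret;

    ret = rados_create2(&cluster, "ceph", "client.admin", 0);
    if (ret < 0) { fprintf(stderr, "rados_create2: %d\n", ret); return 1; }

    rados_conf_read_file(cluster, NULL);   /* default ceph.conf locations */

    ret = rados_connect(cluster);
    if (ret < 0) { fprintf(stderr, "rados_connect: %d\n", ret); return 1; }

    /* "ec-test-pool" is the EC pool with k=10, m=3, 1 MiB chunks */
    ret = rados_ioctx_create(cluster, "ec-test-pool", &io);
    if (ret < 0) { fprintf(stderr, "rados_ioctx_create: %d\n", ret); return 1; }

    /* Read only the first 5 MiB of a large (~1 GiB) object: offset 0, length 5 MiB */
    ret = rados_read(io, "large-object-0", buf, RANGE_LEN, 0);
    if (ret < 0)
        fprintf(stderr, "rados_read: %d\n", ret);
    else
        printf("read %d bytes\n", ret);

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    free(buf);
    return ret < 0 ? 1 : 0;
}

It is ranged reads like the one above that blktrace shows fanning out to all 10 data OSDs rather than just the first 5.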