Re: Help understanding EC object reads

Gregory Farnum <gfarnum@xxxxxxxxxx> · Mon, 9 Sep 2019 15:25:25 -0700

On Thu, Aug 29, 2019 at 4:57 AM Thomas Byrne - UKRI STFC
<tom.byrne@xxxxxxxxxx> wrote:
>
> Hi all,
>
> I’m investigating an issue with the (non-Ceph) caching layers in front of our large EC cluster. They seem to be turning users' requests for whole objects into lots of small byte-range requests that reach the OSDs, but I’m not sure how inefficient this behaviour is in reality.
>
> My limited understanding of an EC object partial read is that the entire object is reconstructed on the primary OSD, and then the requested byte range is sent to the client before the primary discards the reconstructed object.

Ah, the entire object is not necessarily reconstructed; only the
stripes covering the requested range are. It's changed a bit over time
and there are some knobs controlling it, but I believe this is
generally efficient: if you ask for a byte range which simply lives on
the primary, it's not going to talk to the other OSDs to provide that
data.
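
To make the "stripes covering the requested range" idea concrete, here
is a rough Python sketch of the arithmetic. It is a simplified model,
not Ceph's code: the function name is made up, k=4 and
stripe_unit=4096 are illustrative values only, and it assumes a plain
layout where consecutive stripe_unit-sized chunks rotate across the k
data shards.

```python
def shards_for_range(offset, length, k=4, stripe_unit=4096):
    """Return (stripes, data_shards) touched by bytes [offset, offset+length).

    Hypothetical helper; k and stripe_unit are assumed example values.
    """
    if length <= 0:
        return range(0), set()
    stripe_width = k * stripe_unit          # data bytes held by one stripe
    end = offset + length
    first_stripe = offset // stripe_width
    last_stripe = (end - 1) // stripe_width
    first_chunk = offset // stripe_unit
    last_chunk = (end - 1) // stripe_unit
    # k consecutive chunks already hit every data shard, so cap the walk.
    last_needed = min(last_chunk, first_chunk + k - 1)
    shards = {c % k for c in range(first_chunk, last_needed + 1)}
    return range(first_stripe, last_stripe + 1), shards


# A 4 KiB read at offset 0 touches one stripe and only data shard 0.
print(shards_for_range(0, 4096))
# A 200-byte read straddling a chunk boundary touches shards 0 and 1.
print(shards_for_range(4000, 200))
```

In this model a small aligned read maps to a single data shard, which
is the case described above where the data "simply lives on" one OSD.
Whether that OSD is the primary depends on the acting set.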

>
> Assuming this is correct, do multiple reads for different byte ranges of the same object at effectively the same time result in the entire object being reconstructed once for each request, or does the primary do something clever and use the same reconstructed object for multiple requests before discarding it?

I'm pretty sure it's per-request; the EC pool code generally assumes
you have another cache on top of RADOS that deals with combining these
requests.
There is a small cache in the OSD but IIRC it's just for keeping stuff
consistent while writes are in progress.
-Greg
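
As an illustration of what "combining these requests" in a cache above
RADOS could look like, here is a minimal Python sketch that merges
small byte-range requests for one object before they are sent on. The
function name and the max_gap parameter are hypothetical; this is not
taken from any Ceph component or from the caching layer in question.

```python
def coalesce_ranges(ranges, max_gap=0):
    """Merge (offset, length) requests that overlap, touch, or lie
    within max_gap bytes of each other. Hypothetical helper."""
    merged = []                              # list of [start, end) pairs
    for start, length in sorted(ranges):
        if length <= 0:
            continue                         # ignore empty requests
        end = start + length
        if merged and start <= merged[-1][1] + max_gap:
            # Close enough to the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


requests = [(0, 4096), (4096, 4096), (16384, 100)]
print(coalesce_ranges(requests))                 # [(0, 8192), (16384, 100)]
print(coalesce_ranges(requests, max_gap=8192))   # [(0, 16484)]
```

A larger max_gap trades some extra bytes read for fewer separate
requests reaching the OSDs.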

>
> If I’m completely off the mark with what is going on under the hood here, a nudge in the right direction would be appreciated!
>
>
>
> Cheers,
>
> Tom
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com