Re: Help understanding EC object reads

Thomas Byrne - UKRI STFC <tom.byrne@xxxxxxxxxx> · Mon, 16 Sep 2019 10:39:35 +0000

Thanks for responding!

It's good to hear that the primary OSD has some smarts when dealing with partial reads, and that seems to line up with what I was seeing, i.e. I would have expected drastically worse performance otherwise with our large object sizes and tiny block sizes.
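
For anyone finding this thread later, here is a rough sketch of how I now picture the "only the stripes covering the requested range" behaviour. It is only my mental model, not Ceph code: the function name, k=4 and the 4096-byte stripe unit are made-up values for illustration, and the shard numbers are just positions within a stripe (the primary is not necessarily shard 0).

```python
def stripes_for_range(offset, length, k=4, stripe_unit=4096):
    """Return (first_stripe, last_stripe, data_shards) touched by a read.

    Assumes the object is laid out in stripes of k data chunks, each
    stripe_unit bytes long. All values here are hypothetical.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    stripe_width = k * stripe_unit
    end = offset + length - 1  # last byte of the read, inclusive

    first_stripe = offset // stripe_width
    last_stripe = end // stripe_width

    # Chunks are numbered across the whole object; chunk c lives on
    # data shard c % k.
    first_chunk = offset // stripe_unit
    last_chunk = end // stripe_unit
    if last_chunk - first_chunk + 1 >= k:
        shards = set(range(k))  # the read spans every data shard
    else:
        shards = {c % k for c in range(first_chunk, last_chunk + 1)}
    return first_stripe, last_stripe, sorted(shards)
```

With these numbers a 4 KiB read at offset 0 only touches one data shard in stripe 0, whereas an 8 KiB read starting at offset 12288 straddles stripes 0 and 1 and touches two shards.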

I'm still seeing some performance degradation with the small block sizes, but I guess that comes from the inefficiency of issuing lots of small requests (time spent queuing for the PG etc.), rather than anything related to EC.
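
Since the per-request overhead looks like the real cost, the obvious fix on our side is to have the caching layer merge neighbouring byte ranges before they reach the OSDs. A minimal sketch of that idea follows, assuming reads are plain (offset, length) pairs. The function name and the max_gap parameter are my own invention and not part of any Ceph or cache API.

```python
def coalesce_ranges(ranges, max_gap=0):
    """Merge (offset, length) reads that overlap, touch, or sit within
    max_gap bytes of each other. Returns a sorted list of (offset, length).
    """
    merged = []  # list of [start, end) pairs, end exclusive
    for offset, length in sorted(ranges):
        end = offset + length
        if merged and offset <= merged[-1][1] + max_gap:
            # Close enough to the previous read: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    return [(start, end - start) for start, end in merged]
```

For example, coalesce_ranges([(0, 4096), (4096, 4096), (16384, 4096)]) collapses the first two reads into one 8 KiB request and leaves the third alone. A non-zero max_gap trades some wasted bytes for fewer requests.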

Cheers,
Tom

> -----Original Message-----
> From: Gregory Farnum <gfarnum@xxxxxxxxxx>
> Sent: 09 September 2019 23:25
> To: Byrne, Thomas (STFC,RAL,SC) <tom.byrne@xxxxxxxxxx>
> Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: Help understanding EC object reads
> 
> On Thu, Aug 29, 2019 at 4:57 AM Thomas Byrne - UKRI STFC
> <tom.byrne@xxxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > I’m investigating an issue with the (non-Ceph) caching layers of our large
> > EC cluster. They seem to be turning users’ requests for whole objects into
> > lots of small byte range requests reaching the OSDs, but I’m not sure how
> > inefficient this behaviour is in reality.
> >
> > My limited understanding of an EC object partial read is that the entire
> > object is reconstructed on the primary OSD, and then the requested byte
> > range is sent to the client before the primary discards the reconstructed
> > object.
> 
> Ah, it's not necessarily the entire object that is reconstructed; only the
> stripes covering the requested range are. It's changed a bit over time
> and there are some knobs controlling it, but I believe this is generally
> efficient: if you ask for a byte range which simply lives on the primary, it's
> not going to talk to the other OSDs to provide that data.
> 
> >
> > Assuming this is correct, do multiple reads for different byte ranges of the
> > same object at effectively the same time result in the entire object being
> > reconstructed once for each request, or does the primary do something
> > clever and use the same reconstructed object for multiple requests before
> > discarding it?
> 
> I'm pretty sure it's per-request; the EC pool code generally assumes you have
> another cache on top of RADOS that deals with combining these requests.
> There is a small cache in the OSD but IIRC it's just for keeping stuff consistent
> while writes are in progress.
> -Greg
> 
> >
> > If I’m completely off the mark with what is going on under the hood here, a
> > nudge in the right direction would be appreciated!
> >
> >
> >
> > Cheers,
> >
> > Tom
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com