Thanks for responding!

It's good to hear that the primary OSD has some smarts when dealing with partial reads, and that lines up with what I was seeing: with our large object sizes and tiny block sizes, I would have expected drastically worse performance otherwise.

I am still seeing some performance degradation with the small block sizes, but I guess that comes from the inefficiency of lots of small requests (time spent queuing for the PG etc.) rather than anything related to EC.

Cheers,
Tom

> -----Original Message-----
> From: Gregory Farnum <gfarnum@xxxxxxxxxx>
> Sent: 09 September 2019 23:25
> To: Byrne, Thomas (STFC,RAL,SC) <tom.byrne@xxxxxxxxxx>
> Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: Help understanding EC object reads
>
> On Thu, Aug 29, 2019 at 4:57 AM Thomas Byrne - UKRI STFC <tom.byrne@xxxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > I’m investigating an issue with the (non-Ceph) caching layers of our large EC cluster. It seems to be turning users' requests for whole objects into lots of small byte range requests reaching the OSDs, but I’m not sure how inefficient this behaviour is in reality.
> >
> > My limited understanding of an EC object partial read is that the entire object is reconstructed on the primary OSD, and then the requested byte range is sent to the client before the primary discards the reconstructed object.
>
> Ah, it's not necessarily that the entire object is reconstructed, but that any stripes covering the requested range are reconstructed. It's changed a bit over time and there are some knobs controlling it, but I believe this is generally efficient: if you ask for a byte range which simply lives on the primary, it's not going to talk to the other OSDs to provide that data.
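
To make the "stripes covering the requested range" point concrete, here is a rough sketch of the arithmetic under a simplified striping model. It is not Ceph's OSD code: the function name, the parameters `k` (data chunks) and `stripe_unit` (bytes per chunk per stripe), and the round-robin layout are all assumptions for illustration, and real behaviour depends on the version and the knobs mentioned above.

```python
def stripes_for_range(offset, length, k, stripe_unit):
    """Return (first_stripe, last_stripe, data_shards) touched by a byte range.

    Simplified model (an assumption, not Ceph's implementation): the object
    is cut into stripe_unit-sized pieces that are laid out round-robin
    across k data shards, so one full stripe holds k * stripe_unit bytes.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    stripe_width = k * stripe_unit
    last_byte = offset + length - 1

    # Stripes that overlap the requested byte range.
    first_stripe = offset // stripe_width
    last_stripe = last_byte // stripe_width

    # Stripe units that overlap the range, mapped back to data shard indexes.
    first_unit = offset // stripe_unit
    last_unit = last_byte // stripe_unit
    if last_unit - first_unit + 1 >= k:
        shards = set(range(k))  # range spans at least one full rotation
    else:
        shards = {unit % k for unit in range(first_unit, last_unit + 1)}
    return first_stripe, last_stripe, sorted(shards)


if __name__ == "__main__":
    # Hypothetical 4-data-chunk layout with a 4 KiB stripe unit.
    print(stripes_for_range(0, 100, k=4, stripe_unit=4096))
    print(stripes_for_range(16000, 1000, k=4, stripe_unit=4096))
```

In this model a small read that fits inside one stripe unit touches a single data shard, which matches the idea that a range living entirely on one OSD need not involve the others.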
> >
> > Assuming this is correct, do multiple reads for different byte ranges of the same object at effectively the same time result in the entire object being reconstructed once for each request, or does the primary do something clever and use the same reconstructed object for multiple requests before discarding it?
>
> I'm pretty sure it's per-request; the EC pool code generally assumes you have another cache on top of RADOS that deals with combining these requests. There is a small cache in the OSD but IIRC it's just for keeping stuff consistent while writes are in progress.
> -Greg
>
> >
> > If I’m completely off the mark with what is going on under the hood here, a nudge in the right direction would be appreciated!
> >
> > Cheers,
> >
> > Tom
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
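
Since the reads are handled per-request and the combining is expected to happen in a layer above RADOS, a caching layer could merge nearby byte ranges before they reach the OSDs. The following is only a minimal sketch of that idea; `coalesce_ranges` and `max_gap` are hypothetical names, not part of any Ceph or librados API.

```python
def coalesce_ranges(ranges, max_gap=0):
    """Merge (offset, length) byte ranges that overlap, touch, or lie
    within max_gap bytes of each other.

    Hypothetical helper for a caching layer above RADOS; it only does the
    range arithmetic and issues no reads itself.
    """
    merged = []  # list of [start, end) pairs, end exclusive
    for off, length in sorted(ranges):
        if length <= 0:
            continue  # ignore empty or invalid requests
        end = off + length
        if merged and off <= merged[-1][1] + max_gap:
            # Close enough to the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([off, end])
    return [(start, end - start) for start, end in merged]


if __name__ == "__main__":
    small_reads = [(0, 4096), (4096, 4096), (16384, 4096)]
    print(coalesce_ranges(small_reads))
    print(coalesce_ranges(small_reads, max_gap=8192))
```

A larger `max_gap` trades some wasted bytes for fewer requests queuing on the PG; the right value would have to be measured on the cluster in question.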