RE: rados bench object not correct errors on v9.0.3

> -----Original Message-----
> From: Dałek, Piotr [mailto:Piotr.Dalek@xxxxxxxxxxxxxx]
> Sent: Wednesday, August 26, 2015 2:02 AM
> To: Sage Weil; Deneau, Tom
> Cc: ceph-devel@xxxxxxxxxxxxxxx; ceph-users@xxxxxxxx
> Subject: RE: rados bench object not correct errors on v9.0.3
> 
> > -----Original Message-----
> > From: ceph-devel-owner@xxxxxxxxxxxxxxx
> > [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Sage Weil
> > Sent: Tuesday, August 25, 2015 7:43 PM
> 
> > > I have built rpms from the tarball
> > > http://ceph.com/download/ceph-9.0.3.tar.bz2.
> > > Have done this for fedora 21 x86_64 and for aarch64.  On both
> > > platforms when I run a single node "cluster" with a few osds and run
> > > rados bench read tests (either seq or rand) I get occasional reports
> > > like
> > >
> > > benchmark_data_myhost_20729_object73 is not correct!
> > >
> > > I never saw these with similar rpm builds on these platforms from
> > > 9.0.2 sources.
> > >
> > > Also, if I go to an x86-64 system running Ubuntu trusty for which I
> > > am able to install prebuilt binary packages via
> > >     ceph-deploy install --dev v9.0.3
> > >
> > > I do not see the errors there.
> >
> > Hrm.. haven't seen it on this end, but we're running/testing master
> > and not 9.0.2 specifically.  If you can reproduce this on master,
> > that'd be very helpful!
> >
> > There have been some recent changes to rados bench... Piotr, does this
> > seem like it might be caused by your changes?
> 
> Yes. My PR #4690 (https://github.com/ceph/ceph/pull/4690) made rados bench
> fast enough to sometimes run into a race condition between librados's AIO
> and objbencher processing. That was fixed in PR #5152
> (https://github.com/ceph/ceph/pull/5152), which didn't make it into 9.0.3.
> Tom, you can confirm this by inspecting the contents of the objects in
> question (their contents should be perfectly fine and in line with other
> objects). In the meantime you can either apply the patch from PR #5152 on
> your own or use --no-verify.
> 
> With best regards / Pozdrawiam
> Piotr Dałek

Piotr --

Thank you.  Yes, when I looked at the contents of the objects they always
looked correct.  And yes, a single object would sometimes report an error
and sometimes not.  So a race condition makes sense.

A couple of questions:

   * Why would I not see this behavior using the pre-built 9.0.3 binaries
     that get installed using "ceph-deploy install --dev v9.0.3"?  I would assume
     this is built from the same sources as the 9.0.3 tarball.

   * So I assume one should not compare pre-9.0.3 rados bench numbers with
     those from 9.0.3 and after?  The pull request
     https://github.com/ceph/ceph/pull/4690 did not mention the effect on
     final bandwidth numbers.  Did you notice what that effect was?

-- Tom
