Re: Are we ready for 10.2.8 - core

Josh Durgin <jdurgin@xxxxxxxxxx> · Fri, 30 Jun 2017 07:38:44 -0700

On 06/30/2017 05:21 AM, Nathan Cutler wrote:
Hi again Josh:

Here comes the recap of 10.2.8 status. All requested PRs have been
merged, so I ran a "rados" suite and an "upgrade/hammer-x" suite. There
were some failures, but most were obvious infrastructure noise and
disappeared on re-run. There were also two failures that stood out and
seemed like they might be real bugs:

[1] rados issue: http://tracker.ceph.com/issues/20449
[2] upgrade issue: http://tracker.ceph.com/issues/13381

Regarding [1], my initial analysis was wrong. The real cause of the
failure is a transient gevent/greenlet timeout that occurs in about 40%
of the runs.

Regarding [2], Sage analyzed the initial failure in the tracker. I
re-ran the test on both smithi and vps with the following results:

* on vps, both jobs passed
* on smithi, one job passed and the other failed. However, the failure
was for a different reason ("ceph-objectstore-tool: exp list-pgs failure
with status 1") - see
http://pulpito.front.sepia.ceph.com/smithfarm-2017-06-30_09:08:18-upgrade:hammer-x-wip-jewel-backports-distro-basic-smithi/

* http://tracker.ceph.com/issues/13381 was not reproduced

What other testing do you think is needed before we send 10.2.8 to QE?

What you've run already looks sufficient - I was worried about 13381
before, but it does not seem related to the 10.2.8 backports at this
point - just a pre-existing rare race. So I'd say it's ready for QE.

Thanks!
Josh
