Re: "No space left on device" errors

kyr <kshatskyy@xxxxxxx> · Thu, 17 Oct 2019 01:35:56 +0200

So I ran a job on smithi against teuthology code that is supposed to trigger the "No space left on device" error:
http://qa-proxy.ceph.com/teuthology/kyr-2019-10-16_22:55:36-smoke:basic-master-distro-basic-smithi/4416887/teuthology.log

It passed and did not hit the issue. Which exact suite reproduces it?

Kyrylo Shatskyy
--
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nuremberg
Germany

On Oct 17, 2019, at 12:35 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

On Wed, Oct 16, 2019 at 2:39 PM kyr <kshatskyy@xxxxxxx> wrote:

I hope Nathan's fix does the trick; however, it does not account for the log referenced in the description of https://tracker.ceph.com/issues/42313, because the teuthology worker does not include the fix that is supposed to be the cause of the "No space left on device" issue.

I'm not quite sure what you mean here. I think one of these addresses
your statement?
1) we were creating very small OSDs on the root device since the
partitions weren't being mounted, and so these jobs actually filled
them up as a consequence of that.
2) most of the teuthology repo is pulled fresh from master on every
run. The workers themselves require restarting to get updates but
that's pretty rare. (See
https://github.com/ceph/teuthology/blob/master/teuthology/worker.py#L82)
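To illustrate point 1: if the partition intended for an OSD is never mounted, its target directory is just an ordinary directory on the parent filesystem (often the root device), so the OSD silently gets whatever space the root fs has left. The following is a minimal, hypothetical Python sketch of that check, not teuthology's actual code; the helper name `describe_osd_path` is invented for illustration, and only the standard-library calls `os.path.ismount` and `shutil.disk_usage` are assumed.

```python
import os
import shutil


def describe_osd_path(path: str) -> dict:
    """Hypothetical helper: report whether `path` is its own mount point.

    If the OSD partition was never mounted, `path` is just a directory on
    the parent filesystem (e.g. the root device), so ismount() returns False
    and the reported sizes are those of the parent filesystem.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "path": path,
        "is_mount_point": os.path.ismount(path),
        "total_gib": usage.total / 2**30,
        "free_gib": usage.free / 2**30,
    }
```

Calling it on an unmounted data directory would show `is_mount_point: False` together with the root filesystem's free space, which is the situation that let those jobs fill the root device.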

Can someone give a one-job teuthology-suite command that reproduces the issue 100% of the time?

On Oct 16, 2019, at 11:14 PM, Nathan Cutler <ncutler@xxxxxxxx> wrote:

On Wed, Oct 16, 2019 at 12:43:32PM -0700, Gregory Farnum wrote:

On Wed, Oct 16, 2019 at 12:24 PM David Galloway <dgallowa@xxxxxxxxxx> wrote:

Yuri just reminded me that he's seeing this problem on the mimic branch.

Does that mean this PR just needs to be backported to all branches?

https://github.com/ceph/ceph/pull/30792

I'd be surprised if that one (changing iteritems() to items()) could
cause this, and it's not a fix for any known bugs, just ongoing py3
work.
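For context on why that PR is unlikely to matter here: Python 2's `dict.iteritems()` does not exist in Python 3, while `dict.items()` works on both (a list on Python 2, a lazy view on Python 3). For plain read-only loops the swap is a mechanical py3 port rather than a behavior change. A minimal sketch; the role-to-host mapping and the hostnames are purely illustrative.

```python
# Illustrative mapping only; the role names and hostnames are made up.
roles = {"mon.a": "smithi001", "osd.0": "smithi002"}


def format_roles(mapping):
    # items() yields (key, value) pairs on both Python 2 and Python 3;
    # iteritems() would raise AttributeError on Python 3.
    return [f"{role} -> {host}" for role, host in sorted(mapping.items())]
```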

When I said "that commit" I was referring to
https://github.com/ceph/teuthology/commit/41a13eca480e38cfeeba7a180b4516b90598c39b,
which is in the teuthology repo and thus hits every test run. Looking
at the comments across https://github.com/ceph/teuthology/pull/1332
and https://tracker.ceph.com/issues/42313 it sounds like that
teuthology commit accidentally fixed a bug which triggered another bug
that we're not sure how to resolve, but perhaps I'm misunderstanding?

I think I understand what's going on. Here's an interim fix:
https://github.com/ceph/teuthology/pull/1334

Assuming this PR really does fix the issue, the "real" fix will be to drop
get_wwn_id_map altogether, since it has long outlived its usefulness (see
https://tracker.ceph.com/issues/14855).

Nathan
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
