See responses below.
Kyrylo Shatskyy
--
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nuremberg
Germany
On Wed, Oct 16, 2019 at 2:39 PM kyr <kshatskyy@xxxxxxx> wrote:
I hope Nathan's fix will do the job; however, it does not cover the log referenced in the description of https://tracker.ceph.com/issues/42313, because
the teuthology worker in that run does not include the fix that is supposed to be the cause of the "No space left on device" issue.
I'm not quite sure what you mean here. I think one of these addresses your statement?
When you go to the ticket, you can see a reference to the teuthology log in the bug description:
From that log you can easily figure out which teuthology version is used there:
2019-10-09T20:24:23.144 INFO:root:teuthology version: 1.0.0-139780c
This means the teuthology code base at git sha 139780c is used there, and it does not include 41a13eca:
teuthology> git log --oneline master | grep 41a13ec || echo ABSENT
41a13eca misc: use remote.sh instead of remote.run
teuthology> git log --oneline 139780c | grep 41a13ec || echo ABSENT
ABSENT
As confirmation of this, the log contains:
ls -l '/dev/disk/by-id/wwn-*'
That is misleading, so perhaps it is just a coincidence? Or are we trying to resolve a different issue that has similar "No space left on device" errors?
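The `git log ... | grep` check above can also be scripted. The following is only a sketch, not anything that exists in teuthology: it assumes a local clone of the teuthology repo, the function name `is_ancestor` is hypothetical, and it relies on `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of the second and 1 when it is not.

```python
import subprocess


def is_ancestor(repo_dir, ancestor, descendant, run=subprocess.run):
    """Return True if `ancestor` is reachable from `descendant` in repo_dir.

    Hypothetical helper: wraps `git merge-base --is-ancestor`.
    `run` is injectable so the logic can be exercised without a real repo.
    """
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", ancestor, descendant],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True   # ancestor is contained in descendant's history
    if result.returncode == 1:
        return False  # same meaning as ABSENT in the grep check
    # Any other exit status means git itself failed (unknown sha, bad repo, ...).
    raise RuntimeError("git merge-base failed: " + result.stderr.strip())
```

With a teuthology checkout at a path of your choosing, `is_ancestor(path, "41a13eca", "139780c")` should return False if the ABSENT result above holds.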
1) we were creating very small OSDs on the root device since the partitions weren't being mounted, and so these jobs actually filled them up as a consequence of that.
2) most of the teuthology repo is pulled fresh from master on every run. The workers themselves require restarting to get updates but that's pretty rare. (See
https://github.com/ceph/teuthology/blob/master/teuthology/worker.py#L82)
I did not mean the worker.py instance running in the background, but the worker user as opposed to the scheduler user. And of course the worker.py script checks out whichever teuthology branch you request, and this version (the corresponding sha1) is printed
in teuthology.log.
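Since the checked-out sha1 is printed in teuthology.log, it can be pulled out mechanically. This is a minimal sketch under one assumption: that the version line always looks like the single example quoted in this thread (`teuthology version: 1.0.0-139780c`, i.e. a version string, a hyphen, then a short hex sha). The function name `teuthology_sha` is hypothetical.

```python
import re

# Assumed format, based only on the one log line quoted in this thread:
#   ... INFO:root:teuthology version: <version>-<short sha>
VERSION_RE = re.compile(
    r"teuthology version: (?P<version>\S+)-(?P<sha>[0-9a-f]{7,40})\s*$"
)


def teuthology_sha(log_lines):
    """Return (version, sha) from the first matching log line, or None."""
    for line in log_lines:
        match = VERSION_RE.search(line)
        if match:
            return match.group("version"), match.group("sha")
    return None


line = "2019-10-09T20:24:23.144 INFO:root:teuthology version: 1.0.0-139780c"
print(teuthology_sha([line]))  # ('1.0.0', '139780c')
```

The returned sha can then be compared against the commit in question to see which teuthology code a given job actually ran.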
Can someone give a one-job teuthology-suite command that reproduces the issue 100% of the time?
Kyrylo Shatskyy
--
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nuremberg
Germany
On Oct 16, 2019, at 11:14 PM, Nathan Cutler <ncutler@xxxxxxxx> wrote:
On Wed, Oct 16, 2019 at 12:43:32PM -0700, Gregory Farnum wrote:
On Wed, Oct 16, 2019 at 12:24 PM David Galloway <dgallowa@xxxxxxxxxx> wrote:
Yuri just reminded me that he's seeing this problem on the mimic branch.
Does that mean this PR just needs to be backported to all branches?
https://github.com/ceph/ceph/pull/30792
I'd be surprised if that one (changing iteritems() to items()) could
cause this, and it's not a fix for any known bugs, just ongoing py3
work.
When I said "that commit" I was referring to
https://github.com/ceph/teuthology/commit/41a13eca480e38cfeeba7a180b4516b90598c39b,
which is in the teuthology repo and thus hits every test run. Looking
at the comments across https://github.com/ceph/teuthology/pull/1332
and https://tracker.ceph.com/issues/42313 it sounds like that
teuthology commit accidentally fixed a bug which triggered another bug
that we're not sure how to resolve, but perhaps I'm misunderstanding?
I think I understand what's going on. Here's an interim fix:
https://github.com/ceph/teuthology/pull/1334
Assuming this PR really does fix the issue, the "real" fix will be to drop
get_wwn_id_map altogether, since it has long outlived its usefulness (see
https://tracker.ceph.com/issues/14855).
Nathan
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx