Re: "No space left on device" errors

Jason Dillaman <jdillama@xxxxxxxxxx> · Wed, 16 Oct 2019 17:53:42 -0400

On Wed, Oct 16, 2019 at 5:39 PM kyr <kshatskyy@xxxxxxx> wrote:
>
> I hope Nathan's fix does the trick; however, it does not cover the log referenced in the description of https://tracker.ceph.com/issues/42313, because the teuthology worker there does not include the fix that is supposed to be the cause of the "No space left on device" issue.

If teuthology has been picking the wrong devices for the past week (as
described by Nathan's PR), why wouldn't you expect that to result in
OSDs running out of space?

> Can someone give a one-job teuthology-suite command that reproduces the issue 100% of the time?

I believe the goal is to test Nathan's change against the RBD suite,
since that suite has been hitting the issue across all branches since
at least the end of last week.

The logs prior to this breakage show ...

2019-10-02T21:37:45.198 INFO:tasks.ceph:fs option selected, checking for scratch devs
2019-10-02T21:37:45.199 INFO:tasks.ceph:found devs: ['/dev/vg_nvme/lv_4', '/dev/vg_nvme/lv_3', '/dev/vg_nvme/lv_2', '/dev/vg_nvme/lv_1']
2019-10-02T21:37:45.199 INFO:teuthology.orchestra.run.smithi197:Running:
2019-10-02T21:37:45.199 INFO:teuthology.orchestra.run.smithi197:> ls -l '/dev/disk/by-id/wwn-*'
2019-10-02T21:37:45.265 INFO:teuthology.orchestra.run.smithi197.stderr:ls: cannot access /dev/disk/by-id/wwn-*: No such file or directory
2019-10-02T21:37:45.265 DEBUG:teuthology.orchestra.run:got remote process result: 2
2019-10-02T21:37:45.266 INFO:teuthology.misc:Failed to get wwn devices! Using /dev/sd* devices...

... and after ...

2019-10-10T19:42:32.759 INFO:tasks.ceph:fs option selected, checking for scratch devs
2019-10-10T19:42:32.759 INFO:tasks.ceph:found devs: ['/dev/vg_nvme/lv_4', '/dev/vg_nvme/lv_3', '/dev/vg_nvme/lv_2', '/dev/vg_nvme/lv_1']
2019-10-10T19:42:32.759 INFO:teuthology.orchestra.run.smithi177:Running:
2019-10-10T19:42:32.759 INFO:teuthology.orchestra.run.smithi177:> ls -l /dev/disk/by-id/wwn-*
2019-10-10T19:42:32.813 INFO:teuthology.orchestra.run.smithi177.stdout:lrwxrwxrwx. 1 root root  9 Oct 10 19:33 /dev/disk/by-id/wwn-0x5000c5009294113e -> ../../sda
2019-10-10T19:42:32.813 INFO:teuthology.orchestra.run.smithi177.stdout:lrwxrwxrwx. 1 root root 10 Oct 10 19:33 /dev/disk/by-id/wwn-0x5000c5009294113e-part1 -> ../../sda1
2019-10-10T19:42:32.813 INFO:tasks.ceph:dev map: {}

... so it seems like a good candidate fix.
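
As I read the two excerpts, the visible difference is the quoting of the
glob: in the earlier run the command is ls -l '/dev/disk/by-id/wwn-*', so
the shell hands the pattern to ls literally, ls fails, and teuthology
falls back to /dev/sd*; in the later run the glob is unquoted and
expands. Below is a minimal Python sketch of just that shell behaviour.
It is not teuthology code: the temporary directory and the run_ls helper
are hypothetical stand-ins for /dev/disk/by-id and the remote command.

```python
import os
import shlex
import subprocess
import tempfile


def run_ls(directory, quote_glob):
    """Run `ls` on <directory>/wwn-* through a shell (hypothetical helper).

    quote_glob=True mimics the earlier log (pattern wrapped in quotes),
    quote_glob=False mimics the later log (pattern left for the shell to expand).
    """
    if quote_glob:
        # The '*' makes shlex.quote wrap the whole path in single quotes,
        # so the shell passes the pattern to ls unexpanded.
        arg = shlex.quote(os.path.join(directory, "wwn-*"))
    else:
        # Quote only the directory part; the glob itself stays bare.
        arg = shlex.quote(directory) + "/wwn-*"
    return subprocess.run("ls " + arg, shell=True,
                          capture_output=True, text=True)


with tempfile.TemporaryDirectory() as d:
    # Stand-in for a /dev/disk/by-id/wwn-0x... symlink (assumption).
    open(os.path.join(d, "wwn-0x5000c5009294113e"), "w").close()
    quoted = run_ls(d, quote_glob=True)
    unquoted = run_ls(d, quote_glob=False)

# Quoted: ls is asked for a file literally named "wwn-*" and fails.
print("quoted  :", quoted.returncode, quoted.stderr.strip())
# Unquoted: the shell expands the glob and ls lists the match.
print("unquoted:", unquoted.returncode, unquoted.stdout.strip())
```

With the quoted pattern ls exits non-zero even though a matching entry
exists, which is consistent with the "cannot access" line in the earlier
log.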

> Kyrylo Shatskyy
> --
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5
> 90409 Nuremberg
> Germany
>
>
> On Oct 16, 2019, at 11:14 PM, Nathan Cutler <ncutler@xxxxxxxx> wrote:
>
> On Wed, Oct 16, 2019 at 12:43:32PM -0700, Gregory Farnum wrote:
>
> On Wed, Oct 16, 2019 at 12:24 PM David Galloway <dgallowa@xxxxxxxxxx> wrote:
>
>
> Yuri just reminded me that he's seeing this problem on the mimic branch.
>
> Does that mean this PR just needs to be backported to all branches?
>
> https://github.com/ceph/ceph/pull/30792
>
>
> I'd be surprised if that one (changing iteritems() to items()) could
> cause this, and it's not a fix for any known bugs, just ongoing py3
> work.
>
> When I said "that commit" I was referring to
> https://github.com/ceph/teuthology/commit/41a13eca480e38cfeeba7a180b4516b90598c39b,
> which is in the teuthology repo and thus hits every test run. Looking
> at the comments across https://github.com/ceph/teuthology/pull/1332
> and https://tracker.ceph.com/issues/42313 it sounds like that
> teuthology commit accidentally fixed a bug which triggered another bug
> that we're not sure how to resolve, but perhaps I'm misunderstanding?
>
>
> I think I understand what's going on. Here's an interim fix:
> https://github.com/ceph/teuthology/pull/1334
>
> Assuming this PR really does fix the issue, the "real" fix will be to drop
> get_wwn_id_map altogether, since it has long outlived its usefulness ( see
> https://tracker.ceph.com/issues/14855 ).
>
> Nathan
> _______________________________________________
> Dev mailing list -- dev@xxxxxxx
> To unsubscribe send an email to dev-leave@xxxxxxx
>
>

-- 
Jason