Re: ceph-fuse (jewel 10.2.2): No such file or directory issues

Hi Greg
For now we have to wait and see if it appears again. If it does, then at least we can provide an strace and perform further debugging.
We will update this thread when/if it appears again.
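For reference, a minimal sketch of what that could look like (assuming a
single ceph-fuse process on the node):

    # strace -f -tt -o /tmp/ceph-fuse.strace -p $(pidof ceph-fuse)

and, when the failure reappears, tracing the failing ls itself to see which
syscall returns the ENOENT:

    # strace -e trace=file ls /coepp/cephfs/mel/user/foo/bar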
Cheers
G.
________________________________________
From: Gregory Farnum [gfarnum@xxxxxxxxxx]
Sent: 29 July 2016 06:54
To: Goncalo Borges
Cc: ceph-users@xxxxxxxx
Subject: Re: ceph-fuse (jewel 10.2.2): No such file or directory issues

On Wed, Jul 27, 2016 at 6:37 PM, Goncalo Borges
<goncalo.borges@xxxxxxxxxxxxx> wrote:
> Hi Greg
>
> Thanks for replying. Answer inline.
>
>
>
>>> Dear cephfsers :-)
>>>
>>> We saw some weirdness in cephfs that we do not understand.
>>>
>>> We were helping a user who complained that her batch system job outputs
>>> were not being produced in cephfs.
>>>
>>> Please note that we are using ceph-fuse (jewel 10.2.2) as the client.
>>>
>>> We logged in to the machine where her jobs run and saw the following
>>> behavior:
>>>
>>> # ls /coepp/cephfs/mel/user/foo/bar/stuff
>>> ls: cannot access '/coepp/cephfs/mel/user/foo/bar/stuff': No such file or
>>> directory
>>>
>>>
>>> Going back one directory still gave No such file or directory:
>>>
>>> # ls /coepp/cephfs/mel/user/foo/bar
>>> ls: cannot access '/coepp/cephfs/mel/user/foo/bar': No such file or
>>> directory
>>>
>>>
>>> But if we did an ls in the user directory, it was fine:
>>>
>>> # ls /coepp/cephfs/mel/user
>>> ....
>>>
>>> And then trying to ls the directories which had previously failed worked
>>> fine.
>>>
>>> It seems like a cache issue and I wonder if there is a way to mitigate
>>> it.
>>>
>>> It is also worthwhile to mention that this happened while we were adding
>>> a new storage server to the underlying ceph infrastructure, so there was
>>> some data movement happening in the background.
>>>
>>> Any suggestion on how to mitigate it?
>>
>> If you're really using 10.2.2 and not something earlier, I don't think
>> this is a bug we've heard about. It sounds like you could work around
>> it by dropping caches or listing down from the root gratuitously, but
>> otherwise we'll need to do some debugging. Can you narrow in on what
>> makes this user's workload different from the others? Did you try
>> doing any tracing to see where the ENOENT was coming from?
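>>
>> For what it's worth, dropping the kernel's dentry/inode caches on the
>> client is the usual quick test here (a generic Linux sketch, run as root):
>>
>>     # sync; echo 2 > /proc/sys/vm/drop_caches
>>
>> (echo 2 drops dentries and inodes; echo 3 also drops the page cache.)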
>
>
> Really using 10.2.2 everywhere.
>
> To debug it a bit further we have to wait for the next time it happens. Then
> we can attach strace to the ceph-fuse process and get the information you
> are asking for.
>
> Regarding the user workload, there is nothing special happening in those
> directories. It is just a directory used to store logs (stderr and stdout)
> from the batch system jobs.
>
> We were wondering whether setting
>
>     fuse_disable_pagecache = true
>
> would actually solve the problem. In this way you force ceph-fuse to read
> directly from the OSDs, right?!

This option prevents use of the kernel/VFS page cache, but it doesn't
do anything to Ceph's internal Client cache. There's no way of turning
that off (especially for metadata), and if actual files are missing
that's got to be where the bug is. (If disabling the page cache *does*
make a difference, definitely let us know, though!)
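
For reference, that option is a client-side setting; a minimal ceph.conf
sketch (it only takes effect when ceph-fuse is restarted, i.e. on remount):

    [client]
    fuse_disable_pagecache = true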

> We understand the performance issues it might imply, but we are more
> concerned with having data coherence on the client.

Yeah. I imagine this must be some kind of problem with our directory
listing or completeness algorithms, which *were* recently
reimplemented — but it's passing all of our internal tests and some
very strenuous workloads that detected bugs in the old system. I don't
really have any idea what's happening here; we'd almost certainly need
detailed debug logs (20 on the client and MDS) to have any real chance
of identifying the issue. :(
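For reference, raising those levels would look roughly like this (a sketch;
mds.<id> stands in for the actual daemon name):

    # on the ceph-fuse client, in ceph.conf under [client], then remount:
    debug client = 20
    debug ms = 1

    # on the MDS, injected at runtime via the admin socket:
    ceph daemon mds.<id> config set debug_mds 20
    ceph daemon mds.<id> config set debug_ms 1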
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



