Re: Inconsistent metadata seen by CephFS-fuse clients

On Fri, Apr 27, 2018 at 11:49 PM, Oliver Freyermuth
<freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
> Dear Yan Zheng,
>
> Am 27.04.2018 um 15:32 schrieb Yan, Zheng:
>> On Fri, Apr 27, 2018 at 7:10 PM, Oliver Freyermuth
>> <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
>>> Dear Yan Zheng,
>>>
>>> Am 27.04.2018 um 02:58 schrieb Yan, Zheng:
>>>> On Thu, Apr 26, 2018 at 10:00 PM, Oliver Freyermuth
>>>> <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
>>>>> Dear Cephalopodians,
>>>>>
>>>>> just now, while our Ceph cluster is under high I/O load, we are getting user reports of files not being seen on some clients,
>>>>> but somehow showing up after forcing a stat() syscall.
>>>>>
>>>>> For example, one user had added several files to a directory via an NFS client attached to nfs-ganesha (which uses libcephfs).
>>>>> Afterwards, all other nfs-ganesha servers saw the files, as did 44 of our FUSE clients -
>>>>> but one single client still saw the old contents of the directory, i.e. the files seemed to be missing(!).
>>>>> This happened both when using "ls" on the directory and when trying to access the non-existent files directly.
>>>>>
>>>>> I could confirm this observation also in a fresh login shell on the machine.
>>>>>
>>>>> Then, on the "broken" client, I entered the directory which seemed to contain only the "old" content, and I created a new file in there.
>>>>> This worked fine, and all other clients saw the file immediately.
>>>>> Also on the broken client, metadata was now updated and all other files appeared - i.e. everything was "in sync" again.
>>>>>
>>>>> There's nothing in the ceph-logs of our MDS, or in the syslogs of the client machine / MDS.
>>>>>
>>>>>
>>>>> Another user observed the same, though not limited to one specific machine (it seems random).
>>>>> He now runs a "stat" on the file he expects to exist (but which is not seen with "ls").
>>>>> The stat returns "No such file"; a subsequent "ls" then lists the file, and it can be accessed normally.
>>>>>
>>>>> This feels like something is messed up concerning the client caps - these are all 12.2.4 Fuse clients.
>>>>>
>>>>> Any ideas how to find the cause?
>>>>> It only happens since recently, and under high I/O load with many metadata operations.
>>>>>
>>>>
>>>> Sounds like a bug in the readdir cache. Could you try the attached patch?
>>>
>>> Many thanks for the quick response and patch!
>>> The problem is trying it out. We only observe this issue on our production cluster, randomly, especially during high load, and only after it has been running for a few days.
>>> We don't have a test Ceph cluster of similar size and with similar load available. I would not like to try out the patch on our production system.
>>>
>>> Can you extrapolate from the bugfix / patch what's the minimal setup needed to reproduce / trigger the issue?
>>> Then we may look into setting up a minimal test setup to check whether the issue is resolved.
>>>
>>> All the best and many thanks,
>>>         Oliver
>>>
>>
>> I think this is the libcephfs version of
>> http://tracker.ceph.com/issues/20467. I forgot to write the patch for
>> libcephfs, sorry. To reproduce this, write a program that calls
>> getdents(2) in a loop. Add an artificial delay to the loop, so that the
>> program iterates over the whole directory in about ten seconds. Run
>> several instances of the program simultaneously on a large directory.
>> Also make client_cache_size a little smaller than the size of the directory.
>
> This is strange - in case 1 where our users observed the issue,
> the affected directory contained exactly 1 file, which some clients saw and others did not.
> In case 2, the affected directory contained about 5 files only.
>
> Of course, we also have directories with many (thousands) of files in our CephFS, and they may be accessed in parallel.
> Also, we run a massive number of parallel programs (about 2000) accessing the FS via about 40 clients.
>
> 1. Could this still be the same issue?
> 2. Many thanks for the repro instructions. It seems, however, this would require quite an amount of time,
>    since we don't have a separate "test" instance at hand (yet) and are not experts in the field.
>    We could try, but it won't be fast... And maybe it's nicer to have something like this in the test suite, if possible.
>

This issue should be caused by a different bug. I will check the code.

Regards
Yan, Zheng


> Potentially, it's even faster to get the fix into the next patch release, if it's clear it cannot have bad side effects.
>
> Also, should we transfer this information to a ticket?
>
> Cheers and many thanks,
>         Oliver
>
>>
>> Regards
>> Yan, Zheng
>>
>>>
>>>>
>>>> Regards
>>>> Yan, Zheng
>>>>
>>>>
>>>>> Cheers,
>>>>>         Oliver
>>>>>
>>>>>
>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>
>
>


