Re: Corrupted files on CephFS since Luminous upgrade

The problem is: only a single client accesses each file!

It is not related to multiple clients accessing the same file at the
same time, at all.

It seems my problem goes away when I set fuse_disable_pagecache to true,
and I set it only on the client accessing this file.
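
In case anyone wants to try the same thing: as far as I understand, the
option can be set in ceph.conf on the client (in the [client] section, or a
per-client section) before remounting with ceph-fuse, roughly:

[client]
    fuse_disable_pagecache = true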

Is it possible that corruption occurs on CephFS? I never had this problem
before my Luminous upgrade! Nothing changed on the mail servers (no Dovecot
upgrade).
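
To try to pin this down I will also run something along the lines of the
two test scripts Denes suggests below: one process that only appends
fixed-size records to a file, and one that rewrites existing records in
place. A rough Python sketch (the path, record format and timings are
placeholders, nothing to do with what Dovecot actually writes):

#!/usr/bin/env python3
# Rough reproduction sketch along the lines suggested below: one process only
# appends fixed-size records to a file (like delivery appending to a mailbox),
# the other rewrites existing records in place (like an IMAP server updating
# its index). Path, record format and timings are placeholders.
import os
import random
import sys
import time

TEST_FILE = "/mnt/cephfs/locktest/testfile"  # placeholder path on the CephFS mount
RECORD = 18  # both record types below are exactly 18 bytes long

def append_forever():
    """Keep appending recognizable fixed-size records to the end of the file."""
    n = 0
    while True:
        with open(TEST_FILE, "ab") as f:
            f.write(("APPEND %010d\n" % n).encode())
            f.flush()
            os.fsync(f.fileno())
        n += 1
        time.sleep(0.01)

def rewrite_forever():
    """Keep overwriting a randomly chosen existing record in place."""
    while True:
        size = os.path.getsize(TEST_FILE)
        if size >= RECORD:
            offset = random.randrange(size // RECORD) * RECORD
            with open(TEST_FILE, "r+b") as f:
                f.seek(offset)
                f.write(b"REWRIT %010d\n" % offset)
                f.flush()
                os.fsync(f.fileno())
        time.sleep(0.01)

if __name__ == "__main__":
    open(TEST_FILE, "ab").close()  # make sure the file exists
    mode = sys.argv[1] if len(sys.argv) > 1 else "append"
    append_forever() if mode == "append" else rewrite_forever()

Running the appender and the rewriter against the same file on the CephFS
mount, then scanning for records that are neither a clean APPEND nor a
clean REWRIT line, should show whether (and at what offset) things get
mangled, which is also what Denes asks about below.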


On 11/12/2017 15:25, Denes Dolhay wrote:
> Hi,
>
>
> The Ceph MDS keeps all the capabilities for the files; however, the
> clients modify the RADOS data pool objects directly (they do not do
> the content modification through the MDS).
>
> IMHO, IF the file (really) gets corrupted because of a client write (not
> some corruption from the MDS / OSD), then it can only happen if:
>
> - The first client does not request a write cap (lock) for that object
> before writing
>
> - OR the MDS does not store that write cap
>
> - OR the MDS does not return the cap to the second client or refuse the
> write for the second concurrent client
>
> - OR the second client does not request a write cap, does not check the
> existing caps, or does not obey a write denial from the MDS
>
> - OR any of the clients writes incorrect data based on an obsolete object
> cache caused by missing / faulty cache eviction (is this even possible?)
>
> *Please correct me if I am wrong in any of the above!!*
>
>
> If I were in your shoes, first I would test the locking of CephFS by
> writing two test scripts:
>
> - One would constantly append to a file (like an SMTP server does to a
> mailbox)
>
> - The other would modify / add / delete parts of this file (like an IMAP
> server does)
>
> Then wait for corruption to occur.
>
>
> One other thing: it would be interesting to see what the corruption
> really looks like, for example partially overwritten lines?
>
> It would also be interesting to know what part of the file the corruption
> is in: the beginning? The end? At what percentage?
>
> And whether there was a mailbox compaction around the corruption.
>
>
> Kind regards,
>
> Denes.
>
>
>
> On 12/11/2017 10:33 AM, Florent B wrote:
>> On 08/12/2017 14:59, Ronny Aasen wrote:
>>> On 08 Dec 2017 14:49, Florent B wrote:
>>>> On 08/12/2017 14:29, Yan, Zheng wrote:
>>>>> On Fri, Dec 8, 2017 at 6:51 PM, Florent B <florent@xxxxxxxxxxx>
>>>>> wrote:
>>>>>> I don't know, I didn't touch that setting. Which one is
>>>>>> recommended?
>>>>>>
>>>>>>
>>>>> If multiple dovecot instances are running at the same time and they
>>>>> all modify the same files, you need to set fuse_disable_pagecache to
>>>>> true.
>>>> Ok, but in my configuration, each mail user is mapped to a single
>>>> server.
>>>> So files are accessed only by a single server at a time.
>>>
>>> What about mail delivery? If you use dovecot deliver, a delivery can
>>> occur (and rewrite the dovecot index/cache) at the same time as a user
>>> accesses IMAP and writes to the dovecot index/cache.
>>>
>> OK, why not, but I never had a problem like this with previous versions
>> of Ceph. I will try fuse_disable_pagecache...

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


