Re: cephfs miss data for 15s when master mds rebooting

On Mon, Dec 18, 2017 at 10:10 AM, 13605702596@xxxxxxx
<13605702596@xxxxxxx> wrote:
> hi Yan
>
> 1. run "ceph mds fail" before rebooting host
> 2. the host reboots by itself for some reason
>
> you mean no data gets lost in BOTH of these conditions?
>
> in my test, i echo a date string once per second into a file under the
> cephfs dir.
> when i reboot the master mds, 15 lines are lost.
>

what do you mean by 15 lines getting lost? are you sure it's not caused by a write stall?
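
For what it's worth, here is a minimal sketch of the kind of per-second test
described above (the mount point and file name are made up). If the "missing"
lines eventually show up and the timestamps are continuous, the writes only
stalled during failover; a permanent gap in the timestamps would mean real
data loss:

    # append one timestamped line per second to a file on cephfs;
    # compare timestamps after the mds failover to tell a stall
    # (delayed but complete lines) from actual loss (a gap)
    while true; do
        date '+%s %F %T' >> /mnt/cephfs/failover_test.log
        sleep 1
    done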


> thanks
>
> ________________________________
> 13605702596@xxxxxxx
>
>
> From: Yan, Zheng
> Date: 2017-12-18 09:55
> To: 13605702596@xxxxxxx
> CC: John Spray; ceph-users
> Subject: Re:  cephfs miss data for 15s when master mds rebooting
> On Mon, Dec 18, 2017 at 9:24 AM, 13605702596@xxxxxxx
> <13605702596@xxxxxxx> wrote:
>> hi John
>>
>> thanks for your answer.
>>
>> under normal conditions, i can run "ceph mds fail" before the reboot.
>> but if the host reboots by itself for some reason, i can do nothing!
>> if this happens, data must be lost.
>>
>> so, is there any other way to stop data from being lost?
>>
>
> no data gets lost in this condition.  IO just stalls for a few seconds
>
>> thanks
>>
>> ________________________________
>> 13605702596@xxxxxxx
>>
>>
>> From: John Spray
>> Date: 2017-12-15 18:08
>> To: 13605702596@xxxxxxx
>> CC: ceph-users
>> Subject: Re:  cephfs miss data for 15s when master mds
>> rebooting
>> On Fri, Dec 15, 2017 at 1:45 AM, 13605702596@xxxxxxx
>> <13605702596@xxxxxxx> wrote:
>>> hi
>>>
>>> i used 3 nodes to deploy mds daemons (each node also has a mon on it)
>>>
>>> my config:
>>> [mds.ceph-node-10-101-4-17]
>>> mds_standby_replay = true
>>> mds_standby_for_rank = 0
>>>
>>> [mds.ceph-node-10-101-4-21]
>>> mds_standby_replay = true
>>> mds_standby_for_rank = 0
>>>
>>> [mds.ceph-node-10-101-4-22]
>>> mds_standby_replay = true
>>> mds_standby_for_rank = 0
>>>
>>> the mds stat:
>>> e29: 1/1/1 up {0=ceph-node-10-101-4-22=up:active}, 1 up:standby-replay, 1
>>> up:standby
>>>
>>> i mount cephfs on the ceph client, and run a test script that writes
>>> data into a file under the cephfs dir.
>>> when i reboot the master mds, i find that data is not written into the
>>> file.
>>> after 15 seconds, data can be written into the file again.
>>>
>>> so my question is:
>>> is this normal when rebooting the master mds?
>>> when will the up:standby-replay mds take over the cephfs?
>>
>> The standby takes over after the active daemon has not reported to the
>> monitors for `mds_beacon_grace` seconds, which as you have noticed is
>> 15s by default.
>>
>> If you know you are rebooting something, you can pre-empt the timeout
>> mechanism by using "ceph mds fail" on the active daemon, to cause
>> another to take over right away.
>>
>> John
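
For reference, a minimal sketch of both options; the daemon name is taken
from the mds stat earlier in the thread, and the lowered grace value is only
an illustration, not a recommendation:

    # planned reboot: fail the active mds explicitly so the
    # standby-replay daemon takes over right away
    # (the rank, e.g. "ceph mds fail 0", also works)
    ceph mds fail ceph-node-10-101-4-22

    # unplanned failures: takeover waits for mds_beacon_grace
    # (15s by default); it can be lowered in ceph.conf at the
    # cost of more spurious failovers, e.g.
    #   [global]
    #   mds_beacon_grace = 10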
>>
>>> thanks
>>>
>>> ________________________________
>>> 13605702596@xxxxxxx
>>>
>>>
>>
>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


