hi Yan
> Mon Dec 18 03:07:47 UTC 2017 <-- reboot
> Mon Dec 18 03:08:05 UTC 2017 <-- mds failover works
> this is caused by write stall

but the timestamps below are missing from the file. is this normal?
Mon Dec 18 03:07:48 UTC 2017
Mon Dec 18 03:07:49 UTC 2017
Mon Dec 18 03:07:50 UTC 2017
Mon Dec 18 03:07:51 UTC 2017
Mon Dec 18 03:07:52 UTC 2017
Mon Dec 18 03:07:53 UTC 2017
Mon Dec 18 03:07:54 UTC 2017
Mon Dec 18 03:07:55 UTC 2017
Mon Dec 18 03:07:56 UTC 2017
Mon Dec 18 03:07:57 UTC 2017
Mon Dec 18 03:07:58 UTC 2017
Mon Dec 18 03:07:59 UTC 2017
Mon Dec 18 03:08:00 UTC 2017
Mon Dec 18 03:08:01 UTC 2017
Mon Dec 18 03:08:02 UTC 2017
Mon Dec 18 03:08:03 UTC 2017
Mon Dec 18 03:08:04 UTC 2017
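
A note on reading the list above: the test loop runs date, the append, sync and sleep one after another, so if the append or the sync blocks during failover, date is not called again until the block clears. In that case the timestamps were never generated rather than written and then lost. A minimal sketch for locating such jumps in the output file follows; the function name find_gaps and the 2-second default threshold are made up for illustration, and GNU date is assumed for `date -d`.

```shell
#!/bin/sh
# find_gaps FILE [MAX_STEP]
# Print every pair of consecutive timestamp lines in FILE that are more
# than MAX_STEP seconds apart (default 2). Lines that `date -d` cannot
# parse are skipped. Assumes GNU date.
find_gaps() {
    file=$1
    max_step=${2:-2}
    prev_epoch=
    prev_line=
    while IFS= read -r line
    do
        # Skip empty lines: GNU date would parse "" as midnight today.
        [ -n "$line" ] || continue
        epoch=$(date -u -d "$line" +%s 2>/dev/null) || continue
        if [ -n "$prev_epoch" ] && [ $((epoch - prev_epoch)) -gt "$max_step" ]; then
            echo "gap of $((epoch - prev_epoch))s between '$prev_line' and '$line'"
        fi
        prev_epoch=$epoch
        prev_line=$line
    done < "$file"
}
```

Calling `find_gaps /root/cephfs/time.txt` on the client should report one jump per stall. A single jump of roughly `mds_beacon_grace` seconds around the reboot, with continuous timestamps on either side, is what a write stall looks like. It does not by itself prove that nothing was lost, only that the loop was blocked for that long.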
13605702596@xxxxxxx

From: Yan, Zheng
Date: 2017-12-18 11:27
CC: John Spray; ceph-users
Subject: Re: Re: cephfs miss data for 15s when master mds rebooting

On Mon, Dec 18, 2017 at 11:11 AM, 13605702596@xxxxxxx
<13605702596@xxxxxxx> wrote:
> hi Yan
>
> my test script:
>
> #!/bin/sh
>
> rm -f /root/cephfs/time.txt
>
> while true
> do
> echo `date` >> /root/cephfs/time.txt
> sync
> sleep 1
> done
>
> i run this script and then reboot master mds
>
> from the file /root/cephfs/time.txt, i can see there are more than 15 lines
> got lost:
> Mon Dec 18 03:07:43 UTC 2017
> Mon Dec 18 03:07:44 UTC 2017
> Mon Dec 18 03:07:45 UTC 2017
> Mon Dec 18 03:07:47 UTC 2017 <-- reboot
> Mon Dec 18 03:08:05 UTC 2017 <-- mds failover works

this is caused by write stall

> Mon Dec 18 03:08:06 UTC 2017
> Mon Dec 18 03:08:07 UTC 2017
> Mon Dec 18 03:08:08 UTC 2017
> Mon Dec 18 03:08:09 UTC 2017
> Mon Dec 18 03:08:10 UTC 2017
>
> ________________________________
> 13605702596@xxxxxxx
>
>
> From: Yan, Zheng
> Date: 2017-12-18 10:59
> To: 13605702596@xxxxxxx
> CC: John Spray; ceph-users
> Subject: Re: Re: [ceph-users] cephfs miss data for 15s when master mds
> rebooting
> On Mon, Dec 18, 2017 at 10:10 AM, 13605702596@xxxxxxx
> <13605702596@xxxxxxx> wrote:
>> hi Yan
>>
>> 1. run "ceph mds fail" before rebooting host
>> 2. host reboot by itself for some reason
>>
>> you means no data get lost in the BOTH conditions?
>>
>> in my test, i echo the date string per second into the file under cephfs
>> dir,
>> when i reboot the master mds, there are 15 lines got lost.
>>
>
> what do you mean 15 line got lost? are you sure it's not caused by write
> stall?
>
>> thanks
>>
>> ________________________________
>> 13605702596@xxxxxxx
>>
>>
>> From: Yan, Zheng
>> Date: 2017-12-18 09:55
>> To: 13605702596@xxxxxxx
>> CC: John Spray; ceph-users
>> Subject: Re: cephfs miss data for 15s when master mds
>> rebooting
>> On Mon, Dec 18, 2017 at 9:24 AM, 13605702596@xxxxxxx
>> <13605702596@xxxxxxx> wrote:
>>> hi John
>>>
>>> thanks for your answer.
>>>
>>> in normal condition, i can run "ceph mds fail" before reboot.
>>> but if the host reboots by itself for some reason, i can do nothing!
>>> if this happens, data must be lost.
>>>
>>> so, is there any other way to stop data from being lost?
>>>
>>
>> no data get lost in this condition. just IO stall for a few seconds
>>
>>> thanks
>>>
>>> ________________________________
>>> 13605702596@xxxxxxx
>>>
>>>
>>> From: John Spray
>>> Date: 2017-12-15 18:08
>>> To: 13605702596@xxxxxxx
>>> CC: ceph-users
>>> Subject: Re: cephfs miss data for 15s when master mds
>>> rebooting
>>> On Fri, Dec 15, 2017 at 1:45 AM, 13605702596@xxxxxxx
>>> <13605702596@xxxxxxx> wrote:
>>>> hi
>>>>
>>>> i used 3 nodes to deploy mds (each node also has mon on it)
>>>>
>>>> my config:
>>>> [mds.ceph-node-10-101-4-17]
>>>> mds_standby_replay = true
>>>> mds_standby_for_rank = 0
>>>>
>>>> [mds.ceph-node-10-101-4-21]
>>>> mds_standby_replay = true
>>>> mds_standby_for_rank = 0
>>>>
>>>> [mds.ceph-node-10-101-4-22]
>>>> mds_standby_replay = true
>>>> mds_standby_for_rank = 0
>>>>
>>>> the mds stat:
>>>> e29: 1/1/1 up {0=ceph-node-10-101-4-22=up:active}, 1 up:standby-replay,
>>>> 1 up:standby
>>>>
>>>> i mount the cephfs on the ceph client, and run the test script to write
>>>> data into file under the cephfs dir,
>>>> when i reboot the master mds, and i found the data is not written into
>>>> the file.
>>>> after 15 seconds, data can be written into the file again
>>>>
>>>> so my question is:
>>>> is this normal when reboot the master mds?
>>>> when will the up:standby-replay mds take over the cephfs?
>>>
>>> The standby takes over after the active daemon has not reported to the
>>> monitors for `mds_beacon_grace` seconds, which as you have noticed is
>>> 15s by default.
>>>
>>> If you know you are rebooting something, you can pre-empt the timeout
>>> mechanism by using "ceph mds fail" on the active daemon, to cause
>>> another to take over right away.
>>>
>>> John
>>>
>>>> thanks
>>>>
>>>> ________________________________
>>>> 13605702596@xxxxxxx

_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com