Re: [ CEPH ANSIBLE FAILOVER TESTING ] Ceph Native Driver issue

Hi Reed,
Thank you so much for the input and support. We tried the setting you suggested, but it had no visible impact on the failover time on our current system:
"ceph fs set cephfs allow_standby_replay true"

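For reference, this is roughly how we applied and checked the flag on our side (the check assumes a standby daemon is available for the rank):

    # enable standby-replay for the file system (the command referenced above)
    ceph fs set cephfs allow_standby_replay true
    # a standby daemon should now be listed in the standby-replay state
    ceph fs status cephfs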
In addition, here are some further scenarios we tested:
Scenario 1:
  • Here we looked at the logs on the node that the MDS fails over to, i.e. if we reboot cephnode2 the new active MDS becomes cephnode1. We checked the cephnode1 logs in two cases:
  • 1. Normal reboot of cephnode2 while the I/O operation is in progress:
    • The log on cephnode1 starts immediately, then waits for around 15 seconds (apparently the beacon grace) plus an additional 6-7 seconds, after which the MDS on cephnode1 becomes active and I/O resumes. Refer to the logs:
    • 2021-04-29T15:49:42.480+0530 7fa747690700  1 mds.cephnode1 Updating MDS map to version 505 from mon.2
      2021-04-29T15:49:42.482+0530 7fa747690700  1 mds.0.505 handle_mds_map i am now mds.0.505
      2021-04-29T15:49:42.482+0530 7fa747690700  1 mds.0.505 handle_mds_map state change up:boot --> up:replay
      2021-04-29T15:49:42.482+0530 7fa747690700  1 mds.0.505 replay_start
      2021-04-29T15:49:42.482+0530 7fa747690700  1 mds.0.505  recovery set is
      2021-04-29T15:49:42.482+0530 7fa747690700  1 mds.0.505  waiting for osdmap 486 (which blacklists prior instance)
      2021-04-29T15:49:55.686+0530 7fa74568c700  1 mds.beacon.cephnode1 MDS connection to Monitors appears to be laggy; 15.9769s since last acked beacon
      2021-04-29T15:49:55.686+0530 7fa74568c700  1 mds.0.505 skipping upkeep work because connection to Monitors appears laggy
      2021-04-29T15:49:57.533+0530 7fa749e95700  0 mds.beacon.cephnode1  MDS is no longer laggy
      2021-04-29T15:49:59.599+0530 7fa740e83700  0 mds.0.cache creating system inode with ino:0x100
      2021-04-29T15:49:59.599+0530 7fa740e83700  0 mds.0.cache creating system inode with ino:0x1
      2021-04-29T15:50:00.456+0530 7fa73f680700  1 mds.0.505 Finished replaying journal
      2021-04-29T15:50:00.456+0530 7fa73f680700  1 mds.0.505 making mds journal writeable
      2021-04-29T15:50:00.959+0530 7fa747690700  1 mds.cephnode1 Updating MDS map to version 506 from mon.2
      2021-04-29T15:50:00.959+0530 7fa747690700  1 mds.0.505 handle_mds_map i am now mds.0.505
      2021-04-29T15:50:00.959+0530 7fa747690700  1 mds.0.505 handle_mds_map state change up:replay --> up:reconnect
      2021-04-29T15:50:00.959+0530 7fa747690700  1 mds.0.505 reconnect_start
      2021-04-29T15:50:00.959+0530 7fa747690700  1 mds.0.505 reopen_log
      2021-04-29T15:50:00.959+0530 7fa747690700  1 mds.0.server reconnect_clients -- 2 sessions
      2021-04-29T15:50:00.964+0530 7fa747690700  0 log_channel(cluster) log [DBG] : reconnect by client.6892 v1:10.0.4.96:0/1646469259 after 0.00499997
      2021-04-29T15:50:00.972+0530 7fa747690700  0 log_channel(cluster) log [DBG] : reconnect by client.6990 v1:10.0.4.115:0/2776266880 after 0.0129999
      2021-04-29T15:50:00.972+0530 7fa747690700  1 mds.0.505 reconnect_done
      2021-04-29T15:50:02.005+0530 7fa747690700  1 mds.cephnode1 Updating MDS map to version 507 from mon.2
      2021-04-29T15:50:02.005+0530 7fa747690700  1 mds.0.505 handle_mds_map i am now mds.0.505
      2021-04-29T15:50:02.005+0530 7fa747690700  1 mds.0.505 handle_mds_map state change up:reconnect --> up:rejoin
      2021-04-29T15:50:02.005+0530 7fa747690700  1 mds.0.505 rejoin_start
      2021-04-29T15:50:02.008+0530 7fa747690700  1 mds.0.505 rejoin_joint_start
      2021-04-29T15:50:02.040+0530 7fa740e83700  1 mds.0.505 rejoin_done
      2021-04-29T15:50:03.050+0530 7fa747690700  1 mds.cephnode1 Updating MDS map to version 508 from mon.2
      2021-04-29T15:50:03.050+0530 7fa747690700  1 mds.0.505 handle_mds_map i am now mds.0.505
      2021-04-29T15:50:03.050+0530 7fa747690700  1 mds.0.505 handle_mds_map state change up:rejoin --> up:clientreplay
      2021-04-29T15:50:03.050+0530 7fa747690700  1 mds.0.505 recovery_done -- successful recovery!
      2021-04-29T15:50:03.050+0530 7fa747690700  1 mds.0.505 clientreplay_start
      2021-04-29T15:50:03.094+0530 7fa740e83700  1 mds.0.505 clientreplay_done
      2021-04-29T15:50:04.081+0530 7fa747690700  1 mds.cephnode1 Updating MDS map to version 509 from mon.2
      2021-04-29T15:50:04.081+0530 7fa747690700  1 mds.0.505 handle_mds_map i am now mds.0.505
      2021-04-29T15:50:04.081+0530 7fa747690700  1 mds.0.505 handle_mds_map state change up:clientreplay --> up:active
      2021-04-29T15:50:04.081+0530 7fa747690700  1 mds.0.505 active_start
      2021-04-29T15:50:04.085+0530 7fa747690700  1 mds.0.505 cluster recovered.

  • 2. Hard reset/power-off of cephnode2 while the I/O operation is in progress:
    • In this case the logs on cephnode1 (where the new MDS will become active) only start appearing 15+ seconds after the power-off.
      • Time of the power-off: 2021-04-29-16-17-37
      • Time at which the logs started to appear on cephnode1, i.e. roughly 15 seconds after the hardware reset (refer to the logs):
        • 2021-04-29T16:17:51.983+0530 7f5ba3a38700  1 mds.cephnode1 Updating MDS map to version 518 from mon.0
          2021-04-29T16:17:51.984+0530 7f5ba3a38700  1 mds.0.518 handle_mds_map i am now mds.0.518
          2021-04-29T16:17:51.984+0530 7f5ba3a38700  1 mds.0.518 handle_mds_map state change up:boot --> up:replay
          2021-04-29T16:17:51.984+0530 7f5ba3a38700  1 mds.0.518 replay_start
          2021-04-29T16:17:51.984+0530 7f5ba3a38700  1 mds.0.518  recovery set is
          2021-04-29T16:17:51.984+0530 7f5ba3a38700  1 mds.0.518  waiting for osdmap 504 (which blacklists prior instance)
          2021-04-29T16:17:54.044+0530 7f5b9ca2a700  0 mds.0.cache creating system inode with ino:0x100
          2021-04-29T16:17:54.045+0530 7f5b9ca2a700  0 mds.0.cache creating system inode with ino:0x1
          2021-04-29T16:17:55.025+0530 7f5b9ba28700  1 mds.0.518 Finished replaying journal
          2021-04-29T16:17:55.025+0530 7f5b9ba28700  1 mds.0.518 making mds journal writeable
          2021-04-29T16:17:56.060+0530 7f5ba3a38700  1 mds.cephnode1 Updating MDS map to version 519 from mon.0
          2021-04-29T16:17:56.060+0530 7f5ba3a38700  1 mds.0.518 handle_mds_map i am now mds.0.518
          2021-04-29T16:17:56.060+0530 7f5ba3a38700  1 mds.0.518 handle_mds_map state change up:replay --> up:reconnect
          2021-04-29T16:17:56.060+0530 7f5ba3a38700  1 mds.0.518 reconnect_start
          2021-04-29T16:17:56.060+0530 7f5ba3a38700  1 mds.0.518 reopen_log
          2021-04-29T16:17:56.060+0530 7f5ba3a38700  1 mds.0.server reconnect_clients -- 2 sessions
          2021-04-29T16:17:56.068+0530 7f5ba3a38700  0 log_channel(cluster) log [DBG] : reconnect by client.6990 v1:10.0.4.115:0/2776266880 after 0.00799994
          2021-04-29T16:17:56.069+0530 7f5ba3a38700  0 log_channel(cluster) log [DBG] : reconnect by client.6892 v1:10.0.4.96:0/1646469259 after 0.00899994
          2021-04-29T16:17:56.069+0530 7f5ba3a38700  1 mds.0.518 reconnect_done
          2021-04-29T16:17:57.099+0530 7f5ba3a38700  1 mds.cephnode1 Updating MDS map to version 520 from mon.0
          2021-04-29T16:17:57.099+0530 7f5ba3a38700  1 mds.0.518 handle_mds_map i am now mds.0.518
          2021-04-29T16:17:57.099+0530 7f5ba3a38700  1 mds.0.518 handle_mds_map state change up:reconnect --> up:rejoin
          2021-04-29T16:17:57.099+0530 7f5ba3a38700  1 mds.0.518 rejoin_start
          2021-04-29T16:17:57.103+0530 7f5ba3a38700  1 mds.0.518 rejoin_joint_start
          2021-04-29T16:17:57.472+0530 7f5b9d22b700  1 mds.0.518 rejoin_done
          2021-04-29T16:17:58.138+0530 7f5ba3a38700  1 mds.cephnode1 Updating MDS map to version 521 from mon.0
          2021-04-29T16:17:58.138+0530 7f5ba3a38700  1 mds.0.518 handle_mds_map i am now mds.0.518
          2021-04-29T16:17:58.138+0530 7f5ba3a38700  1 mds.0.518 handle_mds_map state change up:rejoin --> up:clientreplay
          2021-04-29T16:17:58.138+0530 7f5ba3a38700  1 mds.0.518 recovery_done -- successful recovery!
          2021-04-29T16:17:58.138+0530 7f5ba3a38700  1 mds.0.518 clientreplay_start
          2021-04-29T16:17:58.157+0530 7f5b9d22b700  1 mds.0.518 clientreplay_done
          2021-04-29T16:17:59.178+0530 7f5ba3a38700  1 mds.cephnode1 Updating MDS map to version 522 from mon.0
          2021-04-29T16:17:59.178+0530 7f5ba3a38700  1 mds.0.518 handle_mds_map i am now mds.0.518
          2021-04-29T16:17:59.178+0530 7f5ba3a38700  1 mds.0.518 handle_mds_map state change up:clientreplay --> up:active
          2021-04-29T16:17:59.178+0530 7f5ba3a38700  1 mds.0.518 active_start
          2021-04-29T16:17:59.181+0530 7f5ba3a38700  1 mds.0.518 cluster recovered.
In both test cases above we see an extra delay of roughly 15 seconds plus an additional 6-10 seconds, i.e. about 21-25 seconds in total for failover after a reboot/power-off.
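(Side note, in case it helps anyone reproduce these numbers: besides the MDS log on the new active node, the same state transitions can be followed from any admin node with something along these lines.)

    # follow the rank/state changes while the I/O is running
    watch -n 1 "ceph fs status; ceph mds stat"
    # or stream the cluster log, where the failover events also show up
    ceph -w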

Query: Is there any specific configuration we should tweak or try so that the standby MDS is activated sooner, i.e. to reduce the time it takes for the cluster to notice the failure and promote the standby?
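For example, would it be reasonable to lower the MDS beacon settings? The ~15 second gap we see matches the default mds_beacon_grace of 15 seconds. Something along these lines (values purely illustrative, not yet tested on our side, and we understand that too small a grace period risks spurious failovers on a busy MDS):

    ceph config set global mds_beacon_grace 10
    ceph config set global mds_beacon_interval 2
    ceph config get mds mds_beacon_grace    # verify the new value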

  Scenario 2:
  • Only stop the MDS daemon service on the active node
    • In this scenario, where we only stopped the MDS systemd service on the active node, we get a very good reading of around 5-7 seconds for failover (the commands used are sketched after the table below).
    • Deployment Mode             | CEPH MDS Setup     | Test Case                      | I/O Resume Duration (Seconds) | Node affected
      2 Node MDS setup, max_mds=1 | Active-Standby MDS | Active-node MDS daemon stopped | 5-7                           | cephnode1
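The stop/start in this scenario was done via systemd on the active MDS node, along these lines (the exact unit name depends on the deployment; the form below is the usual one for a non-containerized install, where the MDS id is the hostname):

    # on the currently active MDS node (here cephnode1)
    systemctl stop ceph-mds@cephnode1
    # watch the standby take over from an admin node
    ceph fs status
    # bring the daemon back afterwards
    systemctl start ceph-mds@cephnode1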

Please suggest/advise any configuration we could try to achieve a minimal failover duration in the first two scenarios.

Best Regards,
Lokendra




On Thu, Apr 29, 2021 at 1:47 AM Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
I don't have anything of merit to add to this, but it would be an interesting addition to your testing to see if active+standby-replay makes any difference with test-case1.

I don't think it would be applicable to any of the other use-cases, as a standby-replay MDS is bound to a single rank, meaning it's bound to a single active MDS, and can't function as a standby for active:active.

https://docs.ceph.com/en/latest/cephfs/standby/#configuring-standby-replay

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/ceph_file_system_guide_technology_preview/installing_and_configuring_ceph_metadata_servers_mds#mds-configuring-standby-daemons-standby-replay

Good luck and look forward to hearing feedback/more results.

Reed

On Apr 27, 2021, at 8:40 AM, Lokendra Rathour <lokendrarathour@xxxxxxxxx> wrote:

Hi Team,
We have set up a Ceph cluster using the native CephFS driver, with the following details:
  • 3 Node / 2 Node MDS Cluster
  • 3 Node Monitor Quorum
  • 2 Node OSD
  • 2 Nodes for Manager

 

cephnode3 runs only MON and MDS (used only for test cases 4-7); the other two nodes, cephnode1 and cephnode2, each run MGR, MDS, MON and RGW.

 

We have tested the following failover scenarios for the native CephFS driver by mounting one sub-volume on a VM or client and running continuous I/O against it (directory creation every 1 second); the results are summarised in the table below, and the I/O loop itself is sketched right after this paragraph.
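The I/O load is essentially a one-second directory-creation loop run on the client against the mounted sub-volume, roughly like this (the mount path is just an example):

    # on the client, with the CephFS sub-volume mounted at /mnt/cephfs (example path)
    while true; do
        mkdir "/mnt/cephfs/dir_$(date +%s)"
        sleep 1
    done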

[inline image: table of failover test cases with measured I/O resume times]


In the table above we have a few queries:
  • Test case 2 and test case 7 are similar, the only difference being the number of Ceph MDS daemons, yet the measured times differ: we expected the time to be zero, but it comes out as 17 seconds for test case 7.
  • Is there any configurable parameter or setting in the Ceph cluster that would reduce the failover time to a few seconds?
    With the current default deployment we are getting around 35-40 seconds.


Best Regards,

--
~ Lokendra
skype: lokendrarathour





--
~ Lokendra
skype: lokendrarathour


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
