fence_ovh - Fence agent for OVH

As requested by digimer in the linux-ha IRC channel, here is fence_ovh. Including it by default in the official distribution of the cluster software is not a priority, but if you guide me on how to polish it I think I can improve it a lot more and run tests on real machines (as long as my machines are still test machines and not production ones).

1) What is fence_ovh 

fence_ovh is a Python-based fence agent for the big French datacentre provider OVH. You can get information about OVH at: http://www.ovh.co.uk/ . I also want to make clear that I am not part of the official OVH staff.

2) Features 
The script has two main functions: 

* Reboot into rescue mode (action=off) 
* Reboot from the hard disk (action=on or action=reboot)
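
For instance, once the agent is installed you can trigger both actions by hand. These example invocations use the option letters that the attached script parses (-l login, -p password, -Z email, -a node name, -o action) together with the placeholder credentials from its own comments:

fence_ovh -l ab12345-ovh -p MYSECRET -Z admin@myadmin -a ns12345 -o off
fence_ovh -l ab12345-ovh -p MYSECRET -Z admin@myadmin -a ns12345 -o reboot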

3) Technical details 
So, as you might deduce, the classical fence mechanism of powering off the other node is not actually implemented by turning the machine off, but by rebooting it into a rescue mode.

Another particular thing to mention is that the script checks whether the machine has rebooted into rescue mode correctly, thanks to an OVH API call that reports the date when the server rebooted. The same OVH API is also used for the main function, which consists of rebooting the machine into rescue mode.
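
Condensed from the attached script, the verification logic looks roughly like this (a sketch, not the full agent; the names are the script's own):

before_netboot_reboot = datetime.now()
netboot_reboot(nodeovh, login, passwd, email, OVH_RESCUE_PRO_NETBOOT_ID)
time.sleep(STATUS_RESCUE_PRO_SLEEP)
after_netboot_reboot = datetime.now()
# reboot_status() wraps OVH's dedicatedHardRebootStatus SOAP call
result = reboot_status(nodeovh, login, passwd)
# Fencing succeeded if OVH reports a reboot that started and ended
# between our two local timestamps
success = (result.start > before_netboot_reboot) and (result.end < after_netboot_reboot)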

4) How to use it 

4.1) Make sure the python-soappy package is installed (Debian/Ubuntu).
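On Debian/Ubuntu that is:
apt-get install python-soappy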
4.2) Save fence_ovh in /usr/sbin 
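For example:
install -m 0755 fence_ovh /usr/sbin/fence_ovh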
4.3) Run ccs_update_schema so that the new metadata is added to cluster.rng.
4.4) If needed validate your configuration: 
ccs_config_validate -v -f /etc/pve/cluster.conf.new 
4.5) Here's an example of how to use it in cluster.conf:

<?xml version="1.0"?>
<cluster name="ha-008-010" config_version="3">

<cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu" two_node="1" expected_votes="1">
</cman>

<fencedevices>
        <fencedevice agent="fence_ovh" name="fence008" email="admin@xxxxxxxxxx" ipaddr="ns123456" login="ab12345-ovh" passwd="MYSECRET" />
        <fencedevice agent="fence_ovh" name="fence010" email="admin@xxxxxxxxxx" ipaddr="ns789012" login="ab12345-ovh" passwd="MYSECRET" />
</fencedevices>

<clusternodes>
<clusternode name="server008" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="fence008" action="off"/>
    </method>
  </fence>
</clusternode>
<clusternode name="server010" nodeid="2" votes="1">
  <fence>
    <method name="1">
      <device name="fence010" action="off"/>
    </method>
  </fence>
</clusternode>
</clusternodes>


</cluster>
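
Once the configuration validates, you should be able to test the whole fencing path with cman's fence_node utility, using the node names from the example above:

fence_node server010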



Finally, I attach to this email the first version of the fence_ovh script. It can be improved a lot; I've just realised that I've left a mention of an .ini file in the metadata, which I previously used to feed the user/password, whereas now they are read directly from the cluster.conf configuration, as with any fence agent.

The original thread on the Proxmox forum from which I adapted the original secofor script: http://forum.proxmox.com/threads/11066-Proxmox-HA-Cluster-at-OVH-Fencing?p=75152#post75152

P.S.: It was not easy to develop a fence agent because there is no documentation on the subject. I may start another thread on this same mailing list about it.

--
Adrián Gibanel 
I.T. Manager 

+34 675 683 301 
www.btactic.com 



#!/usr/bin/python
# Copyright 2013 Adrian Gibanel Lopez (bTactic)
# Adrian Gibanel improved this script in 2013
# to add verification of success
# and to output metadata.

# Based on:
# This is a fence agent for use at OVH.
# As there are no other fence devices available,
# we must use OVH's SOAP API.
# Quick-and-dirty, assembled by Dennis Busch,
# secofor GmbH, Germany.
# This work is licensed under a
# Creative Commons Attribution-ShareAlike 3.0 Unported License.

# Manual call parameters example:
#
# login=ab12345-ovh
# passwd=MYSECRET
# email=admin@myadmin
# ipaddr=ns12345
# action=off
#
# where ipaddr is your server's OVH name

import sys
import atexit
import time
from datetime import datetime

sys.path.append("/usr/share/fence")
from fencing import *

from SOAPpy import WSDL

OVH_RESCUE_PRO_NETBOOT_ID = '28'  # OVH netboot ID for the rescue-pro image
OVH_HARD_DISK_NETBOOT_ID = '1'    # OVH netboot ID for booting from the hard disk
STATUS_HARD_DISK_SLEEP = 240      # Wait 4 minutes for the OS to boot
STATUS_RESCUE_PRO_SLEEP = 150     # Wait 2 minutes 30 seconds for rescue-pro to come up
OVH_FENCE_DEBUG = False           # True or False for debug

def netboot_reboot(nodeovh, login, passwd, email, mode):
    soap = WSDL.Proxy('https://www.ovh.com/soapi/soapi-re-1.59.wsdl')
    session = soap.login(login, passwd, 'es', 0)

    # dedicatedNetbootModifyById changes the mode of the next reboot
    soap.dedicatedNetbootModifyById(session, nodeovh, mode, '', email)

    # dedicatedHardRebootDo initiates a hard reboot on the given node
    soap.dedicatedHardRebootDo(session, nodeovh, 'Fencing initiated by cluster', '', 'es')

    soap.logout(session)

def reboot_status(nodeovh, login, passwd):
    soap = WSDL.Proxy('https://www.ovh.com/soapi/soapi-re-1.59.wsdl')
    session = soap.login(login, passwd, 'es', 0)

    # dedicatedHardRebootStatus reports when the last hard reboot started and ended
    result = soap.dedicatedHardRebootStatus(session, nodeovh)
    result.start = datetime.strptime(result.start, '%Y-%m-%d %H:%M:%S')
    result.end = datetime.strptime(result.end, '%Y-%m-%d %H:%M:%S')

    soap.logout(session)
    return result

# Redirect stderr to a log file
save_stderr = sys.stderr
errlog = open("/var/log/fence_ovh_error.log", "a")
sys.stderr = errlog

device_opt = ["email", "ipaddr", "action", "login", "passwd"]

ovh_fence_opt = {
        "email" : {
                "getopt" : "Z:",
                "longopt" : "email",
                "help" : "-Z, --email=<email>          email for reboot message: admin@xxxxxxxxxx",
                "required" : "1",
                "shortdesc" : "Reboot email",
                "default" : "",
                "order" : 1 },
}

all_opt.update(ovh_fence_opt)
all_opt["ipaddr"]["shortdesc"] = "OVH node name"

atexit.register(atexit_handler)
options = check_input(device_opt, process_input(device_opt))

# Not sure if I need this old notation
# Support for -n [switch]:[plug] notation that was used before
if options.has_key("-n") and (-1 != options["-n"].find(":")):
    (switch, plug) = options["-n"].split(":", 1)
    if switch.isdigit() and plug.isdigit():
        options["-s"] = switch
        options["-n"] = plug

if not options.has_key("-s"):
    options["-s"] = "1"

docs = {}
docs["shortdesc"] = "Fence agent for OVH"
docs["longdesc"] = "fence_ovh is a power-fencing agent \
which can be used within the OVH datacentre. \
Poweroff is simulated with a reboot into rescue-pro \
mode. \
 /usr/local/etc/ovhsecret example: \
 \
 [OVH] \
 Login = ab12345-ovh \
 Passwd = MYSECRET \
"
docs["vendorurl"] = "http://www.ovh.net"
show_docs(options, docs)


# Use a separate log file for debugging purposes
if OVH_FENCE_DEBUG:
    logfile = open("/var/log/fence_ovh.log", "a")
    logfile.write(time.strftime("\n%d.%m.%Y %H:%M:%S \n"))
    logfile.write("Parameter:\t")
    for val in sys.argv:
        logfile.write(val + " ")
    logfile.write("\n")

action = options['-o']
email = options['-Z']
login = options['-l']
passwd = options['-p']
nodeovh = options['-a']
if nodeovh[-8:] != '.ovh.net':
    nodeovh += '.ovh.net'

# Save datetime just before changing netboot
before_netboot_reboot = datetime.now()

if action == 'off':
    netboot_reboot(nodeovh, login, passwd, email, OVH_RESCUE_PRO_NETBOOT_ID)  # Reboot into rescue-pro
elif action == 'on':
    netboot_reboot(nodeovh, login, passwd, email, OVH_HARD_DISK_NETBOOT_ID)  # Reboot from hard disk
elif action == 'reboot':
    netboot_reboot(nodeovh, login, passwd, email, OVH_HARD_DISK_NETBOOT_ID)  # Reboot from hard disk
else:
    if OVH_FENCE_DEBUG:
        logfile.write("Nothing to do\n")
        logfile.close()
    errlog.close()
    sys.exit()

if action == 'off':
    time.sleep(STATUS_RESCUE_PRO_SLEEP)  # Wait for rescue-pro to come up
elif action == 'on':
    time.sleep(STATUS_HARD_DISK_SLEEP)  # Wait for the OS to boot from hard disk
elif action == 'reboot':
    time.sleep(STATUS_HARD_DISK_SLEEP)  # Wait for the OS to boot from hard disk
else:
    # Unreachable: unknown actions have already exited above
    if OVH_FENCE_DEBUG:
        logfile.write("Unexpected action; check the script!\n")
        logfile.close()
    errlog.close()
    sys.exit()

after_netboot_reboot = datetime.now()

# Verification of success: the reboot reported by OVH must have
# started and ended between our two local timestamps

reboot_start_end = reboot_status(nodeovh, login, passwd)
if OVH_FENCE_DEBUG:
    logfile.write("reboot_start_end.start: " + reboot_start_end.start.strftime('%Y-%m-%d %H:%M:%S') + "\n")
    logfile.write("before_netboot_reboot: " + before_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S') + "\n")
    logfile.write("reboot_start_end.end: " + reboot_start_end.end.strftime('%Y-%m-%d %H:%M:%S') + "\n")
    logfile.write("after_netboot_reboot: " + after_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S') + "\n")

if (reboot_start_end.start > before_netboot_reboot) and (reboot_start_end.end < after_netboot_reboot):
    if OVH_FENCE_DEBUG:
        logfile.write("Netboot reboot went OK.\n")
else:
    if OVH_FENCE_DEBUG:
        logfile.write("ERROR: Netboot reboot wasn't OK.\n")
        logfile.close()
    errlog.close()
    sys.exit(1)


if OVH_FENCE_DEBUG:
    logfile.close()
errlog.close()