As requested by digimer in the linux-ha IRC channel, here is fence_ovh. Getting it included by default in the official distribution of the cluster software is not a priority, but if you guide me on how to polish it I think I can improve it a lot and test it on real machines (as long as my machines are still test machines and not production ones).

1) What is fence_ovh

fence_ovh is a Python-based fence agent for the big French datacentre provider OVH. You can get information about OVH at http://www.ovh.co.uk/ . I also want to make clear that I'm not part of the official OVH staff.

2) Features

The script has two main functions:

* Reboot into rescue mode (action=off)
* Reboot from the hard disk (action=on; action=reboot)

3) Technical details

As you might deduce, the classical fence mechanism of turning off the other node is not implemented by powering off the machine but by rebooting it into rescue mode.

Another thing worth mentioning is that the script checks whether the machine has rebooted correctly into rescue mode, thanks to an OVH API call which reports the date when the server rebooted. The same OVH API is also used by the main function, which reboots the machine into rescue mode.

4) How to use it

4.1) Make sure the python-soappy package is installed (Debian/Ubuntu).
4.2) Save fence_ovh in /usr/sbin

4.3) Run ccs_update_schema so that the new metadata is put into cluster.rng

4.4) If needed, validate your configuration:

    ccs_config_validate -v -f /etc/pve/cluster.conf.new

4.5) Here's an example of how to use it in cluster.conf:

<?xml version="1.0"?>
<cluster name="ha-008-010" config_version="3">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu" two_node="1" expected_votes="1">
  </cman>
  <fencedevices>
    <fencedevice agent="fence_ovh" name="fence008" email="admin@xxxxxxxxxx" ipaddr="ns123456" login="ab12345-ovh" passwd="MYSECRET" />
    <fencedevice agent="fence_ovh" name="fence010" email="admin@xxxxxxxxxx" ipaddr="ns789012" login="ab12345-ovh" passwd="MYSECRET" />
  </fencedevices>
  <clusternodes>
    <clusternode name="server008" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="fence008" action="off"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="server010" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="fence010" action="off"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>

Finally, I attach to this email the first version of the fence_ovh script. It can be improved a lot. I've just realised that I left a mention of an .ini file in the metadata; I previously used that file to feed the user and password, whereas now they are read directly from the cluster.conf configuration, as with any fence agent.

The original thread on the Proxmox forum, from which I adapted the original secofor script:
http://forum.proxmox.com/threads/11066-Proxmox-HA-Cluster-at-OVH-Fencing?p=75152#post75152

P.S.: It was not easy to develop a fence agent because there's no documentation on it. I may send another email to this mailing list about this subject.

-- 
Adrián Gibanel
I.T. Manager
+34 675 683 301
www.btactic.com
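
The success check described under "3) Technical details" can be sketched in isolation. This is only an illustration, not the agent itself: the helper name reboot_confirmed and the sample timestamps are made up, and the sketch assumes that the timestamps reported by the API and the local clock are in the same time zone.

```python
from datetime import datetime

# Timestamp layout used when parsing the reboot status in fence_ovh.
OVH_TIME_FORMAT = '%Y-%m-%d %H:%M:%S'


def reboot_confirmed(api_start, api_end, before, after):
    """Return True if the reported reboot falls strictly inside our window.

    api_start / api_end: strings as reported by the reboot status call.
    before / after: datetime objects taken locally around the fence action.
    Hypothetical helper; assumes both clocks use the same time zone.
    """
    start = datetime.strptime(api_start, OVH_TIME_FORMAT)
    end = datetime.strptime(api_end, OVH_TIME_FORMAT)
    return start > before and end < after


# Made-up sample values, for illustration only.
before = datetime(2013, 1, 10, 12, 0, 0)
after = datetime(2013, 1, 10, 12, 2, 30)

# A reboot reported inside the window counts as a successful fence.
fresh = reboot_confirmed('2013-01-10 12:00:05', '2013-01-10 12:01:40',
                         before, after)
# A reboot from the previous day must not be mistaken for ours.
stale = reboot_confirmed('2013-01-09 08:00:00', '2013-01-09 08:01:30',
                         before, after)
print(fresh, stale)
```

Comparing against both ends of the window is what rejects an older reboot that happens to still be the latest one on record.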
#!/usr/bin/python
# Copyright 2013 Adrian Gibanel Lopez (bTactic)
# Adrian Gibanel improved this script
# at 2013 to add verification of success
# and to output metadata

# Based on:
# This is a fence agent for use at OVH
# As there are no other fence devices available,
# we must use OVH's SOAP API
# Quick-and-dirty
# assembled by Dennis Busch, secofor GmbH,
# Germany
# This work is licensed under a
# Creative Commons Attribution-ShareAlike 3.0 Unported License.

# Manual call parameters example
#
# login=ab12345-ovh
# passwd=MYSECRET
# email=admin@myadmin
# ipaddr=ns12345
# action=off
#
# where ipaddr is your server's OVH name

import sys, re, pexpect
import atexit  # imported explicitly for atexit.register() below
sys.path.append("/usr/share/fence")
from fencing import *
from SOAPpy import WSDL
import time
from datetime import datetime

OVH_RESCUE_PRO_NETBOOT_ID='28'
OVH_HARD_DISK_NETBOOT_ID='1'
STATUS_HARD_DISK_SLEEP=240 # Wait 4 minutes for the OS to boot
STATUS_RESCUE_PRO_SLEEP=150 # Wait 2 minutes 30 seconds for Rescue-Pro to run
OVH_FENCE_DEBUG=False # True or False for debug

def netboot_reboot(nodeovh,login,passwd,email,mode):
    soap = WSDL.Proxy('https://www.ovh.com/soapi/soapi-re-1.59.wsdl')
    session = soap.login(login, passwd, 'es', 0)

    # dedicatedNetbootModifyById changes the mode of the next reboot
    result = soap.dedicatedNetbootModifyById(session, nodeovh, mode, '', email)

    # dedicatedHardRebootDo initiates a hard reboot on the given node
    soap.dedicatedHardRebootDo(session, nodeovh, 'Fencing initiated by cluster', '', 'es')

    soap.logout(session)

def reboot_status(nodeovh,login,passwd):
    soap = WSDL.Proxy('https://www.ovh.com/soapi/soapi-re-1.59.wsdl')
    session = soap.login(login, passwd, 'es', 0)

    result = soap.dedicatedHardRebootStatus(session, nodeovh)
    # Convert the reported start and end strings into datetime objects
    tmpstart = datetime.strptime(result.start,'%Y-%m-%d %H:%M:%S')
    tmpend = datetime.strptime(result.end,'%Y-%m-%d %H:%M:%S')
    result.start = tmpstart
    result.end = tmpend

    soap.logout(session)
    return result

# Redirect stderr to a file
save_stderr = sys.stderr
errlog = open("/var/log/fence_ovh_error.log","a")
sys.stderr = errlog

global all_opt

device_opt = [ "email", "ipaddr", "action" , "login" , "passwd"]

ovh_fence_opt = {
    "email" : {
        "getopt" : "Z:",
        "longopt" : "email",
        "help" : "-Z, --email=<email> email for reboot message: admin@xxxxxxxxxx",
        "required" : "1",
        "shortdesc" : "Reboot email",
        "default" : "",
        "order" : 1 },
}

all_opt.update(ovh_fence_opt)
all_opt["ipaddr"]["shortdesc"] = "OVH node name"

atexit.register(atexit_handler)
options=check_input(device_opt,process_input(device_opt))

# Not sure if I need this old notation
## Support for -n [switch]:[plug] notation that was used before
if (("-n" in options) and (-1 != options["-n"].find(":"))):
    (switch, plug) = options["-n"].split(":", 1)
    if ((switch.isdigit()) and (plug.isdigit())):
        options["-s"] = switch
        options["-n"] = plug

if (not ("-s" in options)):
    options["-s"]="1"

docs = { }
docs["shortdesc"] = "Fence agent for OVH"
docs["longdesc"] = "fence_ovh is a Power Fencing agent \
which can be used within OVH datacentre. \
Poweroff is simulated with a reboot into rescue-pro \
mode. \
 /usr/local/etc/ovhsecret example: \
 \
 [OVH] \
 Login = ab12345-ovh \
 Passwd = MYSECRET \
"
docs["vendorurl"] = "http://www.ovh.net"
show_docs(options, docs)

# I use my own logfile for debugging purposes
if OVH_FENCE_DEBUG:
    logfile=open("/var/log/fence_ovh.log", "a")
    logfile.write(time.strftime("\n%d.%m.%Y %H:%M:%S \n"))
    logfile.write("Parameter:\t")
    for val in sys.argv:
        logfile.write(val + " ")
    logfile.write("\n")

action=options['-o']
email=options['-Z']
login=options['-l']
passwd=options['-p']
nodeovh=options['-a']
if nodeovh[-8:] != '.ovh.net':
    nodeovh += '.ovh.net'

# Save datetime just before changing netboot
before_netboot_reboot = datetime.now()

if action == 'off':
    netboot_reboot(nodeovh,login,passwd,email,OVH_RESCUE_PRO_NETBOOT_ID) # Reboot in Rescue-pro
elif action == 'on':
    netboot_reboot(nodeovh,login,passwd,email,OVH_HARD_DISK_NETBOOT_ID) # Reboot from HD
elif action == 'reboot':
    netboot_reboot(nodeovh,login,passwd,email,OVH_HARD_DISK_NETBOOT_ID) # Reboot from HD
else:
    if OVH_FENCE_DEBUG:
        logfile.write("nothing to do\n")
        logfile.close()
    errlog.close()
    sys.exit()

# Give the node time to come back up before asking for the reboot status
if action == 'off':
    time.sleep(STATUS_RESCUE_PRO_SLEEP) # Reboot in Rescue-pro
elif action == 'on':
    time.sleep(STATUS_HARD_DISK_SLEEP) # Reboot from HD
elif action == 'reboot':
    time.sleep(STATUS_HARD_DISK_SLEEP) # Reboot from HD
else:
    if OVH_FENCE_DEBUG:
        logfile.write("No sense! Check script please!\n")
        logfile.close()
    errlog.close()
    sys.exit()

after_netboot_reboot = datetime.now()

# Verification of success

reboot_start_end=reboot_status(nodeovh,login,passwd)
if OVH_FENCE_DEBUG:
    logfile.write("reboot_start_end.start: " +reboot_start_end.start.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("before_netboot_reboot: " +before_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("reboot_start_end.end: " +reboot_start_end.end.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("after_netboot_reboot: " +after_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S')+"\n")

# The reboot only counts if it started and ended inside our own time window
if ((reboot_start_end.start > before_netboot_reboot) and (reboot_start_end.end < after_netboot_reboot)):
    if OVH_FENCE_DEBUG:
        logfile.write("Netboot reboot went OK.\n")
else:
    if OVH_FENCE_DEBUG:
        logfile.write("ERROR: Netboot reboot wasn't OK.\n")
        logfile.close()
    errlog.close()
    sys.exit(1)

if OVH_FENCE_DEBUG:
    logfile.close()
errlog.close()
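
The two if/elif chains in the script repeat the same action-to-mode mapping. As a possible simplification, and only as a sketch, the mapping could live in one lookup table. The names ACTION_PLAN, normalise_node and plan_for are hypothetical and do not exist in the attached script; the constant values are copied from it.

```python
# Values copied from the constants at the top of fence_ovh.
OVH_RESCUE_PRO_NETBOOT_ID = '28'
OVH_HARD_DISK_NETBOOT_ID = '1'
STATUS_HARD_DISK_SLEEP = 240
STATUS_RESCUE_PRO_SLEEP = 150

# Hypothetical lookup table: action -> (netboot id, seconds to wait).
ACTION_PLAN = {
    'off': (OVH_RESCUE_PRO_NETBOOT_ID, STATUS_RESCUE_PRO_SLEEP),
    'on': (OVH_HARD_DISK_NETBOOT_ID, STATUS_HARD_DISK_SLEEP),
    'reboot': (OVH_HARD_DISK_NETBOOT_ID, STATUS_HARD_DISK_SLEEP),
}


def normalise_node(name):
    """Append the .ovh.net suffix when it is missing, as the script does."""
    if not name.endswith('.ovh.net'):
        name += '.ovh.net'
    return name


def plan_for(action):
    """Return (netboot_id, wait_seconds), or None for an unsupported action."""
    return ACTION_PLAN.get(action)


print(normalise_node('ns123456'))
print(plan_for('off'))
print(plan_for('status'))
```

With such a table, an unsupported action is detected once, before any API call is made, instead of in two separate else branches.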
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster