Tuesday 26 August 2014

RHEV and RHEL Clustering - Fencing without RHEVM - The script



Following up on the blog post from a few weeks ago (here), I finally got around to creating a fence script that allows fencing a VM without an available RHEV Manager.


I've placed the script in my GitHub HERE; you need to copy it to /usr/sbin and give it execute permission.
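
Assuming the file keeps the agent name used in the configuration below (the file name under /usr/sbin has to match the agent attribute in cluster.conf), that boils down to:

cp fence_rhev_nomgt /usr/sbin/
chmod +x /usr/sbin/fence_rhev_nomgt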


How does it work?

I replicated the fence_virsh script and changed the code to add the necessary commands.
The list of hosts where the script checks for the VM is passed in the "ipaddr" field.
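
Stripped of the option parsing and error handling, the reboot logic boils down to something like the sketch below. Treat it as an illustration of the idea rather than the actual script: it uses paramiko for the SSH part (the real agent follows the fence_virsh structure and the stock fencing library), the pgrep check is my own choice, the hard-coded values are just the ones from the examples further down, and the libvirt SASL credential handling described in the older post below is left out for brevity.

#!/usr/bin/env python
# Illustrative sketch only -- not the actual fence_rhev_nomgt script.
import sys
import paramiko

def reset_vm(hosts, login, passwd, vm_name):
    for host in hosts:
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.connect(host, username=login, password=passwd)
        try:
            # Is there a QEMU process for this VM on this hypervisor?
            # The [q] keeps the pattern from matching the remote shell
            # that runs the pgrep command itself.
            _, out, _ = ssh.exec_command("pgrep -f '[q]emu.*%s'" % vm_name)
            if out.channel.recv_exit_status() != 0:
                continue  # VM not on this host, try the next hypervisor
            # Hard-reset the guest through the QEMU monitor
            _, out, _ = ssh.exec_command(
                "virsh qemu-monitor-command --hmp %s system_reset" % vm_name)
            return out.channel.recv_exit_status() == 0
        finally:
            ssh.close()
    return False  # VM not found on any of the hosts

if __name__ == "__main__":
    found = reset_vm("192.168.1.1,192.168.1.2".split(","),
                     "root", "password", "Linux-Serv1")
    sys.exit(0 if found else 1)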

Since this is a custom fencing script, you cannot configure it directly in Luci; you need to edit the cluster.conf file manually.

The fence device can be added to a node like this:

<clusternode name="server1" nodeid="1">
    <fence>
        <method name="RHEV-NOMGT">
            <device name="rhev-nomgt" port="Linux-Serv1"/>
        </method>
    </fence>
</clusternode>

The fence device itself is defined like this:

<fencedevice agent="fence_rhev_nomgt" ipaddr="192.168.1.1,192.168.1.2" login="root" name="rhev-nomgt" passwd="password"/>

In the clusternode fence method definition, the "port" is the name of the VM in the RHEV system.

The "ipaddr" parameter in the fence device is a comma-separated list of the hostnames (or IP addresses) of the hypervisors where the VM can run. The "login" and "passwd" parameters are the credentials of the root user on the hypervisors. I know this is not very safe, but the hypervisors don't allow the creation of other users and for my scenario this won't be an issue.
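
Since the script keeps the standard fence agent interface of fence_virsh, the cluster's fence daemon hands it these same attributes as key=value pairs on standard input, plus the port from the node's method definition. That also gives you a way to test it by hand before trusting it with the cluster: feed it input like the following on stdin (the action line is my assumption here; reboot is the default action for this kind of agent):

ipaddr=192.168.1.1,192.168.1.2
login=root
passwd=password
port=Linux-Serv1
action=reboot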

Example cluster.conf for a two-node cluster:


<?xml version="1.0"?>
<cluster config_version="1" name="RHEVCLUS">
        <clusternodes>
                <clusternode name="server1" nodeid="1">
                        <fence>
                                <method name="RHEV">
                                        <device name="rhev-nomgt" port="Linux-Serv1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="server2" nodeid="2">
                        <fence>
                                <method name="RHEV">
                                        <device name="rhev-nomgt" port="Linux-Serv2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_rhev_nomgt" ipaddr="172.18.56.251,172.18.56.252" login="root" name="rhev-nomgt" passwd="password"/>
        </fencedevices>
</cluster>


PLEASE TAKE NOTE:
This fencing method is intended as a fail-safe for when no other fencing option is available, to ensure the cluster doesn't halt in specific situations where the RHEV Manager isn't available. It should not be used as a primary fencing method! Also note that a hard reset like this can cause problems such as data loss and file corruption, especially in databases. To use this method you should be fully aware of the risks.


Monday 11 August 2014

RHEV and RHEL Clustering - Fencing without RHEVM



UPDATE: I've added the script, see the post here.


For an upcoming project I'll be using a Red Hat Cluster inside a RHEV environment.
At first glance I didn't see any problems since RHEL's High Availability add-on already includes a fencing script for the RHEV-M.
But what happens when the RHEV-M is down or unresponsive and the cluster needs to fence one of the nodes?

This could mean trouble, since the cluster would stop every service it manages, resulting in potential downtime for our applications.

After some research I've come up with a possible solution that allows for the fencing of a VM without a RHEV-M.

The process is quite simple but needs a few steps:

1 - Get a list of all the hypervisors inside your RHEV system where the VM can run

2 - For each of these hosts do the following operations until we find the VM:

  • Connect to the host as root
  • Check if there is a QEMU process for our VM on the current host (an example check command is shown after this list). If there is, proceed with the following commands; if not, try the next hypervisor.
  • Create a new set of credentials to interact with libvirt (on a RHEV hypervisor, libvirt only accepts authenticated connections, so virsh needs a valid SASL user):
  • saslpasswd2 -p -a libvirt fenceagent

    (fenceagent is a username; with the -p flag the command reads the password for it from standard input instead of prompting)

  • Restart the VM with the following command:
  • virsh qemu-monitor-command --hmp VM_NAME system_reset

    (Replace VM_NAME with the name of the VM as it appears in RHEV)
  • Remove the user you created.
  • Log off from the hypervisor
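
The two steps above that don't list an explicit command could look like this (both commands are my own suggestions, not the only way to do it). The QEMU process check can be done with:

ps aux | grep -i qemu | grep VM_NAME

And the temporary user can be removed again with:

saslpasswd2 -a libvirt -d fenceagent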

In a few days I'll transform this into a Python script so I can add it to the Cluster. 
I've already validated this process manually so I think there will be no major issues with it.

But there is a potential issue: since this requires iterating over all the hypervisors (or at least until you find the VM), it can take a lot of time if there are lots of hypervisors. But at least your cluster won't go berserk :D

This will also need some extra configuration, like a list of hypervisors where the VM can run, and the VM name also needs to be passed as an argument to the fence script.


For future reference, I based this "algorithm" on the following information:



More updates on this to follow.