Wednesday, 11 December 2013

Red Hat Cluster Suite fence agent for HP C7000 enclosure

Some of the projects I've worked on used the C7000 blade enclosure from HP.

It's a neat platform for clustering, since it packs almost all of the required hardware into a single enclosure.

Our software runs on Red Hat Enterprise Linux 5 (RHEL5) with Red Hat Cluster Suite.
The system seemed to work properly, but there were some issues with fencing.

We were using the iLO of each blade server to fence nodes. Everything works fine until you pull a node out of the enclosure... Since neither the iLO port nor the server is reachable any more, the cluster stalls and goes into a loop trying to fence the node. From that point on nothing works until you run fence_ack_manual.
A similar situation also occurred when one of the nodes suddenly died: it had a complete hardware failure and stopped responding. The fault also took out its iLO port, so the cluster went into a fencing-failed loop.

To solve this problem, a second fencing level that uses the C7000 must be added.
When fencing through the iLO port fails, the cluster then tries to fence through the enclosure management module (the Onboard Administrator, or OA).
To make this work, a new fence agent was needed.
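The fallback order can be sketched in Python. This is a simulation, not the cluster's actual fencing code: it assumes that fence methods are tried in the order they appear in cluster.conf and that a method only counts as successful when every device in it succeeds. The device functions `ilo_unreachable` and `oa_power_cycle` are hypothetical stand-ins for the iLO and OA fence calls.

```python
from typing import Callable, List, Optional, Tuple

# A device call returns True if that fence operation succeeded.
DeviceCall = Callable[[], bool]
FenceMethod = Tuple[str, List[DeviceCall]]


def fence_node(methods: List[FenceMethod]) -> Optional[str]:
    """Try each method in order and return the name of the first one whose
    devices all succeed, or None if every method fails."""
    for name, devices in methods:
        # all() stops at the first failing device, which fails the method.
        if all(device() for device in devices):
            return name
    return None  # every method failed: the cluster would keep retrying


def ilo_unreachable() -> bool:
    # Hypothetical: the blade was pulled, so its iLO no longer answers.
    return False


def oa_power_cycle() -> bool:
    # Hypothetical: the Onboard Administrator is still reachable.
    return True


node1_methods: List[FenceMethod] = [
    ("1", [ilo_unreachable, ilo_unreachable]),  # iLO off, then iLO on
    ("2", [oa_power_cycle]),                    # C7000 OA, blade in slot 1
]

print(fence_node(node1_methods))      # prints 2
print(fence_node(node1_methods[:1]))  # prints None
```

With only the iLO level configured (the second call), fencing can never succeed once the blade is gone, which is exactly the loop described above.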

I modified the fence script for an IBM BladeCenter (/sbin/fence_bladecenter) so that it works with the C7000 enclosure. The configuration in cluster.conf is the same as for the IBM BladeCenter, except for the agent attribute:

<fencedevice agent="fence_c7000" ipaddr="" login="Administrator" name="HP_C7000_OA_P" passwd="passwd"/>
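When the cluster calls a fence agent, it passes the attributes of the fencedevice and device elements on the agent's standard input, one name=value pair per line. The sketch below shows how an agent might collect them. It is an illustration, not the code of fence_c7000: the function names are hypothetical, and treating both "action" and the older "option" key as the requested operation, with "reboot" as the default, is an assumption about how agents of this generation behave.

```python
from typing import Dict, Iterable


def parse_fence_options(lines: Iterable[str]) -> Dict[str, str]:
    """Collect name=value pairs, one per line, as written to a fence
    agent's standard input. Blank lines and '#' comments are skipped."""
    options: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a name=value line, ignore it
        options[name.strip()] = value
    return options


def requested_action(options: Dict[str, str]) -> str:
    # Assumption: newer configs use "action", older ones "option";
    # fall back to a reboot (power cycle) if neither is given.
    return options.get("action", options.get("option", "reboot"))


# Illustrative input, as the agent might receive it for method 2 of node1.
sample = [
    "agent=fence_c7000\n",
    "ipaddr=192.0.2.10\n",  # hypothetical OA address (documentation range)
    "login=Administrator\n",
    "passwd=passwd\n",
    "blade=1\n",
]
opts = parse_fence_options(sample)
print(opts["blade"], requested_action(opts))  # prints 1 reboot
```

A real agent would call `parse_fence_options(sys.stdin)` instead of passing a list.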

The fence levels for each node can then be configured like this:

<clusternode name="node1" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="node1_ilo" action="off"/>
      <device name="node1_ilo" action="on"/>
    </method>
    <method name="2">
      <device blade="1" name="HP_C7000_OA_P"/>
    </method>
  </fence>
</clusternode>

This makes the cluster fence through the iLO port first and fall back to the Onboard Administrator only if that fails. In this case the OA would power cycle the blade in slot 1 (blade="1").
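Inside the agent, that power cycle comes down to a couple of OA command-line commands for the blade's bay. The sketch below assumes the OA CLI commands `POWEROFF SERVER <bay> FORCE` and `POWERON SERVER <bay>`; the commands fence_c7000 actually sends may differ, so check the syntax against the OA CLI guide for your firmware. The function `oa_commands` is hypothetical.

```python
from typing import List

# Assumed OA CLI command templates; verify against your OA firmware.
POWER_OFF = "POWEROFF SERVER {bay} FORCE"
POWER_ON = "POWERON SERVER {bay}"


def oa_commands(action: str, blade: str) -> List[str]:
    """Translate a fence action into OA CLI commands for one blade bay."""
    if not blade.isdigit():
        raise ValueError(f"invalid blade bay: {blade!r}")
    bay = int(blade)
    off = POWER_OFF.format(bay=bay)
    on = POWER_ON.format(bay=bay)
    if action == "off":
        return [off]
    if action == "on":
        return [on]
    if action == "reboot":
        # Power cycle: force the blade off, then switch it back on.
        return [off, on]
    raise ValueError(f"unsupported action: {action!r}")


print(oa_commands("reboot", "1"))
# prints ['POWEROFF SERVER 1 FORCE', 'POWERON SERVER 1']
```

Forcing the power off matters here: a fence has to stop the node even if its operating system is hung, so a graceful shutdown request is not enough.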

You can get the fence script here: fence_c7000

To install it, place the script in the /sbin directory and give it execute permissions:

chmod 0755 /sbin/fence_c7000

Then it's just a matter of configuring the cluster.conf with the examples above.
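A common mistake when editing cluster.conf by hand is a device name that doesn't match any fencedevice. The sketch below uses Python's standard xml.etree.ElementTree to list each node's fence methods in order and flag undefined devices. The function `fence_plan` and the sample's cluster name and bare fence_ilo entry are hypothetical placeholders; only the node and C7000 lines come from this post.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def fence_plan(cluster_conf: str) -> Dict[str, List[List[str]]]:
    """Return, per node, the device names of each fence method in order.
    Raises ValueError if a device refers to an undefined fencedevice."""
    root = ET.fromstring(cluster_conf)
    defined = {fd.get("name") for fd in root.iter("fencedevice")}
    plan: Dict[str, List[List[str]]] = {}
    for node in root.iter("clusternode"):
        methods = []
        for method in node.findall("fence/method"):
            names = [dev.get("name") for dev in method.findall("device")]
            missing = [n for n in names if n not in defined]
            if missing:
                raise ValueError(f"{node.get('name')}: undefined device(s) {missing}")
            methods.append(names)
        plan[node.get("name")] = methods
    return plan


SAMPLE = """<cluster name="example" config_version="1">
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="node1_ilo" action="off"/>
          <device name="node1_ilo" action="on"/>
        </method>
        <method name="2">
          <device blade="1" name="HP_C7000_OA_P"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ilo" name="node1_ilo"/>
    <fencedevice agent="fence_c7000" ipaddr="" login="Administrator" name="HP_C7000_OA_P" passwd="passwd"/>
  </fencedevices>
</cluster>"""

print(fence_plan(SAMPLE))
# prints {'node1': [['node1_ilo', 'node1_ilo'], ['HP_C7000_OA_P']]}
```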

Hope this helps!
