High availability for third-party applications
With 10gR2, Oracle decided to open and publish the API of its clusterware. This allows third-party applications to be registered in the Oracle cluster layer, or lets you develop your own high-availability (HA) solution.
For the installation steps of the Oracle cluster you can refer to this document (place here the link).
10gR2 allows more than one application to coexist on the same cluster, possibly sharing your RAC nodes.
Just before you ask: you can have the Oracle cluster without RAC. In fact, you can decide not to install RAC at all and go for the clusterware only.
Before you decide to cluster your application with this product, be aware that the Oracle cluster needs a shared disk on which to place the voting disk and the cluster registry.
It is quite easy to have a SAN in an enterprise environment…
not so in small companies.
As described here
(http://www.oracle.com/technology/pub/articles/hunter_rac.html)
there are viable alternatives.
The test system I set up was simple: two Linux nodes (SUSE Linux Enterprise Server 9) connected to a SAN.
The cluster registry and voting disk are on raw devices.
The RDBMS binaries are not installed.
A practical example:
I decided to implement a simple web server and to cluster it, retiring my old heartbeat + MON solution.
The two nodes have this network configuration:
|                   | Node1          | Node2          |
| Name              | breonldblc03   | breonldblc04   |
| Public IP address | 192.168.23.191 | 192.168.23.192 |
| Virtual name      | breonldblv03   | breonldblv04   |
| Virtual IP        | 192.168.23.196 | 192.168.23.19  |
| Private name      | internal1      | internal2      |
| Private IP        | 192.168.255.1  | 192.168.255.2  |
My /etc/hosts looks as follows:
127.0.0.1       localhost
# special IPv6 addresses
::1             localhost ipv6-localhost ipv6-loopback
fe00::0         ipv6-localnet
ff00::0         ipv6-mcastprefix
ff02::1         ipv6-allnodes
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts
192.168.23.191  breonldblc03.ran breonldblc03
192.168.23.192  breonldblc04.ran breonldblc04
192.168.23.18   breonldblv02.ran breonldblv02
192.168.23.196  breonldblv03.ran breonldblv03
192.168.23.19   breonldblv04.ran breonldblv04
192.168.23.20   breonldblv05.ran breonldblv05
192.168.255.1   internal1.ras internal1
192.168.255.2   internal2.ras internal2
Here ran is the internal domain of my company.
You can see two other virtual IPs, breonldblv02 and breonldblv05. They will be used by my applications.
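Since the virtual names and their addresses come back in most of the commands that follow, a small lookup helper can reduce copy-and-paste mistakes. The sketch below is only an illustration: the function name hosts_lookup is hypothetical, and it reads a file in /etc/hosts format rather than querying DNS.

```shell
#!/bin/sh
# hosts_lookup FILE NAME
# Print the first IP address whose entry in FILE (an /etc/hosts style file)
# lists NAME as canonical name or alias. Returns 1 when NAME is not found.
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }               # drop comments
        NF < 2 { next }                  # skip empty or address-only lines
        {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; found = 1; exit }
        }
        END { exit !found }
    ' "$1"
}
```

With the hosts file shown above, hosts_lookup /etc/hosts breonldblv05 would print 192.168.23.20.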
I installed Apache on both nodes. At this point I bind the web server so that it listens on a specific network card (eth1) using the virtual address breonldblv05.
In your /etc/httpd/httpd.conf insert:
#
# Use name-based virtual hosting.
#
NameVirtualHost 192.168.23.20:80
The basic step is to create the scripts and configuration files which will be used to register your application in the cluster.
Oracle provides the command crs_profile to simplify the process using templates.
Since Apache is network based, I'm going to create a resource based on the virtual IP it listens on.
With the oracle user issue the command:
crs_profile -create apache_ip -t application -a \
/u01/app/oracle/product/10.2/crs_1/bin/usrvip -o \
oi=eth1,ov=192.168.23.20,on=255.255.255.0
This will create an apache_ip.cap file in $ORA_CRS_HOME/crs/public containing the parameters used by the next phase: the registration.
Check the file exists:
oracle10g@breonldblc03:/u01/app/oracle/product/10.2/crs_1/crs/public> ll apache_ip*
-rw-r--r-- 1 oracle10g dba 799 2005-07-28 17:47 apache_ip.cap
The content of the file describes the configuration of your resource, called apache_ip.
You can modify it at will before registering the resource in the cluster.
cat apache_ip.cap
NAME=apache_ip
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10.2/crs_1/bin/usrvip
ACTIVE_PLACEMENT=0
AUTO_START=restore
CHECK_INTERVAL=60
DESCRIPTION=apache_ip
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=
PLACEMENT=balanced
REQUIRED_RESOURCES=
RESTART_ATTEMPTS=1
SCRIPT_TIMEOUT=60
START_TIMEOUT=0
STOP_TIMEOUT=0
UPTIME_THRESHOLD=7d
USR_ORA_ALERT_NAME=
USR_ORA_CHECK_TIMEOUT=0
USR_ORA_CONNECT_STR=/ as sysdba
USR_ORA_DEBUG=0
USR_ORA_DISCONNECT=false
USR_ORA_FLAGS=
USR_ORA_IF=eth1
USR_ORA_INST_NOT_SHUTDOWN=
USR_ORA_LANG=
USR_ORA_NETMASK=255.255.255.0
USR_ORA_OPEN_MODE=
USR_ORA_OPI=false
USR_ORA_PFILE=
USR_ORA_PRECONNECT=none
USR_ORA_SRV=
USR_ORA_START_TIMEOUT=0
USR_ORA_STOP_MODE=immediate
USR_ORA_STOP_TIMEOUT=0
USR_ORA_VIP=192.168.23.20
The line ACTION_SCRIPT=/u01/app/oracle/product/10.2/crs_1/bin/usrvip specifies which script to use for starting, stopping and checking your application (in this example, the IP address).
The options oi=eth1,ov=192.168.23.20,on=255.255.255.0 specify the Ethernet card to use, the IP address and the netmask.
All the parameters of the configuration file are parsed by the crs_register command, which gives you an error message if a problem is found.
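Before registering, it can be handy to read back a single attribute from the cap file and compare it with what you meant to configure. This is a minimal sketch that only assumes the KEY=VALUE layout shown in the listing above; the function name cap_get is hypothetical and is not part of the clusterware tools.

```shell
#!/bin/sh
# cap_get FILE KEY
# Print the value of KEY from a KEY=VALUE profile such as apache_ip.cap.
# Everything after the first "=" is the value (it may be empty).
# Returns 1 when the key is absent.
cap_get() {
    awk -v key="$2" '
        index($0, key "=") == 1 {
            print substr($0, length(key) + 2)
            found = 1
            exit
        }
        END { exit !found }
    ' "$1"
}
```

For example, cap_get apache_ip.cap USR_ORA_VIP should print the address passed with ov= to crs_profile.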
After the changes you can register the application:
crs_register apache_ip
After any modification to the configuration file, issue the command:
crs_register apache -u -dir /u01/app/oracle/product/10.2/crs_1/crs/public
to update the configuration in the cluster immediately.
The example assumes that the apache.cap file is in the directory /u01/app/oracle/product/10.2/crs_1/crs/public.
Several other steps are required after the registration. As root:
$ORA_CRS_HOME/bin/crs_setperm apache_ip -o root
$ORA_CRS_HOME/bin/crs_setperm apache_ip -u user:oracle10g:r-x
These two commands change the ownership of the resource (an IP should be managed by root) and set who is permitted to execute the script (on my system oracle is the cluster owner).
Now, as the oracle user, I replicate the changes on the other node:
scp /u01/app/oracle/product/10.2/crs_1/crs/public/* breonldblc04.ras:/u01/app/oracle/product/10.2/crs_1/crs/public/
Still as oracle, I start the virtual IP:
crs_start apache_ip
Attempting to start `apache_ip` on member `breonldblc04`
Start of `apache_ip` on member `breonldblc04` succeeded.
On breonldblc04 I find the following ifconfig output:
eth1:2    Link encap:Ethernet  HWaddr 00:08:02:1A:5E:12
          inet addr:192.168.23.20  Bcast:192.168.23.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
The resource is started.
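The same sequence (register, set permissions as root, copy to the peer, start) comes back for every resource in this article, so it may help to print it as a checklist. The sketch below only echoes the commands used above and runs nothing; the function name print_register_steps is hypothetical, and the default home is simply the path of my installation.

```shell
#!/bin/sh
# print_register_steps RESOURCE OWNER PEER
# Print (not run) the command sequence used in this article to register
# RESOURCE, hand it to root, let OWNER execute it, copy the profiles to
# PEER and start it. ORA_CRS_HOME is honoured when set.
print_register_steps() {
    res=$1 owner=$2 peer=$3
    home=${ORA_CRS_HOME:-/u01/app/oracle/product/10.2/crs_1}
    cat <<EOF
# as $owner
crs_register $res
# as root
$home/bin/crs_setperm $res -o root
$home/bin/crs_setperm $res -u user:$owner:r-x
# as $owner
scp $home/crs/public/* $peer:$home/crs/public/
crs_start $res
EOF
}
```

Calling print_register_steps apache_ip oracle10g breonldblc04.ras reproduces the steps of this section, which you can review before typing them under the right user.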
The second step is to cluster the Apache daemon.
As oracle:
crs_profile -create apache -t application -B /usr/sbin/apachectl \
-d "Apache Server" -r apache_ip \
-p favored -h "breonldblc03 breonldblc04" \
-a apache.scr -o ci=30,ft=3,fi=12,ra=5
The syntax is a little more complex.
It creates two files in $ORA_CRS_HOME/crs/public: the cap file and an apache.scr file containing the script used for the application. This script is generated from a standard template and can be modified even after the registration of the service.
The apache_ip resource is needed by apache to run correctly, so a dependency has been specified (-r apache_ip).
The basic command for Apache administration is /usr/sbin/apachectl, and it will be included in the apache.scr script.
The options ci=30,ft=3,fi=12,ra=5 set the timeouts and retries used by the cluster before switching the application to another node, while the line
-p favored -h "breonldblc03 breonldblc04"
indicates the policy to apply for the application placement on the nodes.
In the official documentation FAVORED is sometimes incorrectly referred to as PREFERRED.
If you use a policy other than balanced you need to specify the list of nodes with the -h option.
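The short codes passed with -o are easier to review when spelled out as the attribute names that end up in the cap file. The sketch below is a hypothetical helper (expand_crs_opts is my own name). The mapping of oi, ov and on follows the apache_ip.cap listing shown earlier; reading ci, ft, fi and ra as CHECK_INTERVAL, FAILURE_THRESHOLD, FAILURE_INTERVAL and RESTART_ATTEMPTS is my interpretation, so check it against your generated cap file.

```shell
#!/bin/sh
# expand_crs_opts "ci=30,ft=3,fi=12,ra=5"
# Print one ATTRIBUTE=value line per short option. Only the codes used in
# this article are mapped (an assumption, see above); unknown codes are
# printed unchanged.
expand_crs_opts() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS='=' read -r code value; do
        [ -n "$code" ] || continue
        case $code in
            'ci') name=CHECK_INTERVAL ;;
            'ft') name=FAILURE_THRESHOLD ;;
            'fi') name=FAILURE_INTERVAL ;;
            'ra') name=RESTART_ATTEMPTS ;;
            'oi') name=USR_ORA_IF ;;
            'ov') name=USR_ORA_VIP ;;
            'on') name=USR_ORA_NETMASK ;;
            *)    name=$code ;;
        esac
        printf '%s=%s\n' "$name" "$value"
    done
}
```

The output can be compared line by line with the cap file that crs_profile generated.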
Now modify the action script
/u01/app/oracle/product/10.2/crs_1/crs/public/apache.scr
Personally I made only these three modifications:
PROBE_PROCS="httpd"
START_APPCMD="/usr/sbin/apachectl start"
STOP_APPCMD="/usr/sbin/apachectl stop"
If you are satisfied with your cap file you can register the resource:
crs_register apache
And as root:
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm apache -o root
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm apache -u user:oracle10g:r-x
I prefer to change the apache.scr permissions by hand:
chmod a+x /u01/app/oracle/product/10.2/crs_1/crs/public/apache.scr
adding the execute permission for everyone. This solves a problem for me: the script is run by a user other than oracle, and I prefer not to change the ownership of the file to root.
This can be a security risk. Personally I handle the security on APPCMD, but that choice is debatable.
Copy the scripts and cap files to the other nodes:
scp /u01/app/oracle/product/10.2/crs_1/crs/public/* breonldblc04.ras:/u01/app/oracle/product/10.2/crs_1/crs/public/
and make sure the permissions are correct everywhere (that's really important, or your application won't be able to start).
ls -l /u01/app/oracle/product/10.2/crs_1/crs/public/apache.scr
-rwxr-xr-x 1 oracle10g dba 13228 2005-07-28 18:01 /u01/app/oracle/product/10.2/crs_1/crs/public/apache.scr
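Because a wrong permission on one node is enough to keep the application from starting, the check can be scripted instead of read by eye. A minimal sketch, assuming the usual ls -l permission string; the function name is_world_executable is hypothetical.

```shell
#!/bin/sh
# is_world_executable FILE
# Succeed when owner, group and others all hold the execute bit on FILE,
# which is the state "chmod a+x" leaves behind.
is_world_executable() {
    [ -e "$1" ] || return 1
    # First ten characters of ls -ld: file type plus rwx triplets
    perms=$(ls -ld "$1" | cut -c1-10)
    case $perms in
        ???[xs]??[xs]??[xt]) return 0 ;;
        *) return 1 ;;
    esac
}
```

On each node you could run is_world_executable /u01/app/oracle/product/10.2/crs_1/crs/public/apache.scr || echo "fix permissions" before starting the resource.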
Now, as oracle, you can start your apache:
crs_start apache
Attempting to start `apache` on member `breonldblc04`
Start of `apache` on member `breonldblc04` succeeded.
You can switch the resource to the other node:
crs_relocate apache -f -c breonldblc03
Attempting to stop `apache` on member `breonldblc04`
Stop of `apache` on member `breonldblc04` succeeded.
Attempting to stop `apache_ip` on member `breonldblc04`
Stop of `apache_ip` on member `breonldblc04` succeeded.
Attempting to start `apache_ip` on member `breonldblc03`
Start of `apache_ip` on member `breonldblc03` succeeded.
Attempting to start `apache` on member `breonldblc03`
Start of `apache` on member `breonldblc03` succeeded.
The -f is needed since there are dependencies (apache_ip), while the -c is optional since I have only two nodes.
Using the second node:
Now, since I have a spare node, I decided to use it to provide another service: NFS.
After installing the NFS tools on both nodes I decided to use the virtual name breonldblv02 (192.168.23.18) for my new resource.
Two solutions are available:
- register a single cumulative resource containing the commands for both the mount point and the NFS daemon;
- or create two different resources, the mount point and the NFS daemon, with the latter dependent on the former.
Since I have only one mount point I went for the first, simpler solution.
If you have more complex needs or want more flexibility you can go for the second solution.
I repeat the previous steps for the virtual IP:
crs_profile -create nfs_ip -t application -a \
/u01/app/oracle/product/10.2/crs_1/bin/usrvip -o \
oi=eth1,ov=192.168.23.18,on=255.255.255.0
crs_register nfs_ip
scp /u01/app/oracle/product/10.2/crs_1/crs/public/* oracle10g@breonldblc04.ras:/u01/app/oracle/product/10.2/crs_1/crs/public
As root:
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm nfs_ip -o root
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm nfs_ip -u user:oracle10g:r-x
As oracle:
crs_start nfs_ip
breonldblc04 should now hold the NFS virtual IP address.
Now a little NFS daemon setup.
In /etc/exports on both nodes place:
/pub *(ro,insecure,all_squash,async)
This exports your /pub mount point read-only, mapping every client user to the anonymous user (all_squash) and replying asynchronously (async).
And in /etc/fstab:
/dev/oradata8_r/nfslv /pub ext3 noauto,ro 1 2
This line helps the cluster start and stop script: with noauto, /pub is not mounted automatically at boot time, and the script can simply mount and unmount /pub.
/dev/oradata8_r/nfslv is the device to be mounted and must be shared between the two nodes.
You can even try a solution where the file system is local to each node and is kept synchronized by a home-made solution. In this scenario you won't have to mount and umount /pub during a failover.
Since the Oracle clusterware needs a shared device anyway, I prefer a shared device for my NFS (in my solution I'm even using LVM).
Before going on, make sure you have a file system and a mount point as described in your fstab.
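A forgotten noauto on one node means /pub gets mounted at boot on both nodes, so it may be worth checking the fstab entry on each of them. A minimal sketch, assuming the standard six-column fstab layout; the function name fstab_has_noauto is hypothetical.

```shell
#!/bin/sh
# fstab_has_noauto FSTAB MOUNTPOINT
# Succeed when FSTAB holds an uncommented entry for MOUNTPOINT whose
# option field (4th column) contains "noauto".
fstab_has_noauto() {
    awk -v mp="$2" '
        /^[ \t]*#/ { next }              # skip comment lines
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noauto") { found = 1; exit }
        }
        END { exit !found }
    ' "$1"
}
```

On my nodes fstab_has_noauto /etc/fstab /pub should succeed on both breonldblc03 and breonldblc04.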
The command:
crs_profile -create pubnfs -t application -B /etc/init.d/nfsserver \
-d "Public NFS" -r nfs_ip \
-a pubnfs.scr -p favored -h "breonldblc03 breonldblc04" \
-o ci=30,ft=3,fi=12,ra=5
will create the script and configuration file for the pubnfs resource.
In my example I decided to be lazy and used the NFS init script, since it is already there, ready for me.
My modifications to /u01/app/oracle/product/10.2/crs_1/crs/public/pubnfs.scr:
PROBE_PROCS="nfsd"
START_APPCMD="/bin/mount /pub"
START_APPCMD2="/etc/init.d/nfsserver start"
STOP_APPCMD="/etc/init.d/nfsserver stop"
STOP_APPCMD2="/bin/umount /pub"
Since mounting a normal (non-cluster) file system from two nodes at once can lead to data corruption, you can integrate a special check in your pubnfs.scr.
Otherwise you can register your mount point as a separate cluster resource.
In this case you need to modify your scripts a little more, adding checks not only for the daemon process but also for the mount point.
This could be a starting point:
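checkmount () {
    # Count the lines of mount output that show a file system on mount point $1
    R=$(mount | grep -c "on $1 type")
    # The count doubles as the return code: 1 when mounted, 0 when not
    return "$R"
}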
The procedure checkmount could be used in probeapp.
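A variant worth considering follows the usual shell convention (0 means success, here "mounted") and compares the mount point exactly, so that /pub does not match /public. This is a sketch, not the code I run in production: the names is_mounted_in and checkmount_strict are hypothetical, and the parsing assumes the Linux "DEVICE on MOUNTPOINT type FSTYPE" output format of mount. Note that its return code is the opposite of checkmount above.

```shell
#!/bin/sh
# is_mounted_in MOUNTPOINT
# Read "mount" style lines (DEVICE on MOUNTPOINT type FSTYPE ...) from
# stdin and succeed when one of them is mounted exactly on MOUNTPOINT.
is_mounted_in() {
    awk -v mp="$1" '
        $2 == "on" && $3 == mp && $4 == "type" { found = 1; exit }
        END { exit !found }
    '
}

# checkmount_strict MOUNTPOINT
# Same test against the live output of mount: 0 = mounted, 1 = not mounted.
checkmount_strict() {
    mount | is_mounted_in "$1"
}
```

Splitting the parser from the call to mount keeps the logic testable with canned input, for example printf '%s\n' '/dev/x on /pub type ext3 (ro)' | is_mounted_in /pub.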
Let's go back to our previous example and to the usual steps:
crs_register pubnfs
As root:
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm pubnfs -o root
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm pubnfs -u user:oracle10g:r-x
As oracle:
chmod a+x /u01/app/oracle/product/10.2/crs_1/crs/public/pubnfs.scr
scp /u01/app/oracle/product/10.2/crs_1/crs/public/* oracle10g@breonldblc04.ras:/u01/app/oracle/product/10.2/crs_1/crs/public
crs_start pubnfs
Check your whole system:
crs_stat -v -t
Name           Type         R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
apache         application  0/5    0/3    ONLINE    ONLINE    breo...lc04
apache_ip      application  0/1    0/0    ONLINE    ONLINE    breo...lc04
nfs_ip         application  0/1    0/0    ONLINE    ONLINE    breo...lc03
ora....c03.gsd application  0/5    0/0    ONLINE    ONLINE    breo...lc03
ora....c03.ons application  0/3    0/0    ONLINE    ONLINE    breo...lc03
ora....c03.vip application  0/0    0/0    ONLINE    ONLINE    breo...lc03
ora....c04.gsd application  0/5    0/0    ONLINE    ONLINE    breo...lc04
ora....c04.ons application  0/3    0/0    ONLINE    ONLINE    breo...lc04
ora....c04.vip application  0/0    0/0    ONLINE    ONLINE    breo...lc04
pubnfs         application  0/5    0/3    ONLINE    ONLINE    breo...lc03
As shown by the above output, my system is serving the web server on breonldblc04 and NFS on breonldblc03.
Make sure you reach each service through the right virtual IP.
The resources starting with ora. are reserved for the Oracle clusterware and shouldn't be managed directly without Oracle support.
Failover:
Now you can start testing the failover of your system.
oracle@breonldblc03:~> crs_stat apache
NAME=apache
TYPE=application
TARGET=ONLINE
STATE=ONLINE on breonldblc04
ps -fe|grep httpd
root     29459     1  0 10:44 ?        00:00:00 /usr/sbin/httpd
wwwrun   29463 29459  0 10:44 ?        00:00:00 /usr/sbin/httpd
wwwrun   29665 29459  0 10:44 ?        00:00:00 /usr/sbin/httpd
root      8144 24437  0 15:16 pts/0    00:00:00 grep httpd
Kill the daemon:
kill -9 29459 29463 29665
The cluster should restart httpd after several seconds.
ps -fe|grep httpd
root      9560     1  0 15:16 ?        00:00:00 /usr/sbin/httpd
wwwrun    9566  9560  0 15:16 ?        00:00:00 /usr/sbin/httpd
root     11475 24437  0 15:18 pts/0    00:00:00 grep httpd
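Rather than re-running ps by hand until httpd shows up again, the wait can be scripted with a small polling helper. A minimal sketch: the function name wait_until is hypothetical, and the number of tries and the delay are up to you (the resource was created with ci=30, so a restart may well take more than a few seconds).

```shell
#!/bin/sh
# wait_until TRIES DELAY COMMAND [ARG...]
# Run COMMAND up to TRIES times, sleeping DELAY seconds between attempts.
# Succeed as soon as COMMAND succeeds; fail when every attempt failed.
wait_until() {
    tries=$1 delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}
```

Assuming pgrep is installed on the node, wait_until 12 5 pgrep -x httpd would wait up to about a minute for the cluster to bring the daemon back.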
The apache resource was set to switch to the other node after five restart attempts (ra=5).
Kill the processes five times and apache is going to migrate:
oracle@breonldblc03:~> crs_stat apache
NAME=apache
TYPE=application
TARGET=ONLINE
STATE=ONLINE on breonldblc03
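When repeating this test it is convenient to reduce the crs_stat output to one line per resource, so that the hosting node can be compared before and after the kill. The sketch below only assumes the NAME=/STATE= layout shown above; the function name crs_stat_summary is hypothetical.

```shell
#!/bin/sh
# crs_stat_summary
# Read crs_stat output (NAME=, TYPE=, TARGET=, STATE= lines) on stdin and
# print one "name state host" line per resource; host is "-" when the
# STATE line carries no "on <host>" part.
crs_stat_summary() {
    awk '
        /^NAME=/  { name = substr($0, 6) }
        /^STATE=/ {
            n = split(substr($0, 7), part, " ")
            host = (n >= 3 && part[2] == "on") ? part[3] : "-"
            print name, part[1], host
        }
    '
}
```

For example, crs_stat apache | crs_stat_summary would print "apache ONLINE breonldblc03" after the migration above.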
For normal administration the three main commands are crs_start, crs_stop and crs_relocate.
If you have a startup issue and your application is in the UNKNOWN state, then you can clear it by using:
crs_stop apache -f
The command:
crs_stop -all
is useful if you wish to stop the whole cluster.
crs_stat -t -v -v
Name           Type         R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
apache         application  0/2    0/3    OFFLINE   OFFLINE
apache_ip      application  0/1    0/0    OFFLINE   OFFLINE
nfs_ip         application  0/1    0/0    OFFLINE   OFFLINE
ora....c03.gsd application  0/5    0/0    OFFLINE   OFFLINE
ora....c03.ons application  0/3    0/0    OFFLINE   OFFLINE
ora....c03.vip application  0/0    0/0    OFFLINE   OFFLINE
ora....c04.gsd application  0/5    0/0    OFFLINE   OFFLINE
ora....c04.ons application  0/3    0/0    OFFLINE   OFFLINE
ora....c04.vip application  0/0    0/0    OFFLINE   OFFLINE
pubnfs         application  0/5    0/3    OFFLINE   OFFLINE
The same option exists for crs_start.
Conclusions:
Using the techniques described in this paper you can secure your own application with an HA solution. Adding nodes and applications is not an issue at all and can be described in further papers.
Contact information:
fabrizio.magni _at_ gmail.com