Monday, November 19, 2018

▷ HA (high availability) configuration using corosync/pacemaker/pcs (HA configuration on Linux)

[ corosync/pacemaker/pcs ]


1) Packages
- corosync:  messaging component for communication and cluster membership, similar to Heartbeat (required)
- pacemaker: resource manager that starts, stops and monitors resources (required)
- pcs:       cluster manager, a command-line tool for configuring the cluster (optional)

2) Nodes
- Two CentOS 7.x nodes
node01 = Master
node02 = Slave

1. Environment
1) Hostname change (optional)
(node01)
# vi /etc/sysconfig/network    → set HOSTNAME=node01
(node02)
# vi /etc/sysconfig/network    → set HOSTNAME=node02
(On CentOS 7 you can instead use "hostnamectl set-hostname node01".)

2) /etc/hosts file change
(node01, node02)
# vi /etc/hosts
192.168.10.10 node01
192.168.10.20 node02

3) hostname check
(node01, node02)
# uname -n or hostname

4) time sync
(node01, node02)
# rdate -s time.bora.net

5) firewall rules
(node01, node02)
# iptables -I INPUT -m state --state NEW -p udp -m multiport --dports 5404,5405 -j ACCEPT
# iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 2224 -j ACCEPT
# iptables -I INPUT -p igmp -j ACCEPT
# iptables -I INPUT -m addrtype --dst-type MULTICAST -j ACCEPT
# service iptables save

- Open UDP 5404 & 5405 for Corosync
- Open TCP 2224 for pcs
- Allow IGMP traffic
- Allow multicast traffic
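
These rules assume the legacy iptables service. On a stock CentOS 7 install the default firewall is firewalld; a rough equivalent, using the "high-availability" service definition that ships with firewalld, would be:
(node01, node02)
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload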

2. corosync/pacemaker/pcs install
1) download
yum -y install corosync pacemaker pcs --downloadonly --downloaddir=./
=====================================================================================================
 Package                                Arch       Version                      Repository    Size
=====================================================================================================
Installing:
 corosync                               x86_64     1.4.7-5.el6                  base       216 k
 pacemaker                              x86_64     1.1.15-5.el6                 base       443 k
 pcs                                    x86_64     0.9.155-2.el6.centos         base       4.5 M
Installing for dependencies:
 ccs                                    x86_64     0.16.2-87.el6                base        57 k
 clusterlib                             x86_64     3.0.12.1-84.el6              base       109 k
 cman                                   x86_64     3.0.12.1-84.el6              base       454 k
 compat-readline5                       x86_64     5.2-17.1.el6                 base       130 k
 corosynclib                            x86_64     1.4.7-5.el6                  base       194 k
 cyrus-sasl-md5                         x86_64     2.1.23-15.el6_6.2            base        47 k
 fence-agents                           x86_64     4.0.15-13.el6                base       193 k
 fence-virt                             x86_64     0.2.3-24.el6                 base        39 k
 gnutls-utils                           x86_64     2.12.23-21.el6               base       109 k
 ipmitool                               x86_64     1.8.15-2.el6                 base       465 k
 libibverbs                             x86_64     1.1.8-4.el6                  base        53 k
 libqb                                  x86_64     0.17.1-2.el6                 base        71 k
 librdmacm                              x86_64     1.0.21-0.el6                 base        60 k
 libvirt-client                         x86_64     0.10.2-62.el6                base       4.1 M
 lm_sensors-libs                        x86_64     3.1.1-17.el6                 base        38 k
 modcluster                             x86_64     0.16.2-35.el6                base       210 k
 nc                                     x86_64     1.84-24.el6                  base        57 k
 net-snmp-libs                          x86_64     1:5.5-60.el6                 base       1.5 M
 net-snmp-utils                         x86_64     1:5.5-60.el6                 base       177 k
 openais                                x86_64     1.1.1-7.el6                  base       192 k
 openaislib                             x86_64     1.1.1-7.el6                  base        82 k
 pacemaker-cli                          x86_64     1.1.15-5.el6                 base       291 k
 pacemaker-cluster-libs                 x86_64     1.1.15-5.el6                 base        85 k
 pacemaker-libs                         x86_64     1.1.15-5.el6                 base       483 k
 perl-Net-Telnet                        noarch     3.03-11.el6                  base        56 k
 pexpect                                noarch     2.3-6.el6                    base       147 k
 python-clufter                         x86_64     0.59.8-1.el6                 base       378 k
 python-suds                            noarch     0.4.1-3.el6                  base       218 k
 ricci                                  x86_64     0.16.2-87.el6                base       633 k
 ruby                                   x86_64     1.8.7.374-5.el6              base       538 k
 ruby-irb                               x86_64     1.8.7.374-5.el6              base       318 k
 ruby-libs                              x86_64     1.8.7.374-5.el6              base       1.7 M
 ruby-rdoc                              x86_64     1.8.7.374-5.el6              base       381 k
 rubygems                               noarch     1.3.7-5.el6                  base       207 k
 sg3_utils                              x86_64     1.28-12.el6                  base       498 k
 telnet                                 x86_64     1:0.17-48.el6                base        58 k
 yajl                                   x86_64     1.0.7-3.el6                  base        27 k
Updating for dependencies:
 gnutls                                 x86_64     2.12.23-21.el6               base       389 k

Transaction Summary
==================================================================================================
Install      40 Package(s)
Upgrade       1 Package(s)

Total download size: 20 M
Downloading Packages:
(1/41): ccs-0.16.2-87.el6.x86_64.rpm                                 |  57 kB     00:00   
(2/41): clusterlib-3.0.12.1-84.el6.x86_64.rpm                        | 109 kB     00:00   
(3/41): cman-3.0.12.1-84.el6.x86_64.rpm                              | 454 kB     00:00   
(4/41): compat-readline5-5.2-17.1.el6.x86_64.rpm                     | 130 kB     00:00   
(5/41): corosync-1.4.7-5.el6.x86_64.rpm                              | 216 kB     00:00   
(6/41): corosynclib-1.4.7-5.el6.x86_64.rpm                           | 194 kB     00:00   
(7/41): cyrus-sasl-md5-2.1.23-15.el6_6.2.x86_64.rpm                  |  47 kB     00:00   
(8/41): fence-agents-4.0.15-13.el6.x86_64.rpm                        | 193 kB     00:00   
(9/41): fence-virt-0.2.3-24.el6.x86_64.rpm                           |  39 kB     00:00   
(10/41): gnutls-2.12.23-21.el6.x86_64.rpm                            | 389 kB     00:00   
(11/41): gnutls-utils-2.12.23-21.el6.x86_64.rpm                      | 109 kB     00:00   
(12/41): ipmitool-1.8.15-2.el6.x86_64.rpm                            | 465 kB     00:00   
(13/41): libibverbs-1.1.8-4.el6.x86_64.rpm                           |  53 kB     00:00   
(14/41): libqb-0.17.1-2.el6.x86_64.rpm                               |  71 kB     00:00   
(15/41): librdmacm-1.0.21-0.el6.x86_64.rpm                           |  60 kB     00:00   
(16/41): libvirt-client-0.10.2-62.el6.x86_64.rpm                     | 4.1 MB     00:00   
(17/41): lm_sensors-libs-3.1.1-17.el6.x86_64.rpm                     |  38 kB     00:00   
(18/41): modcluster-0.16.2-35.el6.x86_64.rpm                         | 210 kB     00:00   
(19/41): nc-1.84-24.el6.x86_64.rpm                                   |  57 kB     00:00   
(20/41): net-snmp-libs-5.5-60.el6.x86_64.rpm                         | 1.5 MB     00:00   
(21/41): net-snmp-utils-5.5-60.el6.x86_64.rpm                        | 177 kB     00:00   
(22/41): openais-1.1.1-7.el6.x86_64.rpm                              | 192 kB     00:00   
(23/41): openaislib-1.1.1-7.el6.x86_64.rpm                           |  82 kB     00:00   
(24/41): pacemaker-1.1.15-5.el6.x86_64.rpm                           | 443 kB     00:00   
(25/41): pacemaker-cli-1.1.15-5.el6.x86_64.rpm                       | 291 kB     00:00   
(26/41): pacemaker-cluster-libs-1.1.15-5.el6.x86_64.rpm              |  85 kB     00:00   
(27/41): pacemaker-libs-1.1.15-5.el6.x86_64.rpm                      | 483 kB     00:00   
(28/41): pcs-0.9.155-2.el6.centos.x86_64.rpm                         | 4.5 MB     00:00   
(29/41): perl-Net-Telnet-3.03-11.el6.noarch.rpm                      |  56 kB     00:00   
(30/41): pexpect-2.3-6.el6.noarch.rpm                                | 147 kB     00:00   
(31/41): python-clufter-0.59.8-1.el6.x86_64.rpm                      | 378 kB     00:00   
(32/41): python-suds-0.4.1-3.el6.noarch.rpm                          | 218 kB     00:00   
(33/41): ricci-0.16.2-87.el6.x86_64.rpm                              | 633 kB     00:00   
(34/41): ruby-1.8.7.374-5.el6.x86_64.rpm                             | 538 kB     00:00   
(35/41): ruby-irb-1.8.7.374-5.el6.x86_64.rpm                         | 318 kB     00:00   
(36/41): ruby-libs-1.8.7.374-5.el6.x86_64.rpm                        | 1.7 MB     00:00   
(37/41): ruby-rdoc-1.8.7.374-5.el6.x86_64.rpm                        | 381 kB     00:00   
(38/41): rubygems-1.3.7-5.el6.noarch.rpm                             | 207 kB     00:00   
(39/41): sg3_utils-1.28-12.el6.x86_64.rpm                            | 498 kB     00:00   
(40/41): telnet-0.17-48.el6.x86_64.rpm                               |  58 kB     00:00   
(41/41): yajl-1.0.7-3.el6.x86_64.rpm                                 |  27 kB     00:00   
----------------------------------------------------------------------------------------------
Total                                                       5.7 MB/s |  20 MB     00:03   

2) install
yum -y localinstall corosync* pacemaker* pcs*

3) Start pcs
Before the cluster can be configured, you need to start the pcsd daemon on each node (enabling it at boot is covered in section 8 below):
(node01, node02)
# systemctl start pcsd (service pcsd start)

The installation also creates an account named "hacluster" for cluster management, so you should change its password:
(node01, node02)
# passwd hacluster
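
If you prefer to set the password non-interactively (for example from a provisioning script), the RHEL/CentOS passwd supports --stdin; "Secret123" below is only a placeholder:
(node01, node02)
# echo "Secret123" | passwd --stdin hacluster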

4) Configure Corosync
Now it's time to group the machines into one cluster. This part uses the hacluster account for pcs authentication: since we will configure all nodes from a single point, we first need to authenticate against all nodes before we are allowed to change the configuration.
(node01)
# pcs cluster auth node01 node02
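
pcs prompts for a username and password; use hacluster and the password set above. In pcs 0.9 the credentials can also be passed on the command line ("Secret123" is again a placeholder):
(node01)
# pcs cluster auth node01 node02 -u hacluster -p Secret123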

3. Create the cluster and add nodes
We’ll start by adding both nodes to a cluster named cluster_web. The general form is:
pcs cluster setup --name <cluster_name> <node1> <node2>
(node01)
# pcs cluster setup --name cluster_web node01 node02

The above command creates the cluster node configuration in /etc/corosync/corosync.conf.
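
For reference, on CentOS 7 (corosync 2.x) the generated configuration for this two-node setup looks roughly like the sketch below; exact contents depend on the corosync and pcs versions:
 totem {
     version: 2
     cluster_name: cluster_web
     transport: udpu
 }
 nodelist {
     node { ring0_addr: node01 nodeid: 1 }
     node { ring0_addr: node02 nodeid: 2 }
 }
 quorum {
     provider: corosync_votequorum
     two_node: 1
 }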

After pcs has configured the cluster, you can start it with this command:
(node01)
# pcs cluster start --all

Then you should see the cluster boot up.

Check status:
(node01)
# pcs status cluster

Check nodes status:
(node01)
# pcs status nodes
# corosync-cmapctl | grep members (corosync-objctl | grep members)
# pcs status corosync

4. Cluster configuration
To check the configuration for errors (at this point there still are some):
(node01)
# crm_verify -L -V
 error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
 error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
 error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
 Errors found during check: config not valid

The above message tells us that there still is an error regarding STONITH (Shoot The Other Node In The Head),
a mechanism that ensures you don't end up with two nodes that both think they are active and claim to own the service and virtual IP,
a situation also called split brain. Since we have a simple cluster, we'll just disable the stonith option:
(node01)
# pcs property set stonith-enabled=false

To ignore loss of quorum (a two-node cluster cannot retain quorum when one node fails) and review the properties:
(node01)
# pcs property set no-quorum-policy=ignore
# pcs property

5. Virtual IP
A virtual IP is a resource. To add the resource:
(node01)
# pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.10.100 cidr_netmask=24 op monitor interval=30s
# pcs status resources

(node01)
# ping -c1 192.168.10.100

To see who is the current owner of the resource/virtual IP:
(node01)
# pcs status|grep virtual_ip
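
The virtual IP is added as a secondary address on the owning node's interface, so you can verify it there as well (grep for the address, since the interface name varies per system):
(node01)
# ip addr show | grep 192.168.10.100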

6. Apache webserver configuration
(node01, node02)
1) Install Apache on both nodes:
# yum -y install httpd

2) Create a file /etc/httpd/conf.d/serverstatus.conf with the following contents on both nodes:
(node01, node02)
# vi /etc/httpd/conf.d/serverstatus.conf
 Listen 127.0.0.1:80
 <Location /server-status>
 SetHandler server-status
 Order deny,allow
 Deny from all
 Allow from 127.0.0.1
 </Location>

3) Disable the existing Listen statement in the Apache configuration to avoid listening multiple times on the same port.
(node01, node02)
# sed -i 's/Listen/#Listen/' /etc/httpd/conf/httpd.conf

4) Start Apache on both nodes and verify if the status page is working:
(node01, node02)
# systemctl restart httpd (service httpd restart)
# wget http://127.0.0.1/server-status

5) Put a simple webpage in the document root of the Apache server that contains the node name, so we know which of the nodes we reach.
This is just temporary.
(node01)
# vi  /var/www/html/index.html
 <html>
 <h1>node01</h1>
 </html>

(node02)
# vi  /var/www/html/index.html
 <html>
 <h1>node02</h1>
 </html>

7. Let the cluster control Apache
1) First stop Apache:
(node01, node02)
# systemctl stop httpd (service httpd stop)

2) Then configure Apache to listen on the virtual IP:
(node01, node02)
# echo "Listen 192.168.10.100:80"|tee --append /etc/httpd/conf/httpd.conf

3) Now that Apache is ready to be controlled by our cluster, we’ll add a resource for the webserver.
Remember that we only need to do this from one node since all nodes are configured by PCS:
(node01)
# pcs resource create webserver ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=1min

4) In order to make sure that the virtual IP and webserver always stay together, we can add a constraint:
(node01)
# pcs constraint colocation add webserver virtual_ip INFINITY

5) We also need another constraint that determines the order in which both resources become available:
(node01)
# pcs constraint order virtual_ip then webserver

6) When the cluster nodes are not equally powerful machines and you would like the resources to run on the more powerful one,
you can add a location constraint:
(node01)
# pcs constraint location webserver prefers node01=50

7) To look at the configured constraints:
(node01)
# pcs constraint
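
With the constraints above in place, the output should look roughly like this (formatting varies between pcs versions):
 Location Constraints:
   Resource: webserver
     Enabled on: node01 (score:50)
 Ordering Constraints:
   start virtual_ip then start webserver
 Colocation Constraints:
   webserver with virtual_ip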

8) After configuring the cluster with the correct constraints, restart it and check the status:
(node01)
# pcs cluster stop --all && pcs cluster start --all

9) Now you should be able to reach the website on the virtual IP address (192.168.10.100).
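
A quick check from either node (or from any host that can reach the cluster network) should return the page of the active node:
# curl http://192.168.10.100/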

10) If you want to test the failover, you can stop the cluster for node01 and see if the website is still available on the virtual IP:
(node01)
# pcs cluster stop node01
# pcs status
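
When the test is done, bring the stopped node back into the cluster; with the location constraint from step 6, the resources may move back to node01:
(node01)
# pcs cluster start node01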

8. Enable the cluster-components to start up at boot
(node01, node02)
# systemctl enable pcsd
# systemctl enable corosync
# systemctl enable pacemaker

9. Unfortunately, after rebooting the system the cluster does not start, and the following messages appear in /var/log/messages:

Nov 21 10:43:36 node01 corosync: Starting Corosync Cluster Engine (corosync): [FAILED]^M[  OK  ]
Nov 21 10:43:36 node01 systemd: corosync.service: control process exited, code=exited status=1
Nov 21 10:43:36 node01 systemd: Failed to start Corosync Cluster Engine.
Nov 21 10:43:36 node01 systemd: Dependency failed for Pacemaker High Availability Cluster Manager.
Nov 21 10:43:36 node01 systemd:
Nov 21 10:43:36 node01 systemd: Unit corosync.service entered failed state.
Apparently, this is a known bug, described in Red Hat Bugzilla bug #1030583.

1) A possible workaround (not so clean) is to delay the Corosync start by 10 seconds, to make sure that the network interfaces are available.
To do so, edit the systemd service file for corosync: /usr/lib/systemd/system/corosync.service

[Unit]
Description=Corosync Cluster Engine
ConditionKernelCommandLine=!nocluster
Requires=network-online.target
After=network-online.target

[Service]
ExecStartPre=/usr/bin/sleep 10
ExecStart=/usr/share/corosync/corosync start
ExecStop=/usr/share/corosync/corosync stop
Type=forking

[Install]
WantedBy=multi-user.target

Line 8 (ExecStartPre=/usr/bin/sleep 10) was added to get the desired delay when starting Corosync.

2) After changing the service files (customized files should actually reside in /etc/systemd/system), reload the systemd daemon:
# systemctl daemon-reload
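
A cleaner variant is to leave the packaged unit file untouched and add the delay through a systemd drop-in (the file name delay.conf below is just a suggestion; ExecStartPre= lines from drop-ins are appended to the unit's list):
(node01, node02)
# mkdir -p /etc/systemd/system/corosync.service.d
# vi /etc/systemd/system/corosync.service.d/delay.conf
 [Service]
 ExecStartPre=/usr/bin/sleep 10
# systemctl daemon-reload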

After rebooting the system, you should see that the cluster started as it should and that the resources are started automatically.

10. Manage resources
1) Add a service as a resource
# pcs resource create syslog service:rsyslog op monitor interval=1min
# pcs constraint colocation add syslog virtual_ip INFINITY

2) Delete a resource
# pcs resource delete syslog

3) Move a resource to node02
# pcs resource move virtual_ip node02
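
Note that "pcs resource move" works by adding a cli-prefer location constraint that pins the resource to node02. To let the cluster place the resource freely again, clear it afterwards (pcs 0.9 supports "resource clear"):
# pcs resource clear virtual_ip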

4) Show resource constraints in detail
# pcs constraint show --full

5) Delete a resource constraint (pcs constraint delete <constraint id>)
# pcs constraint delete location-webserver-node01-50

(*) Example: adding a Tomcat resource
# pcs resource create rpa ocf:heartbeat:tomcat java_home="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.144-0.b01.el7_4.x86_64/jre" catalina_home="/usr/share/tomcat" tomcat_user="tomcat" op monitor timeout="90s" interval="15s"

# pcs constraint colocation add rpa virtual_ip INFINITY
# pcs cluster stop --all && pcs cluster start --all
