Monday, November 19, 2018

▷ HA (high availability) configuration using corosync/pacemaker/pcs (HA configuration on Linux)

[ corosync/pacemaker/pcs ]


1) Packages
- corosync:  messaging component for communication and cluster membership, similar to Heartbeat (required)
- pacemaker: resource manager that starts, stops and monitors resources (required)
- pcs:       cluster manager, a command-line tool for configuring the cluster (optional)

2) Nodes
- Two CentOS 7.x nodes
node01 = Master
node02 = Slave

1. Environment
1) Hostname change (optional)
(node01)
# vi /etc/sysconfig/network    → set HOSTNAME=node01
(node02)
# vi /etc/sysconfig/network    → set HOSTNAME=node02
(On CentOS 7 you can instead use "hostnamectl set-hostname node01".)

2) /etc/hosts file change
(node01, node02)
# vi /etc/hosts
192.168.10.10 node01
192.168.10.20 node02

3) hostname check
(node01, node02)
# uname -n or hostname

4) time sync
(node01, node02)
# rdate -s time.bora.net

5) firewall rules
(node01, node02)
# iptables -I INPUT -m state --state NEW -p udp -m multiport --dports 5404,5405 -j ACCEPT
# iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 2224 -j ACCEPT
# iptables -I INPUT -p igmp -j ACCEPT
# iptables -I INPUT -m addrtype --dst-type MULTICAST -j ACCEPT
# service iptables save

- Open UDP 5404 & 5405 for Corosync
- Open TCP 2224 for pcs
- Allow IGMP traffic
- Allow multicast traffic
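
These rules assume the legacy iptables service. On a stock CentOS 7 install the default firewall is firewalld; a rough equivalent, using the "high-availability" service definition that ships with firewalld, would be:
(node01, node02)
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload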

2. corosync/pacemaker/pcs install
1) download
yum -y install corosync pacemaker pcs --downloadonly --downloaddir=./
=====================================================================================================
 Package                                Arch       Version                      Repository    Size
=====================================================================================================
Installing:
 corosync                               x86_64     1.4.7-5.el6                  base       216 k
 pacemaker                              x86_64     1.1.15-5.el6                 base       443 k
 pcs                                    x86_64     0.9.155-2.el6.centos         base       4.5 M
Installing for dependencies:
 ccs                                    x86_64     0.16.2-87.el6                base        57 k
 clusterlib                             x86_64     3.0.12.1-84.el6              base       109 k
 cman                                   x86_64     3.0.12.1-84.el6              base       454 k
 compat-readline5                       x86_64     5.2-17.1.el6                 base       130 k
 corosynclib                            x86_64     1.4.7-5.el6                  base       194 k
 cyrus-sasl-md5                         x86_64     2.1.23-15.el6_6.2            base        47 k
 fence-agents                           x86_64     4.0.15-13.el6                base       193 k
 fence-virt                             x86_64     0.2.3-24.el6                 base        39 k
 gnutls-utils                           x86_64     2.12.23-21.el6               base       109 k
 ipmitool                               x86_64     1.8.15-2.el6                 base       465 k
 libibverbs                             x86_64     1.1.8-4.el6                  base        53 k
 libqb                                  x86_64     0.17.1-2.el6                 base        71 k
 librdmacm                              x86_64     1.0.21-0.el6                 base        60 k
 libvirt-client                         x86_64     0.10.2-62.el6                base       4.1 M
 lm_sensors-libs                        x86_64     3.1.1-17.el6                 base        38 k
 modcluster                             x86_64     0.16.2-35.el6                base       210 k
 nc                                     x86_64     1.84-24.el6                  base        57 k
 net-snmp-libs                          x86_64     1:5.5-60.el6                 base       1.5 M
 net-snmp-utils                         x86_64     1:5.5-60.el6                 base       177 k
 openais                                x86_64     1.1.1-7.el6                  base       192 k
 openaislib                             x86_64     1.1.1-7.el6                  base        82 k
 pacemaker-cli                          x86_64     1.1.15-5.el6                 base       291 k
 pacemaker-cluster-libs                 x86_64     1.1.15-5.el6                 base        85 k
 pacemaker-libs                         x86_64     1.1.15-5.el6                 base       483 k
 perl-Net-Telnet                        noarch     3.03-11.el6                  base        56 k
 pexpect                                noarch     2.3-6.el6                    base       147 k
 python-clufter                         x86_64     0.59.8-1.el6                 base       378 k
 python-suds                            noarch     0.4.1-3.el6                  base       218 k
 ricci                                  x86_64     0.16.2-87.el6                base       633 k
 ruby                                   x86_64     1.8.7.374-5.el6              base       538 k
 ruby-irb                               x86_64     1.8.7.374-5.el6              base       318 k
 ruby-libs                              x86_64     1.8.7.374-5.el6              base       1.7 M
 ruby-rdoc                              x86_64     1.8.7.374-5.el6              base       381 k
 rubygems                               noarch     1.3.7-5.el6                  base       207 k
 sg3_utils                              x86_64     1.28-12.el6                  base       498 k
 telnet                                 x86_64     1:0.17-48.el6                base        58 k
 yajl                                   x86_64     1.0.7-3.el6                  base        27 k
Updating for dependencies:
 gnutls                                 x86_64     2.12.23-21.el6               base       389 k

Transaction Summary
==================================================================================================
Install      40 Package(s)
Upgrade       1 Package(s)

Total download size: 20 M
Downloading Packages:
(1/41): ccs-0.16.2-87.el6.x86_64.rpm                                 |  57 kB     00:00   
(2/41): clusterlib-3.0.12.1-84.el6.x86_64.rpm                        | 109 kB     00:00   
(3/41): cman-3.0.12.1-84.el6.x86_64.rpm                              | 454 kB     00:00   
(4/41): compat-readline5-5.2-17.1.el6.x86_64.rpm                     | 130 kB     00:00   
(5/41): corosync-1.4.7-5.el6.x86_64.rpm                              | 216 kB     00:00   
(6/41): corosynclib-1.4.7-5.el6.x86_64.rpm                           | 194 kB     00:00   
(7/41): cyrus-sasl-md5-2.1.23-15.el6_6.2.x86_64.rpm                  |  47 kB     00:00   
(8/41): fence-agents-4.0.15-13.el6.x86_64.rpm                        | 193 kB     00:00   
(9/41): fence-virt-0.2.3-24.el6.x86_64.rpm                           |  39 kB     00:00   
(10/41): gnutls-2.12.23-21.el6.x86_64.rpm                            | 389 kB     00:00   
(11/41): gnutls-utils-2.12.23-21.el6.x86_64.rpm                      | 109 kB     00:00   
(12/41): ipmitool-1.8.15-2.el6.x86_64.rpm                            | 465 kB     00:00   
(13/41): libibverbs-1.1.8-4.el6.x86_64.rpm                           |  53 kB     00:00   
(14/41): libqb-0.17.1-2.el6.x86_64.rpm                               |  71 kB     00:00   
(15/41): librdmacm-1.0.21-0.el6.x86_64.rpm                           |  60 kB     00:00   
(16/41): libvirt-client-0.10.2-62.el6.x86_64.rpm                     | 4.1 MB     00:00   
(17/41): lm_sensors-libs-3.1.1-17.el6.x86_64.rpm                     |  38 kB     00:00   
(18/41): modcluster-0.16.2-35.el6.x86_64.rpm                         | 210 kB     00:00   
(19/41): nc-1.84-24.el6.x86_64.rpm                                   |  57 kB     00:00   
(20/41): net-snmp-libs-5.5-60.el6.x86_64.rpm                         | 1.5 MB     00:00   
(21/41): net-snmp-utils-5.5-60.el6.x86_64.rpm                        | 177 kB     00:00   
(22/41): openais-1.1.1-7.el6.x86_64.rpm                              | 192 kB     00:00   
(23/41): openaislib-1.1.1-7.el6.x86_64.rpm                           |  82 kB     00:00   
(24/41): pacemaker-1.1.15-5.el6.x86_64.rpm                           | 443 kB     00:00   
(25/41): pacemaker-cli-1.1.15-5.el6.x86_64.rpm                       | 291 kB     00:00   
(26/41): pacemaker-cluster-libs-1.1.15-5.el6.x86_64.rpm              |  85 kB     00:00   
(27/41): pacemaker-libs-1.1.15-5.el6.x86_64.rpm                      | 483 kB     00:00   
(28/41): pcs-0.9.155-2.el6.centos.x86_64.rpm                         | 4.5 MB     00:00   
(29/41): perl-Net-Telnet-3.03-11.el6.noarch.rpm                      |  56 kB     00:00   
(30/41): pexpect-2.3-6.el6.noarch.rpm                                | 147 kB     00:00   
(31/41): python-clufter-0.59.8-1.el6.x86_64.rpm                      | 378 kB     00:00   
(32/41): python-suds-0.4.1-3.el6.noarch.rpm                          | 218 kB     00:00   
(33/41): ricci-0.16.2-87.el6.x86_64.rpm                              | 633 kB     00:00   
(34/41): ruby-1.8.7.374-5.el6.x86_64.rpm                             | 538 kB     00:00   
(35/41): ruby-irb-1.8.7.374-5.el6.x86_64.rpm                         | 318 kB     00:00   
(36/41): ruby-libs-1.8.7.374-5.el6.x86_64.rpm                        | 1.7 MB     00:00   
(37/41): ruby-rdoc-1.8.7.374-5.el6.x86_64.rpm                        | 381 kB     00:00   
(38/41): rubygems-1.3.7-5.el6.noarch.rpm                             | 207 kB     00:00   
(39/41): sg3_utils-1.28-12.el6.x86_64.rpm                            | 498 kB     00:00   
(40/41): telnet-0.17-48.el6.x86_64.rpm                               |  58 kB     00:00   
(41/41): yajl-1.0.7-3.el6.x86_64.rpm                                 |  27 kB     00:00   
----------------------------------------------------------------------------------------------
Total                                                       5.7 MB/s |  20 MB     00:03   

2) install
yum -y localinstall corosync* pacemaker* pcs*

3) Start pcs
Before the cluster can be configured, you need to start the pcsd daemon on each node (enabling it at boot is covered in section 8 below):
(node01, node02)
# systemctl start pcsd (service pcsd start)

The installation also creates an account named "hacluster" for cluster management, so you should change its password:
(node01, node02)
# passwd hacluster
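
If you prefer to set the password non-interactively (for example from a provisioning script), the RHEL/CentOS passwd supports --stdin; "Secret123" below is only a placeholder:
(node01, node02)
# echo "Secret123" | passwd --stdin hacluster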

4) Configure Corosync
Now it's time to group the machines into one cluster. This part uses the hacluster account for pcs authentication: since we will configure all nodes from a single point, we first need to authenticate against all nodes before we are allowed to change the configuration.
(node01)
# pcs cluster auth node01 node02
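
pcs prompts for a username and password; use hacluster and the password set above. In pcs 0.9 the credentials can also be passed on the command line ("Secret123" is again a placeholder):
(node01)
# pcs cluster auth node01 node02 -u hacluster -p Secret123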

3. Create the cluster and add nodes
We’ll start by adding both nodes to a cluster named cluster_web. The general form is:
pcs cluster setup --name <cluster_name> <node1> <node2>
(node01)
# pcs cluster setup --name cluster_web node01 node02

The above command creates the cluster node configuration in /etc/corosync/corosync.conf.
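
For reference, on CentOS 7 (corosync 2.x) the generated configuration for this two-node setup looks roughly like the sketch below; exact contents depend on the corosync and pcs versions:
 totem {
     version: 2
     cluster_name: cluster_web
     transport: udpu
 }
 nodelist {
     node { ring0_addr: node01 nodeid: 1 }
     node { ring0_addr: node02 nodeid: 2 }
 }
 quorum {
     provider: corosync_votequorum
     two_node: 1
 }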

After pcs has configured the cluster, you can start it with this command:
(node01)
# pcs cluster start --all

Then you should see the cluster boot up.

Check status:
(node01)
# pcs status cluster

Check nodes status:
(node01)
# pcs status nodes
# corosync-cmapctl | grep members (corosync-objctl | grep members)
# pcs status corosync

4. Cluster configuration
To check the configuration for errors (at this point there still are some):
(node01)
# crm_verify -L -V
 error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
 error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
 error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
 Errors found during check: config not valid

The above message tells us that there still is an error regarding STONITH (Shoot The Other Node In The Head),
a mechanism that ensures you don't end up with two nodes that both think they are active and claim to own the service and virtual IP,
a situation also called split brain. Since we have a simple cluster, we'll just disable the stonith option:
(node01)
# pcs property set stonith-enabled=false

To ignore loss of quorum (a two-node cluster cannot retain quorum when one node fails) and review the properties:
(node01)
# pcs property set no-quorum-policy=ignore
# pcs property

5. Virtual IP
A virtual IP is a resource. To add the resource:
(node01)
# pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.10.100 cidr_netmask=24 op monitor interval=30s
# pcs status resources

(node01)
# ping -c1 192.168.10.100

To see who is the current owner of the resource/virtual IP:
(node01)
# pcs status|grep virtual_ip
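
The virtual IP is added as a secondary address on the owning node's interface, so you can verify it there as well (grep for the address, since the interface name varies per system):
(node01)
# ip addr show | grep 192.168.10.100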

6. Apache webserver configuration
(node01, node02)
1) Install Apache on both nodes:
# yum -y install httpd

2) Create a file /etc/httpd/conf.d/serverstatus.conf with the following contents on both nodes:
(node01, node02)
# vi /etc/httpd/conf.d/serverstatus.conf
 Listen 127.0.0.1:80
 <Location /server-status>
 SetHandler server-status
 Order deny,allow
 Deny from all
 Allow from 127.0.0.1
 </Location>

3) Disable the existing Listen statement in the Apache configuration to avoid listening multiple times on the same port.
(node01, node02)
# sed -i 's/Listen/#Listen/' /etc/httpd/conf/httpd.conf

4) Start Apache on both nodes and verify if the status page is working:
(node01, node02)
# systemctl restart httpd (service httpd restart)
# wget http://127.0.0.1/server-status

5) Put a simple webpage in the document root of the Apache server that contains the node name, so we know which of the nodes we reach.
This is just temporary.
(node01)
# vi  /var/www/html/index.html
 <html>
 <h1>node01</h1>
 </html>

(node02)
# vi  /var/www/html/index.html
 <html>
 <h1>node02</h1>
 </html>

7. Let the cluster control Apache
1) First stop Apache:
(node01, node02)
# systemctl stop httpd (service httpd stop)

2) Then configure Apache to listen on the virtual IP:
(node01, node02)
# echo "Listen 192.168.10.100:80"|tee --append /etc/httpd/conf/httpd.conf

3) Now that Apache is ready to be controlled by our cluster, we’ll add a resource for the webserver.
Remember that we only need to do this from one node since all nodes are configured by PCS:
(node01)
# pcs resource create webserver ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=1min

4) In order to make sure that the virtual IP and webserver always stay together, we can add a constraint:
(node01)
# pcs constraint colocation add webserver virtual_ip INFINITY

5) We also need another constraint that determines the order in which both resources become available:
(node01)
# pcs constraint order virtual_ip then webserver

6) When the cluster nodes are not equally powerful machines and you would like the resources to run on the more powerful one,
you can add a location constraint:
(node01)
# pcs constraint location webserver prefers node01=50

7) To look at the configured constraints:
(node01)
# pcs constraint
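
With the constraints above in place, the output should look roughly like this (formatting varies between pcs versions):
 Location Constraints:
   Resource: webserver
     Enabled on: node01 (score:50)
 Ordering Constraints:
   start virtual_ip then start webserver
 Colocation Constraints:
   webserver with virtual_ip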

8) After configuring the cluster with the correct constraints, restart it and check the status:
(node01)
# pcs cluster stop --all && pcs cluster start --all

9) Now you should be able to reach the website on the virtual IP address (192.168.10.100).
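
A quick check from either node (or from any host that can reach the cluster network) should return the page of the active node:
# curl http://192.168.10.100/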

10) If you want to test the failover, you can stop the cluster for node01 and see if the website is still available on the virtual IP:
(node01)
# pcs cluster stop node01
# pcs status
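
When the test is done, bring the stopped node back into the cluster; with the location constraint from step 6, the resources may move back to node01:
(node01)
# pcs cluster start node01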

8. Enable the cluster-components to start up at boot
(node01, node02)
# systemctl enable pcsd
# systemctl enable corosync
# systemctl enable pacemaker

9. Unfortunately, after rebooting the system the cluster does not start, and the following messages appear in /var/log/messages:

Nov 21 10:43:36 node01 corosync: Starting Corosync Cluster Engine (corosync): [FAILED]^M[  OK  ]
Nov 21 10:43:36 node01 systemd: corosync.service: control process exited, code=exited status=1
Nov 21 10:43:36 node01 systemd: Failed to start Corosync Cluster Engine.
Nov 21 10:43:36 node01 systemd: Dependency failed for Pacemaker High Availability Cluster Manager.
Nov 21 10:43:36 node01 systemd:
Nov 21 10:43:36 node01 systemd: Unit corosync.service entered failed state.
Apparently, this is a known bug, described in Red Hat Bugzilla bug #1030583.

1) A possible workaround (not so clean) is to delay the Corosync start by 10 seconds, to make sure that the network interfaces are available.
To do so, edit the systemd service file for corosync: /usr/lib/systemd/system/corosync.service

[Unit]
Description=Corosync Cluster Engine
ConditionKernelCommandLine=!nocluster
Requires=network-online.target
After=network-online.target

[Service]
ExecStartPre=/usr/bin/sleep 10
ExecStart=/usr/share/corosync/corosync start
ExecStop=/usr/share/corosync/corosync stop
Type=forking

[Install]
WantedBy=multi-user.target

Line 8 (ExecStartPre=/usr/bin/sleep 10) was added to get the desired delay when starting Corosync.

2) After changing the service files (customized files should actually reside in /etc/systemd/system), reload the systemd daemon:
# systemctl daemon-reload
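
A cleaner variant is to leave the packaged unit file untouched and add the delay through a systemd drop-in (the file name delay.conf below is just a suggestion; ExecStartPre= lines from drop-ins are appended to the unit's list):
(node01, node02)
# mkdir -p /etc/systemd/system/corosync.service.d
# vi /etc/systemd/system/corosync.service.d/delay.conf
 [Service]
 ExecStartPre=/usr/bin/sleep 10
# systemctl daemon-reload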

After rebooting the system, you should see that the cluster started as it should and that the resources are started automatically.

10. Manage resources
1) Add a service as a resource
# pcs resource create syslog service:rsyslog op monitor interval=1min
# pcs constraint colocation add syslog virtual_ip INFINITY

2) Delete a resource
# pcs resource delete syslog

3) Move a resource to node02
# pcs resource move virtual_ip node02
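
Note that "pcs resource move" works by adding a cli-prefer location constraint that pins the resource to node02. To let the cluster place the resource freely again, clear it afterwards (pcs 0.9 supports "resource clear"):
# pcs resource clear virtual_ip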

4) Show resource constraints in detail
# pcs constraint show --full

5) Delete a resource constraint (pcs constraint delete <constraint id>)
# pcs constraint delete location-webserver-node01-50

(*) Example: adding a Tomcat resource
# pcs resource create rpa ocf:heartbeat:tomcat java_home="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.144-0.b01.el7_4.x86_64/jre" catalina_home="/usr/share/tomcat" tomcat_user="tomcat" op monitor timeout="90s" interval="15s"

# pcs constraint colocation add rpa virtual_ip INFINITY
# pcs cluster stop --all && pcs cluster start --all
