What is VMware HA?
As per VMware’s definition,
VMware® High Availability (HA) provides easy-to-use, cost-effective high availability for applications running in virtual machines. In the event of server failure, affected virtual machines are automatically restarted on other production servers with spare capacity.
What is AAM in HA?
AAM is Legato’s Automated Availability Manager. Up to and including vSphere 4.1, VMware HA was built on Legato’s Automated Availability Manager (AAM) software, re-engineered to work with VMs. VMware’s vCenter agent (vpxa) interfaces with the VMware HA agent, which acts as an intermediary to the AAM software. From vSphere 5.0, HA uses an agent called “FDM” (Fault Domain Manager) instead.
What are the prerequisites for HA to work?
1. Shared storage for the VMs running in the HA cluster
2. Essentials Plus, Standard, Advanced, Enterprise, or Enterprise Plus licensing
3. A cluster created with VMware HA enabled
4. Management network redundancy, to avoid frequent isolation responses in case of temporary network issues (preferred, not a requirement)
What is the maximum number of primary HA hosts in vSphere 4.1?
The maximum number of primary HA hosts is 5. A VMware HA cluster chooses the first 5 hosts that join the cluster as primary nodes, and all other hosts are automatically selected as secondary nodes.
How to see the list of primary nodes in an HA cluster?
View the log file named “aam_config_util_listnodes.log” under /var/log/vmware/aam using the command below:
cat /var/log/vmware/aam/aam_config_util_listnodes.log
What are the commands to restart/start/stop the HA agent on the ESX host?
service vmware-aam restart
service vmware-aam stop
service vmware-aam start
Where are the HA-related logs located for troubleshooting?
/var/log/vmware/aam
What are the basic troubleshooting steps if the HA agent install fails on hosts in an HA cluster?
1. Check for network issues.
2. Check that DNS is configured properly.
3. Check the VMware HA agent status on the ESX host using the command below:
service vmware-aam status
4. Check that the networks are properly configured and named exactly as on the other hosts in the cluster. Otherwise, you will get errors while installing or reconfiguring the HA agent.
5. Check that the HA-related ports are open in the firewall to allow communication:
Incoming port: TCP/UDP 8042-8045
Outgoing port: TCP/UDP 2050-2250
6. Try restarting (or stopping and starting) the VMware HA agent on the affected host using the commands below. In addition, you can also try restarting vpxa and the management agent on the host.
service vmware-aam restart
service vmware-aam stop
service vmware-aam start
7. Right-click the affected host and click “Reconfigure for VMware HA” to re-install the HA agent on that particular host.
8. Remove the affected host from the cluster. Removing an ESX host from the cluster is not allowed until that host is put into maintenance mode.
9. As an alternative, go to the cluster settings, uncheck VMware HA to turn off HA in that cluster, and then re-enable VMware HA to get the agent installed.
10. For further troubleshooting, review the HA logs under the /var/log/vmware/aam directory.
What is the maximum number of hosts per HA cluster?
The maximum number of hosts in an HA cluster is 32.
What is Host Isolation?
VMware HA has a mechanism to detect that a host is isolated from the rest of the hosts in the cluster. When an ESX host loses its ability to exchange heartbeats with the other hosts in the HA cluster over the management network, that ESX host is considered isolated.
How is Host Isolation detected?
In an HA cluster, ESX hosts use heartbeats to communicate with the other hosts in the cluster. By default, a heartbeat is sent every 1 second. If an ESX host in the cluster has not received a heartbeat from any other host in the cluster for 13 seconds, the host suspects that it is isolated and pings the configured isolation address (the default gateway by default). If the ping fails, VMware HA executes the host isolation response.
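The detection logic described above can be sketched in a few lines of Python. This is only an illustration of the decision, not the HA agent's actual code: the function name, its parameters, and the `ping_isolation_address` callback are hypothetical, and the 1-second and 13-second values are the defaults quoted above.

```python
from typing import Callable

HEARTBEAT_INTERVAL_S = 1   # a heartbeat is sent every second (default quoted above)
ISOLATION_TIMEOUT_S = 13   # seconds without heartbeats before the host suspects isolation


def should_run_isolation_response(
    seconds_since_last_heartbeat: float,
    ping_isolation_address: Callable[[], bool],
) -> bool:
    """Return True if the host should execute its isolation response.

    ping_isolation_address is a hypothetical callback that returns True
    when the isolation address (default gateway by default) answers.
    """
    if seconds_since_last_heartbeat < ISOLATION_TIMEOUT_S:
        # Heartbeats are still arriving, so the host is not isolated.
        return False
    # No heartbeat for 13 seconds: fall back to the isolation address.
    return not ping_isolation_address()
```

For example, a host that has heard nothing for 20 seconds but can still reach its gateway is not treated as isolated, while the same host with a failed ping is.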
What are the different types of isolation response available in HA?
Power off – All the VMs are powered off when HA detects network isolation.
Shut down – All VMs running on that host are shut down with the help of VMware Tools when HA detects network isolation. If the shutdown via VMware Tools does not complete within 5 minutes, the VMs are powered off. This behavior can be changed with HA advanced options. Please refer to my post on HA Advanced configuration.
Leave powered on – The VMs remain powered on, with their state unchanged, when HA detects network isolation.
How to add an additional isolation address for redundancy?
By default, VMware HA pings the default gateway as the isolation address if it stops receiving heartbeats. We can add additional values if we are using redundant service consoles that belong to different subnets. For example, we can add the default gateway of SC1 as the first value and the gateway of SC2 as the additional one, using the steps below:
1. Right-click your HA cluster
2. Go to the advanced options of HA
3. Add the line “das.isolationaddress1 = 192.168.0.1”
4. Add the line “das.isolationaddress2 = 192.168.1.1” as the additional isolation address
What is admission control?
As per the “VMware Availability Guide”, vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected.
What are the 2 types of settings available for admission control?
Enable: Do not power on VMs that violate availability constraints
Disable: Power on VMs that violate availability constraints
What are the different types of admission control policy available with VMware HA?
There are 3 different types of admission control policy available:
Host failures cluster tolerates
Percentage of cluster resources reserved as failover spare capacity
Specify a failover host
How does the “Host failures cluster tolerates” admission control policy work?
Select the maximum number of host failures for which you want to guarantee failover. Prior to vSphere 4.1, the minimum is 1 and the maximum is 4.
In the “Host failures cluster tolerates” admission control policy, we define a specific number of hosts that can fail in the cluster, and the policy ensures that sufficient resources remain to fail over all the virtual machines from those failed hosts to the other hosts in the cluster. VMware High Availability (HA) uses a mechanism called slots to calculate both the available and the required resources in the cluster for failing over virtual machines from a failed host to other hosts in the cluster.
What is a slot?
As per VMware’s definition,
“A slot is a logical representation of the memory and CPU resources that satisfy the requirements for any powered-on virtual machine in the cluster.”
If you have configured reservations at the VM level, they influence the HA slot calculation. The highest memory reservation and the highest CPU reservation among the VMs in your cluster determine the slot size for the cluster.
How are HA slots calculated?
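As a rough sketch, the slot arithmetic can be written out in Python. This is a simplified model, not VMware's implementation. It assumes the slot size is the highest CPU reservation and the highest memory reservation plus overhead among powered-on VMs, that a VM without a CPU reservation counts as 256 MHz, that each host holds as many slots as its scarcer resource allows, and that the cluster must still have one slot per powered-on VM after losing its largest hosts. The `VM` and `Host` classes and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

DEFAULT_CPU_SLOT_MHZ = 256  # used when a VM has no CPU reservation


@dataclass
class VM:
    cpu_reservation_mhz: int
    mem_reservation_mb: int
    mem_overhead_mb: int  # assumed to be greater than zero


@dataclass
class Host:
    cpu_mhz: int
    mem_mb: int


def slot_size(vms: List[VM]) -> Tuple[int, int]:
    """Slot size (MHz, MB) taken from the most demanding powered-on VM."""
    cpu = max(vm.cpu_reservation_mhz or DEFAULT_CPU_SLOT_MHZ for vm in vms)
    mem = max(vm.mem_reservation_mb + vm.mem_overhead_mb for vm in vms)
    return cpu, mem


def slots_per_host(host: Host, cpu_slot: int, mem_slot: int) -> int:
    """A host holds as many slots as its scarcer resource allows."""
    return min(host.cpu_mhz // cpu_slot, host.mem_mb // mem_slot)


def tolerates_failures(hosts: List[Host], vms: List[VM], failures: int) -> bool:
    """Worst case: the hosts with the most slots are the ones that fail."""
    cpu_slot, mem_slot = slot_size(vms)
    slots = sorted((slots_per_host(h, cpu_slot, mem_slot) for h in hosts), reverse=True)
    return sum(slots[failures:]) >= len(vms)


vms = [VM(0, 0, 100), VM(500, 1024, 120), VM(0, 2048, 150)]
hosts = [Host(10000, 16384) for _ in range(3)]
print(slot_size(vms))                     # (500, 2198)
print(tolerates_failures(hosts, vms, 1))  # True
```

Note how a single VM with a large reservation inflates the slot size for the whole cluster, which is why reservations have such a strong effect on this policy.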
How to check the HA slot information from the vSphere Client?
Click the cluster’s Summary tab and click “Advanced Runtime Info” to see the detailed HA slot information.
What is the use of Host Monitoring status in an HA cluster?
Let’s take an example: you are performing network maintenance on the switches that connect one of the ESX hosts in your HA cluster. What will happen if the switch connected to that ESX host goes down?
The host will not receive heartbeats, and its ping to the isolation address will also fail. So the host will consider itself isolated, and HA will initiate a restart of its virtual machines on other hosts in the cluster. You do not want this to happen during a scheduled maintenance window.
To avoid this situation when performing a scheduled activity that may cause an ESX host to become isolated, clear the “Enable Host Monitoring” check box until you are done with the network maintenance activity.
How to manually define the HA slot size?
By default, the HA slot size is determined by the highest CPU and memory reservations among the virtual machines. If no reservation is specified at the VM level, a default slot size of 256 MHz for CPU and 0 MB + memory overhead for RAM is used. We can control the HA slot size manually by using the following values.
There are 4 options related to slot size that we can configure in the HA advanced options:
das.slotMemInMB – Maximum bound value for the HA memory slot size
das.slotCpuInMHz – Maximum bound value for the HA CPU slot size
das.vmMemoryMinMB – Minimum bound value for the HA memory slot size
das.vmCpuMinMHz – Minimum bound value for the HA CPU slot size
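Reading the four options above as lower and upper bounds, their effect on one dimension of the slot size (CPU or memory) can be sketched as a simple clamp. This is a simplified interpretation for illustration, not the exact HA algorithm. The function name `bounded_slot_size` and its parameters are hypothetical: `min_bound` stands in for das.vmCpuMinMHz or das.vmMemoryMinMB, and `max_bound` for das.slotCpuInMHz or das.slotMemInMB.

```python
from typing import Optional


def bounded_slot_size(
    highest_reservation: int,
    min_bound: int,
    max_bound: Optional[int] = None,
) -> int:
    """Clamp the reservation-derived slot size between the configured bounds."""
    size = max(highest_reservation, min_bound)  # never smaller than the minimum bound
    if max_bound is not None:
        size = min(size, max_bound)             # never larger than the maximum bound
    return size


# One VM reserves 4000 MHz, but the CPU slot is capped at 1000 MHz.
print(bounded_slot_size(4000, min_bound=256, max_bound=1000))  # 1000
```

Capping the slot size this way keeps one heavily reserved VM from shrinking the number of slots available to the rest of the cluster.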
How does the “Percentage of cluster resources reserved as failover spare capacity” admission control policy work?
In the “Percentage of cluster resources reserved as failover spare capacity” admission control policy, we define a specific percentage of the total cluster resources to be reserved for failover. In contrast to the “Host failures cluster tolerates” admission control policy, it does not use slots. Instead, this policy works as follows:
1. It calculates the total resource requirement for all powered-on virtual machines in the cluster and the total host resources available for virtual machines.
2. It calculates the current CPU and memory failover capacity for the cluster.
3. If the current CPU or memory failover capacity for the cluster is less than the configured failover capacity (e.g. 25%), admission control will not allow powering on a virtual machine that violates the availability constraints.
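The steps above can be sketched as a short calculation. This is a minimal model and makes some assumptions: the current failover capacity is taken as (total host resources minus total VM requirements) divided by total host resources, it is evaluated separately for CPU and memory, and a power-on is allowed only if both stay at or above the configured percentage. The function and parameter names are hypothetical.

```python
def failover_capacity_pct(total_host: float, total_vm_required: float) -> float:
    """Share of host resources still free, as a percentage."""
    return (total_host - total_vm_required) / total_host * 100


def can_power_on(
    host_cpu_mhz: float, host_mem_mb: float,   # total host resources available for VMs
    used_cpu_mhz: float, used_mem_mb: float,   # requirements of the powered-on VMs
    new_cpu_mhz: float, new_mem_mb: float,     # requirements of the VM being powered on
    configured_pct: float,                     # configured failover capacity, e.g. 25
) -> bool:
    cpu_pct = failover_capacity_pct(host_cpu_mhz, used_cpu_mhz + new_cpu_mhz)
    mem_pct = failover_capacity_pct(host_mem_mb, used_mem_mb + new_mem_mb)
    # Both resources must keep at least the configured spare capacity.
    return cpu_pct >= configured_pct and mem_pct >= configured_pct


# Cluster with 40000 MHz / 65536 MB, half of it in use, 25% reserved for failover.
print(can_power_on(40000, 65536, 20000, 32768, 5000, 8192, 25))   # True
print(can_power_on(40000, 65536, 20000, 32768, 12000, 8192, 25))  # False
```

In the second call the new VM would leave only 20% spare CPU capacity, so admission control would refuse the power-on.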
How does the “Specify a failover host” admission control policy work?
In the “Specify a failover host” admission control policy, we define a specific host as a dedicated failover host. When a host failure or isolation is detected, HA attempts to restart the virtual machines on the specified failover host. In this approach, the dedicated failover host sits idle and does not participate in DRS load balancing. DRS will not migrate virtual machines to the defined failover host or place virtual machines on it at power-on.
What is VM Monitoring status?
HA usually monitors ESX hosts and restarts the virtual machines from a failed host on other hosts in the cluster in case of host isolation. But what if I need HA to monitor for virtual machine failures as well? This is where the VM Monitoring feature, part of the HA settings, comes in. VM Monitoring restarts a virtual machine if its VMware Tools heartbeat is not received within the time specified by the Monitoring sensitivity setting.