What are the basic troubleshooting steps when the HA agent install
fails on hosts in an HA cluster?
If you are facing issues with the HA agent on hosts in
an HA cluster, I recommend following the 10 basic
troubleshooting steps below. Most of the time, they will resolve the issue.
1. Check your environment for any temporary network problems.
2. Check that DNS is configured properly.
3. Check the VMware HA agent status on the ESX host using the command below:
service vmware-aam status
4. Check that the ESX networks are configured properly and named exactly as
on the other hosts in the cluster. Otherwise, you will get errors while
installing or reconfiguring the HA agent.
5. Check that the HA-related ports are open in the firewall to allow
communication:
Incoming port: TCP/UDP 8042-8045
Outgoing port: TCP/UDP 2050-2250
6. Try to restart, or stop and start, the VMware HA agent on the affected host
using the commands below. In addition, you can also try restarting vpxa and the
management agent on the host.
service vmware-aam restart
service vmware-aam stop
service vmware-aam start
7. Right-click the affected host and click “Reconfigure for VMware HA” to
reinstall the HA agent on that particular host.
8. Remove the affected host from the cluster. An ESX host cannot be removed
from the cluster until it has been put into maintenance mode.
9. As an alternative to step 8, go to the cluster settings and uncheck VMware HA
to turn off HA on that cluster, then re-enable VMware HA to get the agent
installed from scratch.
10. For further troubleshooting, review the HA logs under the
/var/log/vmware/aam directory.
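
Some of the checks above (steps 2, 3, 5 and 10) can be collected into a few small shell helpers to run on the affected host. This is only a rough sketch: the function names are made up for illustration, it assumes getent is available on the host's console, and the port ranges are simply the ones listed in step 5.

```shell
#!/bin/sh
# Hypothetical helper functions; names are illustrative, not part of any VMware tool.

# Step 2: report whether a host name resolves (forward lookup only).
# Assumes getent is installed on the console.
check_dns() {
    if getent hosts "$1" >/dev/null 2>&1; then
        echo "OK   $1 resolves"
    else
        echo "FAIL $1 does not resolve"
        return 1
    fi
}

# Step 5: succeed if a port lies inside the HA ranges listed in step 5.
# Usage: is_ha_port incoming|outgoing PORT
is_ha_port() {
    case "$1" in
        incoming) [ "$2" -ge 8042 ] && [ "$2" -le 8045 ] ;;
        outgoing) [ "$2" -ge 2050 ] && [ "$2" -le 2250 ] ;;
        *) return 2 ;;
    esac
}

# Steps 3 and 10: show the agent status, then the newest files in the log directory.
show_agent_state() {
    service vmware-aam status
    ls -lt /var/log/vmware/aam | head -n 5
}
```

For example, after sourcing the file you could run check_dns against each host name in the cluster (both short name and FQDN), then show_agent_state on the affected host before moving on to steps 6 to 9.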