VMware vSphere 7.x Study Guide for VMware Certified Professional – Data Center Virtualization certification. This article covers Section 1: Architectures and Technologies. Objective 1.6.4 – Describe vSphere High Availability.
This article is part of the VMware vSphere 7.x - VCP-DCV Study Guide. Check out this page first for an introduction, disclaimer, and updates on the guide. The page also includes a collection of articles matching each objective of the official VCP-DCV.
Describe vSphere High Availability
This article focuses on objective 1.6.4 – Describe vSphere High Availability. Here, we give an overview of vSphere HA and of primary and secondary hosts. Next, we study vSphere availability settings, including failures and responses in HA, admission control, and heartbeat datastores. Finally, we take a quick look at Proactive HA.
This is a big topic, and it is a child of Objective 1.6 – Describe ESXi cluster concepts. It is best to read that article before this one, as vSphere HA is a feature of vSphere clusters.
Also, it would be good to read about affinity rules in DRS.
1. What is vSphere High Availability (HA)
vSphere HA clusters enable a collection of ESXi hosts to work together so that, as a group, they provide higher levels of availability for virtual machines than each ESXi host can provide individually. When you plan the creation and usage of a new vSphere HA cluster, the options you select affect how the cluster responds to failures of hosts or virtual machines.
- VMware HA provides easy-to-use, cost-effective high availability for all applications running in virtual machines.
- If a server fails, the affected virtual machines are automatically restarted on other hosts in the cluster that have spare capacity.
- VMware HA minimizes downtime and IT service disruption while eliminating the need for dedicated standby hardware and the installation of additional software.
- VMware HA provides consistent high availability across the entire virtualized IT environment without the cost and complexity of failover solutions tied to either operating systems or specific applications.
2. Primary and Secondary Hosts
When you add a host to a vSphere HA cluster, an agent is uploaded to the host and configured to communicate with other agents in the cluster.
- Each host in the cluster functions as a primary host or a secondary host.
- When vSphere HA is enabled for a cluster, all active hosts participate in an election to choose the cluster's primary host.
- The host that mounts the largest number of datastores has an advantage in the election.
- Only one primary host typically exists per cluster, and all other hosts are secondary.
- A new election is held if the primary host fails, is shut down or put in standby mode, or is removed from the cluster.
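The election rule above can be modelled in a few lines. This is a simplified sketch, not the HA agent's actual code: `HostCandidate` and `elect_primary` are hypothetical names, and the tie-breaker (when datastore counts are equal, the host with the lexically highest managed object ID wins) follows VMware's description of the election.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HostCandidate:
    moid: str                # managed object ID assigned by vCenter, e.g. "host-42"
    datastores_mounted: int  # number of datastores the host can see


def elect_primary(candidates: list[HostCandidate]) -> HostCandidate:
    """Pick the primary host: most datastores first, lexically highest MOID on a tie."""
    # Tuples compare element by element, so the MOID string only matters
    # when two hosts mount the same number of datastores.
    return max(candidates, key=lambda h: (h.datastores_mounted, h.moid))


hosts = [
    HostCandidate("host-10", 4),
    HostCandidate("host-9", 4),
    HostCandidate("host-100", 3),
]
print(elect_primary(hosts).moid)  # host-9, because "host-9" > "host-10" as strings
```

Note that "lexically highest" is a string comparison, which is why host-9 beats host-10 in the tie.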
The primary host in a cluster has several responsibilities:
- Monitor the state of secondary hosts. If a secondary host fails or becomes unreachable, the primary host identifies which virtual machines must be restarted.
- Monitor the power state of all protected virtual machines. If one virtual machine fails, the primary host ensures that it is restarted.
- Determine where the restart occurs using a local placement engine.
- Manage the lists of cluster hosts and protected virtual machines.
- Act as the vCenter Server management interface to the cluster and report the cluster health state.
3. vSphere Availability Settings
When you create a vSphere HA cluster or configure an existing cluster, you must configure settings that determine how the feature works.
In the vSphere Client, you can configure the following vSphere HA settings:
Failures and responses: Provide settings for host failure responses, host isolation, VM monitoring, and VM Component Protection.
Admission Control: Enable or disable admission control for the vSphere HA cluster and choose a policy for how it is enforced.
Heartbeat Datastores: Specify preferences for the datastores that vSphere HA uses for datastore heartbeating.
Advanced Options: Customize vSphere HA behavior by setting advanced options.
For the vSphere 7 exam, it is essential to know how vSphere HA identifies host failures and isolation and how it responds to these situations. You should also know how admission control works so that you can choose the policy that fits your failover needs.
3.1 Failures and responses
Host Failure Types
The primary host of a VMware vSphere High Availability cluster is responsible for detecting the failure of secondary hosts. Depending on the type of failure detected, the virtual machines running on the hosts might need to be failed over.
In a vSphere HA cluster, three types of host failure are detected:
- Failure. A host stops functioning.
- Isolation. A host becomes network isolated.
- Partition. A host loses network connectivity with the primary host.
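A minimal sketch of how the three failure types can be told apart, assuming the primary host has three signals: network heartbeats from the secondary's agent, datastore heartbeats (covered under Heartbeat Datastores), and whether the secondary has declared itself isolated. In the real product the primary host also pings the host's management address before declaring it failed; that step is omitted, and `HostState`, `classify_secondary`, and `declared_isolated` are hypothetical names used only for illustration.

```python
from enum import Enum


class HostState(Enum):
    HEALTHY = "healthy"
    FAILED = "failed"
    ISOLATED = "isolated"
    PARTITIONED = "partitioned"


def classify_secondary(network_heartbeat: bool,
                       datastore_heartbeat: bool,
                       declared_isolated: bool) -> HostState:
    """Simplified view of how the primary host labels a silent secondary host."""
    if network_heartbeat:
        # Agents still talk over the management network: nothing to do.
        return HostState.HEALTHY
    if not datastore_heartbeat:
        # Silent on the network AND on the heartbeat datastores: treat the host as failed.
        return HostState.FAILED
    if declared_isolated:
        # Host is alive and reports that it considers itself isolated.
        return HostState.ISOLATED
    # Host is alive but cannot reach the primary host over the network.
    return HostState.PARTITIONED


print(classify_secondary(False, True, False).value)  # partitioned
```

Only the FAILED outcome triggers an immediate restart of the host's VMs elsewhere; an isolated host is handled by its own isolation response.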
Host Isolation Response
Host isolation response determines what happens when a host in a vSphere HA cluster loses its management network connections but continues to run.
- You can use the isolation response to have vSphere HA power off virtual machines running on an isolated host and restart them on a non-isolated host.
- Host isolation responses require that Host Monitoring Status is enabled.
- If Host Monitoring Status is disabled, host isolation responses are also suspended.
- A host determines that it is isolated when it cannot communicate with the agents running on the other hosts, and it cannot ping its isolation addresses.
- The host then executes its isolation response.
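- The available responses are Power off and restart VMs or Shut down and restart VMs; the response can also be left Disabled, in which case the virtual machines keep running on the isolated host.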
- You can customize this property for individual virtual machines.
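The isolation check and response selection described above can be sketched as follows. It assumes the host already knows how many peer agents it can reach and how many of its isolation addresses answer a ping; `isolation_plan`, `host_is_isolated`, and the VM names are hypothetical, and the three enum values mirror the options shown in the vSphere Client.

```python
from __future__ import annotations

from enum import Enum


class IsolationResponse(Enum):
    DISABLED = "Disabled"
    POWER_OFF_AND_RESTART = "Power off and restart VMs"
    SHUT_DOWN_AND_RESTART = "Shut down and restart VMs"


def host_is_isolated(reachable_agents: int, pingable_isolation_addresses: int) -> bool:
    # Isolated only if BOTH checks fail: no other HA agent answers
    # and none of the configured isolation addresses responds to ping.
    return reachable_agents == 0 and pingable_isolation_addresses == 0


def isolation_plan(host_monitoring_enabled: bool,
                   reachable_agents: int,
                   pingable_isolation_addresses: int,
                   vms: list[str],
                   cluster_default: IsolationResponse,
                   per_vm_override: dict[str, IsolationResponse]) -> dict[str, IsolationResponse]:
    """Return the isolation response to apply to each VM (empty if none applies)."""
    if not host_monitoring_enabled:
        # Isolation responses are suspended while Host Monitoring Status is disabled.
        return {}
    if not host_is_isolated(reachable_agents, pingable_isolation_addresses):
        return {}
    # A per-VM setting wins over the cluster-wide default.
    return {vm: per_vm_override.get(vm, cluster_default) for vm in vms}


plan = isolation_plan(
    host_monitoring_enabled=True,
    reachable_agents=0,
    pingable_isolation_addresses=0,
    vms=["web01", "db01"],
    cluster_default=IsolationResponse.POWER_OFF_AND_RESTART,
    per_vm_override={"db01": IsolationResponse.SHUT_DOWN_AND_RESTART},
)
for vm, response in plan.items():
    print(vm, "->", response.value)
```

The override dictionary plays the role of the per-VM customization mentioned in the last bullet.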
3.2 VM and Application Monitoring
VM Monitoring restarts individual virtual machines if their VMware Tools heartbeats are not received within a set time. Application Monitoring can restart a virtual machine if the heartbeats for an application are not received.
You can enable these features and configure the sensitivity with which vSphere HA monitors non-responsiveness.
About VM Monitoring:
- The VM Monitoring service evaluates whether each virtual machine in the cluster is running by checking for regular heartbeats and I/O activity from the VMware Tools process running inside the guest.
- If no heartbeats or I/O activity are received, the most likely cause is that the guest operating system has failed or that VMware Tools is not being allocated any time to complete tasks.
- In that case, the VM Monitoring service determines that the virtual machine has failed, and the virtual machine is rebooted to restore service.
VM Monitoring Settings
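The sensitivity presets and the reset decision can be sketched as follows. The preset values (High: 30-second failure interval with a 1-hour reset period; Medium: 60 seconds with 24 hours; Low: 120 seconds with 7 days) and the cap of three resets per reset period are the defaults VMware documents for VM Monitoring; the I/O check is reduced to a single boolean, and `should_reset` is a hypothetical helper, not a vSphere API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sensitivity:
    failure_interval_s: int  # time without VMware Tools heartbeats before a VM is suspect
    reset_period_s: int      # window in which the number of resets is capped


PRESETS = {
    "high": Sensitivity(failure_interval_s=30, reset_period_s=60 * 60),
    "medium": Sensitivity(failure_interval_s=60, reset_period_s=24 * 60 * 60),
    "low": Sensitivity(failure_interval_s=120, reset_period_s=7 * 24 * 60 * 60),
}
MAX_RESETS_PER_PERIOD = 3  # default maximum per-VM resets


def should_reset(seconds_since_heartbeat: float,
                 recent_io: bool,
                 previous_resets: list[float],
                 now: float,
                 sensitivity: Sensitivity) -> bool:
    """Decide whether VM Monitoring would reset a VM (simplified)."""
    if seconds_since_heartbeat < sensitivity.failure_interval_s:
        return False  # heartbeats are still fresh enough
    if recent_io:
        return False  # guest is still doing I/O, so it is probably alive
    # Count only the resets that fall inside the current reset period.
    in_window = [t for t in previous_resets if now - t < sensitivity.reset_period_s]
    return len(in_window) < MAX_RESETS_PER_PERIOD


print(should_reset(45, False, [], now=10_000, sensitivity=PRESETS["high"]))    # True
print(should_reset(45, False, [], now=10_000, sensitivity=PRESETS["medium"]))  # False
```

A higher sensitivity detects failures sooner but raises the risk of resetting a VM that is only slow.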
3.3 vSphere HA Admission Control
vSphere HA uses admission control to ensure that sufficient resources are reserved for virtual machine recovery when a host fails.
Admission control imposes constraints on resource usage. Any action that might violate these constraints is not permitted. Actions that might be disallowed include the following examples:
- Powering on a virtual machine
- Migrating a virtual machine
- Increasing the CPU or memory reservation of a virtual machine
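To make the constraint concrete, here is a minimal sketch of the check behind a power-on request, using the cluster resource percentage policy as the example. It follows the documented formula, current failover capacity = (total host resources - reservations of powered-on VMs) / total host resources, computed separately for CPU and memory, but it ignores memory overhead and default minimum reservations. All function names and numbers are illustrative.

```python
def failover_capacity_pct(total: float, reserved: float) -> float:
    """Share of cluster capacity left unreserved, in percent."""
    return 100.0 * (total - reserved) / total


def admit_power_on(total_cpu_mhz: float, total_mem_mb: float,
                   reserved_cpu_mhz: float, reserved_mem_mb: float,
                   vm_cpu_res_mhz: float, vm_mem_res_mb: float,
                   required_cpu_pct: float, required_mem_pct: float) -> bool:
    """Admit only if CPU and memory failover capacity both stay at or above the target."""
    cpu_after = failover_capacity_pct(total_cpu_mhz, reserved_cpu_mhz + vm_cpu_res_mhz)
    mem_after = failover_capacity_pct(total_mem_mb, reserved_mem_mb + vm_mem_res_mb)
    return cpu_after >= required_cpu_pct and mem_after >= required_mem_pct


# Four identical hosts tolerating one host failure: reserve 25% of CPU and memory.
cluster = dict(total_cpu_mhz=80_000, total_mem_mb=400_000,
               reserved_cpu_mhz=50_000, reserved_mem_mb=200_000,
               required_cpu_pct=25, required_mem_pct=25)
print(admit_power_on(**cluster, vm_cpu_res_mhz=8_000, vm_mem_res_mb=16_000))   # True  (27.5% CPU left)
print(admit_power_on(**cluster, vm_cpu_res_mhz=12_000, vm_mem_res_mb=16_000))  # False (22.5% CPU left)
```

The same check applies to a migration into the cluster or a reservation increase: each one adds to the reserved total.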
The basis for vSphere HA admission control is how many host failures your cluster is allowed to tolerate and still guarantee failover. The host failover capacity can be set in three ways:
- Cluster resource percentage
- Slot policy
- Dedicated failover hosts
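The slot policy can be sketched as simple arithmetic. Following VMware's description, the slot size uses the largest CPU reservation among powered-on VMs (32 MHz when none is set) and the largest memory reservation plus memory overhead; each host holds as many slots as its scarcer resource allows, and the failover capacity is how many hosts, largest first, can be removed while enough slots remain for every powered-on VM. The function names, overhead values, and host sizes are assumptions for illustration.

```python
from __future__ import annotations


def slot_size(vms: list[tuple[int, int]], min_cpu_mhz: int = 32) -> tuple[int, int]:
    """Slot = largest CPU reservation (at least min_cpu_mhz) and largest memory + overhead.

    Each VM is (cpu_reservation_mhz, memory_reservation_plus_overhead_mb);
    the list must contain at least one VM.
    """
    cpu = max([min_cpu_mhz] + [c for c, _ in vms])
    mem = max(m for _, m in vms)
    return cpu, mem


def slots_per_host(host_cpu_mhz: int, host_mem_mb: int, slot: tuple[int, int]) -> int:
    # A host holds as many slots as its scarcer resource allows.
    return min(host_cpu_mhz // slot[0], host_mem_mb // slot[1])


def failover_capacity(host_slots: list[int], powered_on_vms: int) -> int:
    """How many hosts (largest first) can fail while enough slots remain for every VM."""
    remaining = sum(host_slots)
    failures = 0
    for slots in sorted(host_slots, reverse=True):
        if remaining - slots < powered_on_vms:
            break
        remaining -= slots
        failures += 1
    return failures


vms = [(2_000, 4_096)] + [(0, 1_024)] * 11  # 12 powered-on VMs
slot = slot_size(vms)
per_host = [slots_per_host(20_000, 65_536, slot) for _ in range(3)]
print(slot, per_host, failover_capacity(per_host, len(vms)))  # (2000, 4096) [10, 10, 10] 1
```

One VM with a large reservation inflates the slot size for the whole cluster, which is why this policy can be very conservative.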
3.4 Heartbeat Datastores
When the primary host in a VMware vSphere High Availability cluster cannot communicate with a secondary host over the management network:
- The primary host uses datastore heartbeating to determine whether the secondary host has failed, is in a network partition, or is network isolated.
- If the secondary host has stopped datastore heartbeating, it is considered to have failed, and its virtual machines are restarted elsewhere.
- VMware vCenter Server selects a preferred set of datastores for heartbeating.
- This selection is made to maximize the number of hosts with access to a heartbeating datastore and minimize the likelihood that the datastores are backed by the same LUN or NFS server.
- You can use the advanced option das.heartbeatdsperhost to change the number of heartbeat datastores selected by vCenter Server for each host.
- The default is two, and the maximum valid value is five.
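A rough sketch of the selection goals listed above, for a single host. vCenter Server's real algorithm works across the whole cluster and is not published in this form; the greedy ranking, the helper names, and the sample datastores are assumptions. The range check mirrors the option's limits stated above (default two, maximum five), and this sketch treats two as the minimum.

```python
from __future__ import annotations

from typing import Optional


def heartbeat_datastore_count(das_heartbeatdsperhost: Optional[int] = None) -> int:
    """Resolve das.heartbeatdsperhost: default 2, accepted range 2 to 5 in this sketch."""
    if das_heartbeatdsperhost is None:
        return 2
    if not 2 <= das_heartbeatdsperhost <= 5:
        raise ValueError("das.heartbeatdsperhost must be between 2 and 5")
    return das_heartbeatdsperhost


def pick_heartbeat_datastores(accessible: list[str],
                              backing: dict[str, str],
                              hosts_with_access: dict[str, int],
                              count: int) -> list[str]:
    """Greedy choice for one host: widely shared datastores first, distinct backing first."""
    # Most widely visible datastores first; the name keeps the order deterministic.
    ranked = sorted(accessible, key=lambda ds: (-hosts_with_access[ds], ds))
    chosen: list[str] = []
    used_backings: set[str] = set()
    for ds in ranked:  # first pass: at most one datastore per LUN or NFS server
        if len(chosen) == count:
            break
        if backing[ds] not in used_backings:
            chosen.append(ds)
            used_backings.add(backing[ds])
    for ds in ranked:  # second pass: fill up if there were too few distinct backings
        if len(chosen) == count:
            break
        if ds not in chosen:
            chosen.append(ds)
    return chosen


print(pick_heartbeat_datastores(
    accessible=["ds1", "ds2", "ds3"],
    backing={"ds1": "nfs-a", "ds2": "nfs-a", "ds3": "lun-7"},
    hosts_with_access={"ds1": 8, "ds2": 8, "ds3": 5},
    count=heartbeat_datastore_count(),
))  # ['ds1', 'ds3']
```

Here ds2 is skipped even though more hosts see it, because it shares its NFS server with ds1.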
4. Proactive HA Failures
A Proactive HA failure occurs when a host component fails, which results in a loss of redundancy or a noncatastrophic failure. However, the functional behavior of the VMs residing on the host is not yet affected. For example, if a power supply on the host fails, but other power supplies are available, that is a Proactive HA failure.
- If a Proactive HA failure occurs, you can automate the remediation action taken in the vSphere Availability section of the vSphere Client.
- The VMs on the affected host can be evacuated to other hosts, and the host is either placed in Quarantine mode or Maintenance mode.
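The remediation choice can be sketched as a small decision function. It assumes the three remediation options offered in the vSphere 7 Proactive HA settings (Quarantine mode for all failures, a mixed option, and Maintenance mode for all failures) and the moderate and severe degradation levels a health provider reports; `target_mode` and the enum names are hypothetical, and no vSphere API is called.

```python
from enum import Enum


class Severity(Enum):
    MODERATE = "moderate degradation"
    SEVERE = "severe degradation"


class Remediation(Enum):
    QUARANTINE = "Quarantine mode for all failures"
    MIXED = "Quarantine mode for moderate and Maintenance mode for severe failure"
    MAINTENANCE = "Maintenance mode for all failures"


def target_mode(remediation: Remediation, severity: Severity) -> str:
    """Host mode that Proactive HA would request for a reported health degradation."""
    if remediation is Remediation.QUARANTINE:
        return "quarantine"
    if remediation is Remediation.MAINTENANCE:
        return "maintenance"
    # Mixed: the decision depends on how badly the host is degraded.
    return "quarantine" if severity is Severity.MODERATE else "maintenance"


print(target_mode(Remediation.MIXED, Severity.SEVERE))  # maintenance
```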
Conclusion
The topic reviewed in this article is part of the VMware vSphere 7.x Exam (2V0-21.20), which leads to the VMware Certified Professional – Data Center Virtualization 2021 certification.
Section 1 - Architectures and Technologies.
Objective 1.6.4 – Describe vSphere High Availability
See the full exam preparation guide and all exam sections from VMware.