VMware vSphere HA and FT – Troubleshooting

This page lists the most important configuration and operational errors that can occur with vSphere HA and FT, and explains how to handle them.

vSphere High Availability Error Messages

This table lists some of the error messages you can encounter when configuring a vSphere HA cluster:

Configuration Errors

Operation | Error Message | Description and Solution
Migrating a powered-off virtual machine | If this operation is performed, the virtual machine will lose vSphere HA protection. If you proceed with this operation and it completes successfully, vSphere HA will stop trying to restart this VM, which is currently powered off due to a previous failure. You can restore vSphere HA protection by powering the VM back on in the cluster. | The virtual machine you are attempting to migrate is protected by vSphere HA but is currently powered off because of an earlier failure. To restore vSphere HA protection, manually power the virtual machine back on.
Migrating a powered-on virtual machine | If this operation is performed, the virtual machine will lose vSphere HA protection. If you proceed with this operation and it completes successfully, vSphere HA will not attempt to restart the VM after a subsequent failure. vSphere HA protection will be restored when any network partitions or disk accessibility issues are resolved. | The powered-on virtual machine you are attempting to migrate will not be protected by vSphere HA after the vMotion operation completes because the vSphere HA master agent is not currently responsible for it. vSphere HA will not restart the virtual machine if it subsequently fails. To restore vSphere HA protection, resolve any network partitions or disk accessibility issues.
Powering on or reconfiguring a virtual machine | Insufficient resources to satisfy configured failover level for vSphere HA | The operation would violate the configured failover level of the vSphere HA cluster. Either change the admission control policy to reserve fewer resources for failover or add more hosts to the cluster. Also check the CPU and memory reservations of the virtual machine in question: starting a virtual machine with high reservation values affects the slot calculation and leaves fewer slots available.
Configuring vSphere HA on a host | Cannot complete the configuration of the vSphere HA agent on the host. Misconfiguration in the host network setup | vSphere HA cannot be configured on a host because of a network-related issue. The most likely cause of this problem is that the host has no configured management networks. When this condition occurs, vSphere HA issues a host-level event (“Host has no port groups enabled for vSphere HA” or “Host has no available networks for vSphere HA communication”) reporting the problem.
Configuring vSphere HA on a host | Cannot complete the configuration of the vSphere HA agent on the host. Misconfiguration in the host setup | vSphere HA cannot be configured on a host, probably because the host does not have an SSL thumbprint. When this problem occurs, vSphere HA issues a host-level event that reports a problem with the SSL thumbprint. If the host is using self-signed certificates, check that vCenter Server is configured to verify SSL certificates and verify the thumbprints for the hosts in the vSphere HA cluster. Then reconfigure vSphere HA.
Configuring vSphere HA on a host | Operation timed out | The host on which the vSphere HA agent resides failed to become a master host or a slave host after the vSphere HA agent on the host was initialized. vCenter Server waits by default for two minutes after configuration before reporting this error. This error is most often reported when vCenter Server has lost contact with the agent after initiating the configuration process. Check the vSphere HA host state being reported on the host’s Summary tab. If the state is ‘master’ or ‘slave’, the error can be ignored. If it is another host state, there is a problem that needs to be addressed. See vSphere Troubleshooting for more information on HA host states.
Unmounting or removing a datastore | The vSphere HA agent on host ‘{hostName}’ failed to quiesce file activity on datastore ‘{dsName}’. To proceed with the operation to unmount or remove a datastore, ensure that the datastore is accessible, the host is reachable and its vSphere HA agent is running. | The vSphere HA agent on a host cannot quiesce file activity on a datastore that is to be unmounted or removed. Check that the datastore is accessible, the host is reachable, and its vSphere HA agent is running; you may need to right-click the host and select “Reconfigure for vSphere HA”. Then retry the operation.
Migrating a virtual machine to another host using vMotion or DRS vMotion | This virtual machine failed to become vSphere HA Protected and HA may not attempt to restart it after a failure. | There are not enough resources for a failover. HA may not vMotion the virtual machine to another host. Upgrade the hardware on the hosts or add more hosts to the cluster.
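
The effect of reservations on slot availability, mentioned for the “Insufficient resources” error above, can be illustrated with a small sketch. It assumes the slot-based admission control policy and a deliberately simplified model: the slot size is the largest CPU and the largest memory reservation among powered-on virtual machines, and a host holds as many slots as its scarcer resource allows. The class and function names, the minimum slot values, and the omission of memory overhead are assumptions made for illustration, so real clusters will report different numbers.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    cpu_reservation_mhz: int
    mem_reservation_mb: int


@dataclass
class Host:
    name: str
    cpu_mhz: int
    mem_mb: int


def slot_size(vms, min_cpu_mhz=32, min_mem_mb=1):
    """Return (cpu, mem) slot size: the largest reservation in each dimension.

    The minimums are assumed placeholders that keep the slot size non-zero
    when no virtual machine has a reservation.
    """
    cpu = max([vm.cpu_reservation_mhz for vm in vms] + [min_cpu_mhz])
    mem = max([vm.mem_reservation_mb for vm in vms] + [min_mem_mb])
    return cpu, mem


def slots_per_host(host, cpu_slot, mem_slot):
    """A host holds as many slots as its scarcer resource allows."""
    return min(host.cpu_mhz // cpu_slot, host.mem_mb // mem_slot)


def total_slots(hosts, vms):
    """Total slots in the cluster for the current set of powered-on VMs."""
    cpu_slot, mem_slot = slot_size(vms)
    return sum(slots_per_host(h, cpu_slot, mem_slot) for h in hosts)
```

With two hosts of 10,000 MHz and 32,768 MB each and reservations of at most 1,000 MHz and 2,048 MB, this model gives 10 slots per host. Starting one virtual machine that reserves 4,000 MHz and 16,384 MB shrinks that to 2 slots per host, which is the kind of drop that can trigger the error.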

vSphere Fault Tolerance Error Messages
This table lists some of the error messages you can encounter if your host or cluster is not configured appropriately to support FT:

Configuration Errors

Error Message | Description and Solution
Host CPU is incompatible with the virtual machine’s requirements. Mismatch detected for these features: CPU does not match | FT requires that the hosts for the Primary and Secondary virtual machines use the same type of CPU. Enable FT on a virtual machine registered to a host with a matching CPU model, family, and stepping within the cluster. If no such hosts exist, you must add one. This error also occurs when you attempt to migrate a fault tolerant virtual machine to a different host.
The Fault Tolerance configuration of the entity {entityName} has an issue: Fault Tolerance not supported by host hardware | FT is only supported on specific processors and BIOS settings with Hardware Virtualization (HV) enabled. To resolve this issue, use hosts with supported CPU models and BIOS settings.
Virtual Machine ROM is not supported | The virtual machine is running VMI kernel and is paravirtualized. VMI is not supported by FT and should be disabled for the virtual machine.
Host {hostName} has some Fault Tolerance issues for virtual machine {vmName}. Refer to the errors list for details | To troubleshoot this issue, in the vSphere Client select the failed FT operation in either the Recent Tasks pane or the Tasks & Events tab and click the View details link that appears in the Details column.
The Fault Tolerance configuration of the entity {entityName} has an issue: Check host certificates flag not set for vCenter Server | The “check host certificates” box is not checked in the SSL settings for vCenter Server. You must check that box.
The Fault Tolerance configuration of the entity {entityName} has an issue: HA is not enabled on the virtual machine | This virtual machine is on a host that is not in a vSphere HA cluster or it has had vSphere HA disabled. Fault Tolerance requires vSphere HA.
The Fault Tolerance configuration of the entity {entityName} has an issue: Host is inactive | You must enable FT on an active host. An inactive host is one that is disconnected, in maintenance mode, or in standby mode.
Fault Tolerance has not been licensed on host {hostName}. | Fault Tolerance is not licensed in all editions of VMware vSphere. Check the edition you are running and upgrade to an edition that includes Fault Tolerance.
The Fault Tolerance configuration of the entity {entityName} has an issue: No vMotion license or no virtual NIC configured for vMotion | Verify that you have correctly configured networking on the host. If you have, then you might need to acquire a vMotion license.
The Fault Tolerance configuration of the entity {entityName} has an issue: No virtual NIC configured for Fault Tolerance logging | An FT logging NIC has not been configured.
Host {hostName} does not support virtual machines with Fault Tolerance turned on. This VMware product does not support Fault Tolerance | The product you are using is not compatible with Fault Tolerance. To use the product you must turn Fault Tolerance off. This error message primarily appears when vCenter Server is managing a host with an earlier version of ESXi/ESX or if you are using VMware Server.
The Fault Tolerance configuration of the entity {entityName} has an issue: Fault Tolerance not supported by VMware Server 2.0 | Upgrade to VMware ESXi/ESX 4.1 or later.
The build or Fault Tolerance feature version on the destination host is different from the current build or Fault Tolerance feature version: {build}. | FT feature versions must be the same on current and destination hosts. Choose a compatible host or upgrade incompatible hosts.
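
Several of the host-side conditions above can be checked before turning on FT. The following is a minimal sketch that maps a few host properties to the corresponding messages from the table. The `HostFacts` class, its field names, and the `ft_host_issues` function are hypothetical and not part of any VMware API; you would need to populate the fields from your own inventory data.

```python
from dataclasses import dataclass

PREFIX = "The Fault Tolerance configuration of the entity {entity} has an issue: "


@dataclass
class HostFacts:
    # Hypothetical fields; fill them in from your own inventory data.
    name: str
    connected: bool = True
    maintenance_mode: bool = False
    standby_mode: bool = False
    ha_enabled: bool = True
    ft_licensed: bool = True
    vmotion_nic: bool = True
    ft_logging_nic: bool = True


def ft_host_issues(host, entity_name):
    """Return the messages from the table that apply to this host."""
    prefix = PREFIX.format(entity=entity_name)
    issues = []
    # An inactive host is disconnected, in maintenance mode, or in standby mode.
    if not host.connected or host.maintenance_mode or host.standby_mode:
        issues.append(prefix + "Host is inactive")
    if not host.ha_enabled:
        issues.append(prefix + "HA is not enabled on the virtual machine")
    if not host.ft_licensed:
        issues.append(f"Fault Tolerance has not been licensed on host {host.name}.")
    if not host.vmotion_nic:
        issues.append(prefix + "No vMotion license or no virtual NIC configured for vMotion")
    if not host.ft_logging_nic:
        issues.append(prefix + "No virtual NIC configured for Fault Tolerance logging")
    return issues
```

An empty list only means that none of the modeled conditions apply; CPU, BIOS, and version compatibility are not covered by this sketch.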

Virtual Machine Config Error Messages

There are a number of virtual machine configuration issues that can generate error messages. These are two error messages you might see if the virtual machine configuration does not support FT:

  • The Fault Tolerance configuration of the entity {entityName} has an issue: The virtual machine's current configuration does not support Fault Tolerance
  • The Fault Tolerance configuration of the entity {entityName} has an issue: Record and replay functionality not supported by the virtual machine

FT only runs on a virtual machine with a single vCPU. You might encounter these errors when attempting to turn on FT on a multiple vCPU virtual machine:

  • The virtual machine has {numCpu} virtual CPUs and is not supported for reason: Fault Tolerance
  • The Fault Tolerance configuration of the entity {entityName} has an issue: Virtual machine with multiple virtual CPUs

Fault Tolerance does not interoperate with some vSphere features. If you attempt to turn on FT on a virtual machine that uses a vSphere feature FT does not support, you might see one of these error messages. To use FT, you must disable the vSphere feature on the relevant virtual machine or enable FT on a virtual machine that does not use these features.

  • The Fault Tolerance configuration of the entity {entityName} has an issue: The virtual machine has one or more snapshots
  • The Fault Tolerance configuration of the entity {entityName} has an issue: Template virtual machine

These error messages might occur if your virtual machine has an unsupported device. To enable FT on this virtual machine, remove the unsupported device(s), and turn on FT.

  • The file backing ({backingFilename}) for device Virtual disk is not supported for Fault Tolerance
  • The file backing ({backingFilename}) for device Virtual Floppy is not supported for Fault Tolerance
  • The file backing ({backingFilename}) for device Virtual CDROM is not supported for Fault Tolerance
  • The file backing ({backingFilename}) for device Virtual serial port is not supported for Fault Tolerance
  • The file backing ({backingFilename}) for device Virtual parallel port is not supported for Fault Tolerance
  • The Fault Tolerance configuration of the entity <VM Name> has an issue: The virtual machine has a video device with 3D enabled
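
The virtual machine conditions listed above can be collected into a simple pre-check. This is a sketch under stated assumptions: the `VMConfig` class, its fields, and `ft_vm_blockers` are hypothetical names for illustration, and the check covers only the vCPU, snapshot, template, and 3D video conditions named in this section, not the device backing checks.

```python
from dataclasses import dataclass


@dataclass
class VMConfig:
    # Hypothetical fields for illustration only.
    name: str
    num_cpu: int = 1
    snapshot_count: int = 0
    is_template: bool = False
    video_3d_enabled: bool = False


def ft_vm_blockers(vm):
    """Return the reasons, as worded in the messages above, that block FT."""
    blockers = []
    if vm.num_cpu > 1:
        blockers.append("Virtual machine with multiple virtual CPUs")
    if vm.snapshot_count > 0:
        blockers.append("The virtual machine has one or more snapshots")
    if vm.is_template:
        blockers.append("Template virtual machine")
    if vm.video_3d_enabled:
        blockers.append("The virtual machine has a video device with 3D enabled")
    return blockers
```

Each returned string points to the setting to change before retrying the Turn On Fault Tolerance operation.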

Other Virtual Machine Configuration Issues

Error Message | Description and Solution
The specified host is not compatible with the Fault Tolerance Secondary VM. | Refer to vSphere Troubleshooting for possible causes of this error.
No compatible host for the Secondary VM {vm.name} | Refer to vSphere Troubleshooting for possible causes of this error.
The virtual machine’s disk {device} is using the {mode} disk mode which is not supported. | The virtual machine has one or more hard disks configured to use Independent mode. Edit the setting of the virtual machine, select each hard disk, and deselect Independent mode. Verify with your system administrator that this is acceptable for the environment.
The unused disk blocks of the virtual machine’s disks have not been scrubbed on the file system. This is needed to support features like Fault Tolerance | You have attempted to turn on FT for a powered-on virtual machine which has thick-formatted disks with the property of being lazy-zeroed. FT cannot be enabled on such a virtual machine while it is powered on. Power off the virtual machine, then turn on FT and power the virtual machine back on. This changes the disk format of the virtual machine when it is powered back on. Turning on FT could take some time to complete if the virtual disk is large.
The disk blocks of the virtual machine’s disks have not been fully provisioned on the file system. This is needed to support features like Fault Tolerance | You have attempted to turn on FT for a powered-on virtual machine with thin-provisioned disks. FT cannot be enabled on such a virtual machine while it is powered on. Power off the virtual machine, then turn on FT and power the virtual machine back on. This changes the disk format of the virtual machine when it is powered back on. Turning on FT could take some time to complete if the virtual disk is large.
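
The disk-related rows above reduce to a short decision: Independent mode must be deselected, and thin or lazy-zeroed disks require the virtual machine to be powered off before FT is turned on. The sketch below encodes that decision. The `DiskFormat` enum and `ft_disk_action` function are hypothetical, and treating eager-zeroed thick disks as needing no change is an assumption inferred from the two error messages rather than something this page states outright.

```python
from enum import Enum


class DiskFormat(Enum):
    THIN = "thin"
    THICK_LAZY_ZEROED = "thick, lazy-zeroed"
    THICK_EAGER_ZEROED = "thick, eager-zeroed"


def ft_disk_action(disk_format, independent_mode, powered_on):
    """Suggest the next step for one virtual disk before turning on FT."""
    if independent_mode:
        return "Deselect Independent mode for this disk, then retry."
    if disk_format is DiskFormat.THICK_EAGER_ZEROED:
        # Assumption: this format does not trigger either disk message.
        return "No disk change needed."
    if powered_on:
        return "Power off the VM, turn on FT, then power the VM back on."
    return "Turn on FT; the disk format changes when the VM is powered back on."
```

For a large thin or lazy-zeroed disk, plan for the conversion time noted in the table.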

Operational Errors
This table lists error messages you might encounter while using fault tolerant virtual machines:

Error Message | Description and Solution
No suitable host can be found to place the Fault Tolerance Secondary VM for virtual machine {vmName} | FT requires that the hosts for the Primary and Secondary virtual machines use the same CPU model or family and have the same FT version number or host build number and patch level. Enable FT on a virtual machine registered to a host with a matching CPU model or family within the cluster. If no such hosts exist, you must add one.
The Fault Tolerance Secondary VM was not powered on because the Fault Tolerance Primary VM could not be powered on. | vCenter Server will report why the primary could not be powered on. Correct the conditions and then retry the operation.
Operation to power On the Fault Tolerance Secondary VM for {vmName} could not be completed within {timeout} seconds | Retry the Secondary virtual machine power on. The timeout can occur because of networking or other transient issues.
vCenter disabled Fault Tolerance on VM {vmName} because the Secondary VM could not be powered on | To diagnose why the Secondary virtual machine could not be powered on, see vSphere Troubleshooting.
Resynchronizing Primary and Secondary VMs | Fault Tolerance has detected a difference between the Primary and Secondary virtual machines. This can be caused by transient events which occur due to hardware or software differences between the two hosts. FT has automatically started a new Secondary virtual machine, and no action is required. If you see this message frequently, you should alert support to determine if there is an issue.
The Fault Tolerance configuration of the entity {entityName} has an issue: No configuration information for the virtual machine | vCenter Server has no information about the configuration of the virtual machine. Determine if it is misconfigured. You can try removing the virtual machine from the inventory and re-registering it.
Cannot change the vSphere HA settings for Fault Tolerance Secondary VM {vmName} | The vSphere HA settings for a Secondary virtual machine cannot be changed, because it has the same settings as its Primary virtual machine. Always change only the settings of the Primary virtual machine.
Cannot change the DRS behavior for Fault Tolerance Secondary VM {vmName}. | You cannot change the DRS behavior of a Secondary virtual machine. This configuration is inherited from the Primary virtual machine.
Virtual machines in the same Fault Tolerance pair cannot be on the same host | You have attempted to migrate a Secondary virtual machine to the same host a Primary virtual machine is on. A Primary virtual machine and its Secondary virtual machine cannot reside on the same host. Select a different destination host for the Secondary virtual machine.
Cannot add a host with virtual machines that have Fault Tolerance turned On to a non-HA enabled cluster | FT requires the cluster to be enabled for vSphere HA. Edit your cluster settings and turn on vSphere HA.
Cannot add a host with virtual machines that have Fault Tolerance turned On as a stand-alone host | Turn off Fault Tolerance before adding the host as a stand-alone host to vCenter Server. To turn off FT, right-click each virtual machine on the host and select Turn Off Fault Tolerance. Then you can add the host as a stand-alone host.
Cannot set the HA restart priority to ‘Disabled’ for the Fault Tolerance VM {vmName}. | This setting is not allowed for an FT virtual machine. You only see this error if you change the restart priority of an FT virtual machine to Disabled.
Host already has the recommended number of {maxNumFtVms} Fault Tolerance VMs running on it | To power on or migrate more FT virtual machines to this host, either move one of the existing Fault Tolerance virtual machines to another host or disable this restriction by setting the vSphere HA advanced option das.maxftvmsperhost to 0.
Operations to test Fault Tolerance by terminating the primary VM or secondary VM are not allowed for the Fault Tolerance VM {vmName} at this time, because it is not protected by vSphere HA yet and therefore no action will be taken to recover Fault Tolerance protection for this VM | You tried to test failover functionality or attempted the Restart Secondary task on a virtual machine that is not protected by vSphere HA. Do not attempt these tasks until the virtual machine is protected by vSphere HA.
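
The placement rules scattered through the table above (matching CPU and FT version, a different host from the Primary, and the per-host FT limit) can be combined into one filter. This is a rough sketch, not how vCenter Server actually selects a host: the `FtHost` class and `secondary_candidates` function are hypothetical, CPU and version matching are reduced to plain string comparison, and the limit is passed in as a parameter that mirrors the role of das.maxftvmsperhost, where 0 disables the restriction.

```python
from dataclasses import dataclass


@dataclass
class FtHost:
    # Hypothetical fields for illustration only.
    name: str
    cpu_family: str
    ft_version: str
    ft_vm_count: int


def secondary_candidates(primary_host, hosts, max_ft_vms_per_host):
    """Return names of hosts that could hold the Secondary VM in this model."""
    candidates = []
    for host in hosts:
        # A Primary and its Secondary cannot reside on the same host.
        if host.name == primary_host.name:
            continue
        # CPU model or family and FT version must match the Primary's host.
        if host.cpu_family != primary_host.cpu_family:
            continue
        if host.ft_version != primary_host.ft_version:
            continue
        # A limit of 0 disables the per-host restriction.
        if max_ft_vms_per_host != 0 and host.ft_vm_count >= max_ft_vms_per_host:
            continue
        candidates.append(host.name)
    return candidates
```

An empty result corresponds to the “No suitable host can be found” situation; comparing the results with a non-zero limit and with 0 shows whether the per-host limit is the only obstacle.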

SDK Operational Errors
This table lists error messages you might encounter while using the SDK to perform operations:

Error Message | Description and Solution
This operation is not supported on a Secondary VM of a Fault Tolerant pair | An unsupported operation was performed directly on the Secondary virtual machine using the API. FT does not allow direct interaction with the Secondary virtual machine (except for relocating or migrating it to a different host).
The Fault Tolerance configuration of the entity {entityName} has an issue: Secondary VM already exists | The Primary virtual machine already has a Secondary virtual machine. Do not attempt to create multiple Secondary virtual machines for the same Primary virtual machine.
The Secondary VM with instanceUuid ‘{instanceUuid}’ has already been enabled | An attempt was made to enable FT for a virtual machine on which FT was already enabled. Typically, such an operation would come from an API.
The Secondary VM with instanceUuid ‘{instanceUuid}’ has already been disabled | An attempt was made to disable FT for a Secondary VM on which FT was already disabled. Typically, such an operation would come from an API.
