High Availability

This document covers configuration of a High Availability cluster using the following features:

  • CARP for IP address redundancy
  • XMLRPC for configuration synchronization
  • pfsync for state table synchronization

With this configuration, two units act as an “active/passive” cluster with the primary node working as the master unit and the secondary node in a backup role, taking over as needed if the primary node fails.

High Availability Prerequisites

Before a redundant configuration can be achieved, a few prerequisites must be met.

Assumptions

This guide assumes that:

  • Only two cluster nodes are used.
  • Both cluster nodes are the same model with identical hardware specs.
  • Both units have a factory default configuration and there are no existing settings on these units.

Warning

Do not connect the LAN ports of both units to the same LAN switch until some basic settings have been applied to each node, which will be done by the end of this section. Otherwise the two units will have conflicting IP addresses, and it will not be possible to communicate with each node individually until the conflict is resolved.

Determine the Synchronization Interface

One interface on each node will be dedicated to synchronization tasks. This is typically referred to as the “Sync” interface, and it is used for configuration synchronization and pfsync state synchronization. Any available interface may be used. It does not need to be a high-speed port, but the same port must be chosen on both nodes.

Note

Some call this the “CARP” interface but that is incorrect and very misleading. CARP heartbeats happen on each interface with a CARP VIP; CARP traffic and failover actions do not utilize the Sync interface.

Interface Assignments

Interfaces must be assigned in exactly the same order on all nodes. If the interfaces are not aligned, configuration synchronization and other tasks will not behave correctly. The default configuration has all interfaces assigned, as seen in the IO Ports section of the unit’s product manual, which makes a good starting point for this guide. If any adjustments have been made to the interface assignments, they must be replicated identically on both nodes.
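Comparing the two nodes' assignments side by side is one way to catch a misalignment before enabling synchronization. The following is a minimal sketch, assuming each node's assignments have been written down as a mapping from logical interface (WAN, LAN, OPT1) to physical port; the port names such as igc0 are hypothetical placeholders, not values from this guide.

```python
# Hypothetical assignment tables: logical interface -> physical port.
# The port names (igc0, igc1, igc2) are placeholders for illustration only.
PRIMARY = {"WAN": "igc0", "LAN": "igc1", "OPT1": "igc2"}
SECONDARY = {"WAN": "igc0", "LAN": "igc1", "OPT1": "igc2"}


def assignment_mismatches(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Describe every logical interface whose port differs between the nodes."""
    problems = []
    # Union of keys also catches an interface assigned on only one node.
    for name in sorted(a.keys() | b.keys()):
        port_a, port_b = a.get(name), b.get(name)
        if port_a != port_b:
            problems.append(f"{name}: primary={port_a} secondary={port_b}")
    return problems


print(assignment_mismatches(PRIMARY, SECONDARY))  # -> []
```

An empty list means the assignments line up; any entry names the interface that must be corrected on one node.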

IP Address Requirements

A High Availability cluster needs three IP addresses in each subnet, along with a separate, unused subnet for the Sync interface. One IP address is used by each node, plus a shared CARP VIP address for failover. The Sync interface only requires one IP address per node. For WANs, this means that a /29 subnet or larger is required for an optimal configuration, since a /30 provides only two usable addresses.
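The /29 figure follows from simple subnet arithmetic: a subnet with prefix length n holds 2^(32-n) addresses, two of which (network and broadcast) cannot be assigned. The sketch below, which assumes the WAN also needs one address for the upstream ISP gateway, searches for the smallest subnet that fits a given number of hosts.

```python
import ipaddress


def usable_hosts(prefix_len: int) -> int:
    """Assignable IPv4 host addresses (network and broadcast excluded)."""
    net = ipaddress.ip_network(f"0.0.0.0/{prefix_len}")
    return net.num_addresses - 2


def smallest_subnet(required: int) -> int:
    """Longest prefix length (smallest subnet) with at least `required` hosts."""
    # /31 and /32 are skipped: they lack separate network/broadcast addresses.
    for prefix_len in range(30, 0, -1):
        if usable_hosts(prefix_len) >= required:
            return prefix_len
    raise ValueError("requirement exceeds the IPv4 address space")


# Two node addresses + one CARP VIP + (assumption) one ISP gateway.
print(smallest_subnet(2 + 1 + 1))  # -> 29 (six usable hosts)
print(smallest_subnet(2))          # -> 30 (enough for the Sync pair alone)
```

Even without counting the gateway, three addresses do not fit in a /30, which is why /29 is the practical minimum for a WAN.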

The IP addresses used in this guide are shown in the following tables; substitute the real IP addresses as needed.

WAN IP Address Assignments

  IP Address           Usage
  198.51.100.200/24    CARP shared IP address
  198.51.100.201/24    Primary node WAN IP address
  198.51.100.202/24    Secondary node WAN IP address

LAN IP Address Assignments

  IP Address           Usage
  192.168.1.1/24       CARP shared IP address
  192.168.1.2/24       Primary node LAN IP address
  192.168.1.3/24       Secondary node LAN IP address

Sync IP Address Assignments

  IP Address           Usage
  172.16.1.2/24        Primary node Sync IP address
  172.16.1.3/24        Secondary node Sync IP address
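Before entering addresses on either node, the plan can be sanity-checked offline. This is a hedged sketch using Python's standard ipaddress module: it confirms that every address in a segment shares one subnet, that no address repeats, and that the Sync subnet does not overlap WAN or LAN. The PLAN dictionary simply restates the tables above.

```python
import ipaddress

# The address plan from the WAN, LAN, and Sync tables.
PLAN = {
    "WAN": ["198.51.100.200/24", "198.51.100.201/24", "198.51.100.202/24"],
    "LAN": ["192.168.1.1/24", "192.168.1.2/24", "192.168.1.3/24"],
    "Sync": ["172.16.1.2/24", "172.16.1.3/24"],
}


def check_plan(plan: dict[str, list[str]]) -> list[str]:
    """Flag segments whose addresses span subnets, repeat, or overlap others."""
    problems = []
    networks = {}
    for segment, entries in plan.items():
        ifaces = [ipaddress.ip_interface(e) for e in entries]
        nets = {i.network for i in ifaces}
        if len(nets) != 1:
            problems.append(f"{segment}: addresses span {len(nets)} subnets")
        if len({i.ip for i in ifaces}) != len(ifaces):
            problems.append(f"{segment}: duplicate address")
        networks[segment] = ifaces[0].network
    # Every pair of segments must use non-overlapping subnets.
    names = list(networks)
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            if networks[a].overlaps(networks[b]):
                problems.append(f"{a} and {b} overlap")
    return problems


print(check_plan(PLAN))  # -> []
```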

Single address CARP

It is technically possible to configure an interface with a CARP VIP as the only IP address in a given subnet, but it is not generally recommended. When used on a WAN, this type of configuration will only allow communication from the primary node to the WAN, which greatly complicates tasks such as updates, package installations, gateway monitoring, or anything that requires external connectivity from the secondary node. It can be a better fit for an internal interface; however, internal interfaces do not typically suffer from the same IP address limitations as a WAN, so it is still preferable to configure IP addresses on all nodes. Such a configuration is not covered in this guide.

Determine CARP VHID Availability

CARP can interfere with VRRP, HSRP, or other systems using CARP if conflicting identifiers are used. To ensure that a segment is clear of conflicting traffic, perform a packet capture on each interface, looking for CARP/VRRP traffic. A given VHID must be unique on each layer 2 segment, so each interface must be checked separately. The same VHID may be used on different segments so long as they are separate broadcast domains.

If any CARP or VRRP traffic is shown, note the VHID/VRID and avoid using that identifier when configuring the CARP VIP VHIDs later.
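CARP and VRRP both use IP protocol number 112, and in both headers the byte immediately after the version/type byte carries the VHID or VRID. The sketch below assumes the captured packets are already available as raw IPv4 bytes (the link-layer header stripped); reading them out of a capture file is left out. It extracts the identifiers seen on a segment and suggests the lowest unused VHID. HSRP runs over UDP rather than protocol 112, so it is not covered here.

```python
from typing import Iterable, Optional

CARP_VRRP_PROTO = 112  # IP protocol number shared by CARP and VRRP


def advert_id(packet: bytes) -> Optional[int]:
    """Return the VHID/VRID of a raw IPv4 CARP or VRRP packet, else None.

    Assumes `packet` begins at the IPv4 header (no Ethernet header).
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None
    header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if packet[9] != CARP_VRRP_PROTO or len(packet) < header_len + 2:
        return None
    # Second payload byte: VHID in CARP, VRID in VRRP.
    return packet[header_len + 1]


def first_free_vhid(packets: Iterable[bytes]) -> int:
    """Lowest VHID in 1-255 not seen in the captured packets."""
    used = {vid for vid in map(advert_id, packets) if vid is not None}
    for vhid in range(1, 256):
        if vhid not in used:
            return vhid
    raise ValueError("all VHIDs are in use on this segment")
```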

This guide assumes there is no other potentially conflicting traffic present.

Setup Requirements

Using the Setup Wizard, or manually afterward, configure each firewall with a unique hostname and non-conflicting static IP addresses.

For example, one node could be “firewall-a.example.com” and the other “firewall-b.example.com”, or a more personalized pair of names. Avoid naming the nodes “master” and “backup”, since those are states and not roles; instead, they could be named “primary” and “secondary”.

For IP addresses, the factory default LAN address is 192.168.1.1. In a High Availability environment, that address would be a CARP VIP instead. Within that subnet, move each node to its own address, such as 192.168.1.2 for the primary and 192.168.1.3 for the secondary. This layout is shown in LAN IP Address Assignments.

Once each node has a unique LAN IP address, both nodes may be plugged into the same LAN switch.

Both nodes must have the GUI running on the same port and protocol. This guide assumes both use HTTPS on port 443.

The admin account must not be disabled, and both nodes must have the same admin account password.

Both nodes must have static IP addresses in the same subnet and a proper gateway configured in the WAN interface settings.

Both nodes must have DNS configured properly under System > General Setup.

Visit Services > DNS Resolver. Review the settings and, even if nothing has been changed, click Save once to ensure the default values are respected.

Switch / Layer 2 Configuration

CARP Concerns

CARP heartbeats utilize multicast and may require special handling on the switches involved with the cluster. Some switches filter, rate limit, or otherwise interfere with multicast in ways that can cause CARP to fail. Also, some switches employ port security methods which may not work properly with CARP.

At a minimum, the switch must:

  • Allow multicast traffic to be sent and received without interference on ports using CARP VIPs.
  • Allow traffic to be sent and received using multiple MAC addresses.
  • Allow the CARP VIP MAC address to move between ports.
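CARP advertisements, like VRRP advertisements, are sent to the IPv4 multicast group 224.0.0.18 (ff02::12 for IPv6). Switches that filter or snoop multicast act on the Ethernet destination address, which for IPv4 is 01:00:5e followed by the low 23 bits of the group address (RFC 1112). This small sketch computes that MAC, which can help when searching a switch's multicast or port-security configuration for rules that affect CARP.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC (RFC 1112 mapping)."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low23 = int(addr) & 0x7FFFFF  # only the low 23 bits are carried over
    octets = [0x01, 0x00, 0x5E, low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return ":".join(f"{o:02x}" for o in octets)


print(multicast_mac("224.0.0.18"))  # -> 01:00:5e:00:00:12
```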

Nearly all problems with CARP failing to properly reflect the expected status are failures of the switch or other layer 2 issues, so be sure the switches are properly configured before continuing.

Port Configuration

Each node must be connected to a common, but separate, layer 2 segment on each interface. This means that WAN, LAN, and other interfaces must be connected to separate switches or VLANs, with each node connected to the same segments as its peer.

For example, the WAN ports of each node must connect to the same WAN switch, which then connects to the WAN CPE/Modem/Upstream link. The LAN ports would all connect to the same LAN switch, and so on. The Sync interface may be connected directly between the two nodes without a switch.

Testing High Availability

With all of the configuration complete, the time has come for testing. Tests for each aspect of the system are listed below. If any of the tests fails, review the configuration and consult Troubleshooting High Availability for assistance.

Verify General Functionality

Set up a client on the LAN and ensure that it receives an IP address via DHCP and that it shows the LAN CARP VIP as its gateway and DNS server. Verify that the client can reach the Internet and otherwise function as expected.

Verify XMLRPC Sync is working

XMLRPC Configuration Synchronization can be tested several ways. The easiest method is to make a change to any supported area on the primary, such as a firewall rule, and then see if the change is reflected on the secondary after a few moments.

The manual method for forcing a synchronization task to test XMLRPC is to visit Status > Filter Reload on the primary node and click Force Config Sync. The status will change briefly and then if everything is working properly, a message will be displayed indicating the sync completed successfully.

Verify CARP is working

Visit Status > CARP on both nodes to check if CARP is functional. The primary node will display “MASTER” for all CARP VIPs and the secondary will display “BACKUP” for all CARP VIPs. If the status screen indicates that CARP is disabled, press the Enable CARP button.
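The status combinations described in this guide can be summarized as a small lookup. This is a hedged sketch, assuming the per-VIP statuses have been read manually from Status > CARP on each node; the function and the sample readings are hypothetical, and the interpretations come from the testing and troubleshooting sections of this guide.

```python
def diagnose(primary: str, secondary: str) -> str:
    """Interpret one CARP VIP's status as reported on each node."""
    pair = (primary.upper(), secondary.upper())
    if pair == ("MASTER", "BACKUP"):
        return "normal"
    if pair == ("BACKUP", "MASTER"):
        return "failed over to secondary"
    if pair == ("MASTER", "MASTER"):
        return "both MASTER: suspect switch/layer 2 problem"
    if "INIT" in pair:
        return "INIT: check link on the VIP's interface"
    return "unexpected combination: review configuration"


# Hypothetical readings copied from Status > CARP on each node.
status = {"WAN": ("MASTER", "BACKUP"), "LAN": ("MASTER", "MASTER")}
for vip, (p, s) in status.items():
    print(vip, diagnose(p, s))
```

With the sample readings, WAN reports "normal" while LAN points at the layer 2 checks listed under Switch/Layer 2 Issues.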

Verify State Synchronization is working

The Status > CARP page lists pfsync nodes which give an indication of the state synchronization status. The values may not always match identically on both nodes, but they will be close. If the two are very different, it can indicate a problem with state synchronization. If they are identical or nearly identical, then state synchronization is working.

Testing Failover

A manual failover test may be initiated in one of four ways:

  1. Click Temporarily Disable CARP on Status > CARP on the primary node. This will disable CARP temporarily, and if the primary node is rebooted it will turn back on. Click Enable CARP to turn it back on.
  2. Click Enter Persistent CARP Maintenance Mode on Status > CARP on the primary node. This will disable CARP persistently, even if the primary node is rebooted. To exit maintenance mode, click Leave Persistent CARP Maintenance Mode to enable CARP once again.
  3. Unplug a network cable from an interface with a CARP VIP present, such as WAN or LAN. This will trigger a failover event. Plug the cable back in to recover.
  4. Shut down or reboot the primary node.

During any of the above tests, visit Status > CARP on the secondary to confirm that the CARP VIPs have taken over and show a “MASTER” status.

Before, during, and after triggering a failover, test connections from a client on the LAN through to the Internet to ensure connectivity works at each step. Downloading a file, streaming audio, or streaming video will most likely continue uninterrupted. VoIP-based phone calls may have a slight disruption as they are not buffered like the others.
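To put a number on "slight disruption", a LAN client can probe an outside service repeatedly while the failover is triggered and then measure the longest gap. The sketch below is a minimal illustration, assuming a TCP service is reachable from the client; the target host and port in the usage comment are hypothetical. Resolution is limited by the probe interval and timeout.

```python
import socket
import time


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_loop(host: str, port: int, duration: float,
               interval: float = 0.5) -> list[tuple[float, bool]]:
    """Probe repeatedly for `duration` seconds, logging (timestamp, success)."""
    log = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        log.append((time.monotonic(), tcp_probe(host, port)))
        time.sleep(interval)
    return log


def longest_outage(log: list[tuple[float, bool]]) -> float:
    """Longest span from a run's first failed probe to the next success.

    A run that never recovers is measured up to the last sample.
    """
    worst, start = 0.0, None
    for ts, ok in log:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            worst = max(worst, ts - start)
            start = None
    if start is not None:
        worst = max(worst, log[-1][0] - start)
    return worst


# Hypothetical usage from a LAN client while failing over the primary:
# log = probe_loop("example.com", 443, duration=120)
# print(f"longest outage: {longest_outage(log):.1f} s")
```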

Also have a client attempt to obtain an IP address by DHCP while running from the secondary.

If VPNs or other services have been configured, check those during the test as well to ensure the VPN is established on the secondary node and continues to pass traffic.

Once the primary node has returned to “MASTER” status, ensure everything continues to work.

Troubleshooting High Availability

In the event that any of the testing fails, there are a few common things to check.

Review the Configuration

Before digging too deep into the technical details below, first review the configuration and ensure all steps were followed accurately.

Troubleshooting CARP

Check Interface Status

If an interface shows “INIT” for the CARP state, as shown in CARP Status on Primary with Disconnected Interface, this most commonly indicates that the interface on which the VIP resides is not connected to anything. If there is no link to a switch or another port, the interface is down and the VIP cannot be fully initialized. If the NIC is plugged in and appears to have a link when this occurs, edit, save, and apply changes for the VIP in question to reconfigure it.

CARP Status on Primary with Disconnected Interface

Conflicting VHIDs

The VHID determines the virtual MAC address used by that CARP IP. The input validation in pfSense will not permit using conflicting VHIDs on a single pair of systems; however, if there are multiple systems on the same broadcast domain running CARP, it is possible to create a conflict. VRRP also uses the same virtual MAC address scheme, so a VRRP IP using the same VRID as a CARP IP VHID will also generate the same MAC address conflict.
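For IPv4, the shared scheme is 00:00:5e:00:01 followed by the VHID or VRID as the final byte, so equal identifiers on one broadcast domain always collide. The sketch below groups systems by the virtual MAC they would use and reports any MAC claimed more than once; the system names and identifiers in the example are hypothetical.

```python
def carp_mac(vhid: int) -> str:
    """IPv4 virtual MAC for a CARP VHID or VRRP VRID: 00:00:5e:00:01:<id>."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


def conflicts(announcers: dict[str, list[int]]) -> dict[str, list[str]]:
    """Map each virtual MAC claimed by more than one system to those systems."""
    by_mac: dict[str, list[str]] = {}
    for system, ids in announcers.items():
        for vid in ids:
            by_mac.setdefault(carp_mac(vid), []).append(system)
    return {mac: systems for mac, systems in by_mac.items() if len(systems) > 1}


# Hypothetical WAN segment: this cluster's VHIDs and an ISP router's VRID.
print(conflicts({"pfSense cluster": [1, 2], "ISP VRRP router": [1]}))
# -> {'00:00:5e:00:01:01': ['pfSense cluster', 'ISP VRRP router']}
```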

When using CARP on the WAN interface, this means VRRP or CARP used by the ISP can also conflict. Be sure to use VHIDs that are not in use by the ISP on that broadcast domain.

Besides creating a MAC address conflict that can interfere with traffic, a VHID conflict can also interfere with the CARP VIP status.

Incorrect Subnet Mask

The subnet mask for a CARP VIP must match the subnet mask of the interface IP address in the same subnet. For example, if an interface IP address is 192.168.1.2/24, the CARP VIP must also use a /24 mask, such as 192.168.1.1/24.
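The rule can be checked mechanically: two addresses written in CIDR form share a subnet and mask exactly when their derived networks are equal. A minimal sketch using Python's standard ipaddress module:

```python
import ipaddress


def vip_matches_interface(vip: str, iface: str) -> bool:
    """True when a CARP VIP and an interface address share subnet and mask."""
    return ipaddress.ip_interface(vip).network == ipaddress.ip_interface(iface).network


print(vip_matches_interface("192.168.1.1/24", "192.168.1.2/24"))  # -> True
print(vip_matches_interface("192.168.1.1/32", "192.168.1.2/24"))  # -> False
```

Comparing whole networks rather than just prefix lengths also catches a VIP that was accidentally placed in a different subnet.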

Switch/Layer 2 Issues

Typically a switch or layer 2 issue manifests itself as both units showing “MASTER” status for one or more CARP VIPs. If this happens, check the following items:

  1. Ensure that the interfaces on both units (WANs, LANs, etc.) are connected to the proper switch/VLAN/layer 2 segment. For example, ensure that the LAN port on both units is connected to the same switch/VLAN.
  2. Verify that the two nodes can reach each other (via ICMP echo, for example) on each segment. Firewall rules may need to be added to WAN to accommodate this test.
  3. If the units are plugged into separate switches, ensure that the switches are properly trunking and passing broadcast/multicast traffic.
  4. If the switch on the back of a modem/CPE is being used, try a real switch instead. These built-in switches often do not properly handle CARP traffic. Often plugging the firewalls into a proper switch and then uplinking to the CPE will eliminate problems.
  5. Disable IGMP snooping or other multicast limiting and inspecting features. If they are already off, try enabling the feature and disabling it again.