SQL Server

SQL Server High-Availability: How and Why?

Sushil Naik
Sushil Naik
,
March 15, 2024
Share this blog
arrow icon
SQL Server High-Availability: How and Why?

This guide provides a detailed walkthrough of the procedures to establish a high-availability (HA) configuration for Microsoft SQL Server, guaranteeing database robustness and minimizing downtime during failures or maintenance operations.

Setup

Node configuration

  • Node 1: Primary Node is in one availability zone
  • Node 2: Secondary Node is in another availability zone.

Provisioning Process

The provisioning process for MSSQL High Availability (HA) Availability Groups on Windows Server Failover Clusters (WSFC) involves below key steps.

For Domain Independent Failover Cluster Groups

  • Node 1 and Node 2 need to permit communication over TCP, ICMP, and UDP protocols.
  • The IP address entries for both Node 1 and Node 2 should be added to the "hosts" file located at C:\windows\system32\drivers\etc on both machines.

For Active Directory Failover Clusters

Create a sense of trust and establish connectivity between two Windows machines within the same Active Directory environment.

Create Windows Failover Clusters and configure AAG

  • Installing a Windows Server Failover Cluster provides high availability by allowing multiple servers to work together, ensuring seamless service continuity in case of a node failure.

Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

  • Add Node 1 and Node 2 to the failover cluster
  • Create Always-on Availability Group on Node-1 MSSQL Server with FailoverMode=Automatic and CommitMode=Synchronous
HA Architecture

Fail-over Scenarios

Scenario 1: Node 1 Failure

  • If Node 1 (Primary) goes down, the Node 2 takes over as Primary.
  • Node 2 becomes the new Primary Instance.
  • When Node 1 recovers, it synchronizes with the Primary (Node 2) and acts as a Failover Replica. Fast-Start Failover is automatically enabled.

Scenario 2: Node 2 Failure

  • If Node 2 (Standby) goes down, Node 1 continues as the Primary.
  • DB Role Monitor will update the status as Node 2 down and SRE team will look into it.
  • Upon Node 2's recovery, it syncs with Node 1 and continues as a Fail-over Replica.

This applies to EBS failure, Compute Failure and AZ failure.

Patching

High availability patching is available for all nodes in both the Primary and Fail-over Replica configurations, ensuring uninterrupted operations.

The patching process follows this sequence:

  1. Apply the new CU version to the Failover Instance (Node 2).
  2. Execute a switchover, designating Node 2 as the new Primary.
  3. Apply the new CU version to Node 1, which was the previous Primary
Patching Workflow
Follow us
Youtube Button