SQL Server

SQL Server High-Availability: How and Why?

Sushil Naik

Sushil Naik

,

March 15, 2024

Tags

Business Continuity

Disaster Recovery

High Availability

Table of Contents

TABLE OF CONTENTS

Share this blog

This guide provides a detailed walkthrough of the procedures to establish a high-availability (HA) configuration for Microsoft SQL Server, guaranteeing database robustness and minimizing downtime during failures or maintenance operations.

Setup

Node configuration

Node 1: Primary Node is in one availability zone
Node 2: Secondary Node is in another availability zone.

Provisioning Process

The provisioning process for MSSQL High Availability (HA) Availability Groups on Windows Server Failover Clusters (WSFC) involves below key steps.

For Domain Independent Failover Cluster Groups

Node 1 and Node 2 need to permit communication over TCP, ICMP, and UDP protocols.
The IP address entries for both Node 1 and Node 2 should be added to the "hosts" file located at C:\windows\system32\drivers\etc on both machines.

For Active Directory Failover Clusters

Create a sense of trust and establish connectivity between two Windows machines within the same Active Directory environment.

Create Windows Failover Clusters and configure AAG

Installing a Windows Server Failover Cluster provides high availability by allowing multiple servers to work together, ensuring seamless service continuity in case of a node failure.

Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

Add Node 1 and Node 2 to the failover cluster
Create Always-on Availability Group on Node-1 MSSQL Server with FailoverMode=Automatic and CommitMode=Synchronous

HA Architecture

Fail-over Scenarios

Scenario 1: Node 1 Failure

If Node 1 (Primary) goes down, the Node 2 takes over as Primary.
Node 2 becomes the new Primary Instance.
When Node 1 recovers, it synchronizes with the Primary (Node 2) and acts as a Failover Replica. Fast-Start Failover is automatically enabled.

Scenario 2: Node 2 Failure

If Node 2 (Standby) goes down, Node 1 continues as the Primary.
DB Role Monitor will update the status as Node 2 down and SRE team will look into it.
Upon Node 2's recovery, it syncs with Node 1 and continues as a Fail-over Replica.

This applies to EBS failure, Compute Failure and AZ failure.

Patching

High availability patching is available for all nodes in both the Primary and Fail-over Replica configurations, ensuring uninterrupted operations.

The patching process follows this sequence:

Apply the new CU version to the Failover Instance (Node 2).
Execute a switchover, designating Node 2 as the new Primary.
Apply the new CU version to Node 1, which was the previous Primary

Patching Workflow

Follow us

Related articles

These might also interest you

Database Management

Revolutionizing Customers' Open-Source Database Journeys with Tessell's Leading-Edge DBaaS Platform

As businesses increasingly rely on open-source databases to manage their data, the need for scalable, high-performance, and cost-effective solutions has never been more critical. Enter Tessell, a pioneer in Database as a Service (DBaaS), whose bleeding-edge platform is revolutionizing and modernizing the open-source database journeys of its customers.

Patching and Upgrading Oracle Applications Made Easy with Tessell

As enterprises migrate their Oracle applications to the cloud, ensuring high availability (HA), patching, and seamless infrastructure upgrades for the underlying databases is crucial to maintaining business continuity and performance. Tessell’s fully managed DBaaS platform provides robust tools and methodologies to manage these needs effectively.

Announcing Tessell Database Lifecycle Management (DBLM) for Exadata@Azure

Tessell introduces Database Lifecycle Management (DBLM), a Microsoft Marketplace offering that includes a suite of services for improved efficiency and flexibility.

January 24, 2024