A Crash Course in Clustering


NT Enterprise
NT Feature

-- by Art Brieva and John D. Ruley

It's a good bet that you know a thing or two about clustering basics (when one server fails, another takes over and you keep your job). However, clustering has its own physical requirements and its own lingo for describing them.

A cluster includes two or more servers, known as nodes. The nodes share one or more common SCSI disk arrays, one of which acts as the quorum disk that holds the cluster's shared configuration data. Each node has a SCSI controller that connects it to the shared disk subsystem. Each node also has its own unshared drive or drives, used to store system files and paging files.

Additionally, each node in a cluster has two network interface cards. The first card connects the node to the LAN, while the second is used for the private cluster interconnect, a dedicated high-speed connection between the clustered servers. The private interconnect keeps communication among the cluster's nodes clear of ordinary LAN traffic. This connection doesn't require any fancy Ethernet hubs or proprietary network connections.
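
The hardware layout described above can be written down as a small data structure. The following Python sketch is only an illustration: NodeSpec, layout_problems, and the disk and card names are hypothetical and are not part of any clustering product. It checks a proposed layout against the rules in the text: two or more nodes, an unshared system disk per node, separate cards for the LAN and the interconnect, and at least one disk array that every node can reach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    # All field names are hypothetical, chosen to mirror the article's terms.
    name: str
    local_disks: tuple      # unshared drives for system and paging files
    shared_arrays: tuple    # arrays reached through the node's SCSI controller
    lan_nic: str            # card connected to the LAN
    interconnect_nic: str   # card on the private cluster interconnect


def layout_problems(nodes):
    """Return a list of ways the layout departs from the description above."""
    problems = []
    if len(nodes) < 2:
        problems.append("a cluster needs two or more nodes")
    for node in nodes:
        if not node.local_disks:
            problems.append(f"{node.name}: no unshared system disk")
        if node.lan_nic == node.interconnect_nic:
            problems.append(f"{node.name}: LAN and interconnect share one card")
    # Find the arrays that every node can reach.
    common = set(nodes[0].shared_arrays) if nodes else set()
    for node in nodes[1:]:
        common &= set(node.shared_arrays)
    if nodes and not common:
        problems.append("no disk array is shared by every node")
    return problems
```

An empty list from layout_problems means the layout matches the description; each string in a non-empty list names one rule the layout breaks.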

Together, the resources in an NT cluster provide high availability and scalability. Wolfpack, Microsoft's clustering software for NT, increases system availability by allowing resources, such as disks and applications, to move or fail over to another node within the cluster. For example, assume Internet Information Server (IIS) is running on a node within the cluster. If the IIS service stops running, Wolfpack's monitors sense the resource failure and fail IIS over to another node within the cluster.
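
The IIS example can be sketched in a few lines of Python. This is a toy model, not Wolfpack's actual interface: Node, Cluster, check, and fail_over are hypothetical names, and the is_running callback stands in for whatever test a real monitor performs. It shows one monitoring pass that leaves a running resource alone and moves a stopped one to another healthy node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True
    resources: set = field(default_factory=set)


class Cluster:
    """Toy model of a fail-over cluster; not Wolfpack's actual API."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def owner_of(self, resource):
        # Return the node currently hosting the resource, or None.
        for node in self.nodes:
            if resource in node.resources:
                return node
        return None

    def fail_over(self, resource):
        # Move the resource to the first healthy node other than its owner.
        current = self.owner_of(resource)
        for candidate in self.nodes:
            if candidate is not current and candidate.healthy:
                if current is not None:
                    current.resources.discard(resource)
                candidate.resources.add(resource)
                return candidate
        raise RuntimeError(f"no healthy node available for {resource}")

    def check(self, resource, is_running):
        # One monitoring pass: fail over only if the resource has stopped.
        if is_running(resource):
            return self.owner_of(resource)
        return self.fail_over(resource)
```

In this sketch, calling check("IIS", is_running) with a callback that reports the service as stopped moves IIS from its current node to the next healthy one; if no other healthy node exists, the call raises an error rather than pretending the fail-over succeeded.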

Cluster software

In principle, fail-over clusters support just about any server-side application by launching a new instance of the app on the backup server when the primary system fails. But this approach is limited. For instance, many client-side applications are unprepared to deal with the inevitable delay in response while the new server instance is started.

Both Oracle Workgroup Server and Microsoft SQL Server 6.5 address this by providing an automatic reconnect feature, making them ideal for a two-node, fail-over cluster. However, neither application can scale across a cluster. That is, the backup server does no work unless the primary server fails.
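
The general idea behind an automatic reconnect can be sketched as a client-side retry loop. The Python below is an assumption-laden illustration, not how Oracle or SQL Server implement the feature: query_with_reconnect, connect, and run_query are hypothetical names, and the retry count and delay are arbitrary. The client keeps trying to connect while the new server instance starts, then runs its query once a connection succeeds.

```python
import time


def query_with_reconnect(connect, run_query, attempts=5, delay=1.0, sleep=time.sleep):
    """Run a query, reconnecting if the server is unavailable during a fail-over.

    connect and run_query are hypothetical callables supplied by the caller.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            conn = connect()          # may fail while the backup instance starts
            return run_query(conn)
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)          # wait before trying the server again
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

The delay between attempts is what lets a client ride out the start-up time of the new server instance; a client without such a loop would simply report the first failure.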

To scale a database or another app across a cluster, the app must be made cluster-aware. Each instance of the application monitors all other instances. Queries (or other operations) are dispatched so that the processing load is balanced across all servers in the cluster, and operations on any shared devices (typically a shared-SCSI RAID drive) are coordinated so that database integrity is maintained. Oracle's Parallel Server is such an app, and Microsoft plans to deliver a cluster-aware edition of SQL Server and other BackOffice apps after Wolfpack is introduced.
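
The two duties of a cluster-aware app, balancing the load and coordinating access to shared devices, can be illustrated with a short Python sketch. It is a single-process toy under stated assumptions, not Oracle Parallel Server's design: ClusterAwareDispatcher and its methods are hypothetical, a list stands in for the shared RAID drive, and an in-process lock stands in for the coordination a real cluster would need across machines.

```python
import threading


class ClusterAwareDispatcher:
    """Toy dispatcher: balances queries and serializes shared-disk writes."""

    def __init__(self, node_names):
        self.load = {name: 0 for name in node_names}   # queries sent to each node
        self.disk_lock = threading.Lock()              # stand-in for cross-node coordination
        self.disk = []                                 # stand-in for the shared RAID drive

    def dispatch(self, query):
        # Send the query to the least-loaded node (ties go to the first listed).
        node = min(self.load, key=self.load.get)
        self.load[node] += 1
        return node

    def write(self, node, record):
        # Only one node may update the shared disk at a time.
        with self.disk_lock:
            self.disk.append((node, record))
```

With two nodes, successive calls to dispatch alternate between them, so neither server sits idle the way the backup does in a plain fail-over cluster.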

Windows Magazine, May 1997, page NT34.
