Cluster Computing


A computer cluster is a group of linked computers working together closely so that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability. The major objective of a cluster is to utilize a group of processing nodes so as to complete the assigned job in a minimum amount of time by working cooperatively. The main strategy for achieving this objective is to transfer excess load from busy nodes to idle nodes. This seminar covers the concepts of cluster computing and the principles involved in it.



Parallel computing has seen many changes since the days of the highly expensive and proprietary supercomputers. Changes and improvements in performance have also been seen in the area of mainframe computing for many environments. But these compute environments may not be the most cost-effective and flexible solution for a problem. Over the past decade, cluster technologies have been developed that allow multiple low-cost computers to work in a coordinated fashion to process applications. The economics, performance, and flexibility of compute clusters make cluster computing an attractive alternative to centralized computing models and the attendant cost, inflexibility, and scalability issues inherent to those models. Many enterprises are now looking at clusters of high-performance, low-cost computers to provide increased application performance, high availability, and ease of scaling within the data center. Interest in and deployment of computer clusters has largely been driven by the increase in the performance of off-the-shelf commodity computers, high-speed, low-latency network switches, and the maturity of the software components. Application performance continues to be of significant concern for various entities including government, military, education, scientific, and now enterprise organizations. This document provides a review of cluster computing, the various types of clusters, and their associated applications. It is a high-level informational document; it does not provide details about various cluster implementations and applications.



Cluster computing is best characterized as the integration of a number of off-the-shelf commodity computers and resources integrated through hardware, networks, and software to behave as a single computer. Initially, the terms cluster computing and high performance computing were viewed as one and the same. However, the technologies available today have redefined the term cluster computing to extend beyond parallel computing to incorporate load-balancing clusters and high-availability clusters. Clusters may also be deployed to address load balancing, parallel processing, systems management, and scalability. Today, clusters are made up of commodity computers usually restricted to a single switch or group of interconnected switches operating at Layer 2 and within a single virtual local-area network (VLAN). Each compute node (computer) may have different characteristics, such as single-processor or symmetric multiprocessor design, and access to various types of storage devices. The underlying network is a dedicated network made up of high-speed, low-latency switches that may form a single switch or a hierarchy of multiple switches. A growing range of possibilities exists for cluster interconnection technology. Different variables will determine the network hardware for the cluster: price-per-port, bandwidth, latency, and throughput are key variables. The choice of network technology depends on a number of factors, including price, performance, and compatibility with other cluster hardware and system software, as well as the communication characteristics of the applications that will use the cluster. Clusters are not commodities in themselves, although they may be based on commodity hardware. A number of decisions need to be made (for example, what type of hardware the nodes run on, which interconnect to use, and which type of switching architecture to build on) before assembling a cluster.
Each decision will affect the others, and some will probably be dictated by the intended use of the cluster. Selecting the right cluster elements involves an understanding of the application and the necessary resources that include, but are not limited to, storage, throughput, latency, and number of nodes. When considering a cluster implementation, there are some basic questions that can help determine the cluster attributes such that technology options can be evaluated:

  1. Will the application be primarily processing a single dataset?
  2. Will the application be passing data around or will it generate real-time
  3. Is the application 32- or 64-bit?

The answers to these questions will influence the type of CPU, memory architecture, storage, cluster interconnect, and cluster network design. Cluster applications are often CPU-bound so that interconnect and storage bandwidth are not limiting factors, although this is not always the case. 


The main benefits of clusters are scalability, availability, and performance. For scalability, a cluster uses the combined processing power of compute nodes to run cluster-enabled applications, such as a parallel database server, at a higher performance than a single machine can provide. Scaling the cluster’s processing power is achieved by simply adding additional nodes to the cluster. Availability within the cluster is assured as nodes within the cluster provide backup to each other in the event of a failure. In high-availability clusters, if a node is taken out of service or fails, the load is transferred to another node (or nodes) within the cluster. To the user, this operation is transparent as the applications and data are also available on the failover nodes. An additional benefit comes with the existence of a single system image and the ease of manageability of the cluster. From the user's perspective, the user sees an application resource as the provider of services and applications. The user does not know or care whether this resource is a single server, a cluster, or even which node within the cluster is providing services. These benefits map to the needs of today’s enterprise business, education, military, and scientific community infrastructures. In summary, clusters provide:

  • Scalable capacity for compute, data, and transaction intensive applications, including support of mixed workloads
  • Horizontal and vertical scalability without downtime
  • Ability to handle unexpected peaks in workload
  • Central system management of a single system image
  • 24 x 7 availability. 


There are several types of clusters, each with specific design goals and functionality. These clusters range from distributed or parallel clusters for computation intensive or data intensive applications that are used for protein, seismic, or nuclear modeling to simple load-balanced clusters.

1. High Availability or Failover Clusters

These clusters are designed to provide uninterrupted availability of data or services (typically web services) to the end-user community. The purpose of these clusters is to ensure that a single instance of an application is only ever running on one cluster member at a time; if and when that cluster member is no longer available, the application fails over to another cluster member. With a high-availability cluster, nodes can be taken out of service for maintenance or repairs. Additionally, if a node fails, the service can be restored without affecting the availability of the services provided by the cluster. While the application will still be available, there will be a performance drop due to the missing node. High-availability cluster implementations are best for mission-critical applications or databases, mail, file and print, web, or application servers.

Fig. (Failover Clusters)
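The heartbeat-and-failover behavior described above can be sketched in a few lines. This is a toy model, not any particular cluster manager's API: the node names, tick-based clock, and three-tick timeout are all illustrative assumptions.

```python
class FailoverCluster:
    """Toy model of heartbeat-based failover (illustrative only)."""

    TIMEOUT = 3  # ticks without a heartbeat before a node is presumed dead

    def __init__(self, nodes):
        self.last_seen = {n: 0 for n in nodes}  # tick of each node's last heartbeat
        self.active = nodes[0]                  # node currently running the service

    def heartbeat(self, node, tick):
        self.last_seen[node] = tick

    def check(self, tick):
        """Fail the service over to a surviving node if the active one goes silent."""
        alive = [n for n, t in self.last_seen.items() if tick - t < self.TIMEOUT]
        if self.active not in alive and alive:
            self.active = alive[0]              # transparent to the end user
        return self.active

cluster = FailoverCluster(["node-a", "node-b"])
for tick in range(1, 5):
    cluster.heartbeat("node-b", tick)           # node-a has stopped responding
    print(tick, cluster.check(tick))            # service moves to node-b at tick 3
```

A real implementation would also restart the application and remount shared storage on the surviving node; here only the "who is active" decision is modeled.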

Unlike distributed or parallel processing clusters, high-availability clusters seamlessly and transparently integrate existing standalone, non-cluster-aware applications into a single virtual machine, allowing the network to grow effortlessly to meet increased business demands.

2. Cluster-Aware and Cluster-Unaware Applications

Cluster-aware applications are designed specifically for use in a clustered environment. They know about the existence of other nodes and are able to communicate with them. A clustered database is one example of such an application: instances of the clustered database run on different nodes and have to notify other instances if they need to lock or modify some data. Cluster-unaware applications do not know whether they are running in a cluster or on a single node. The existence of a cluster is completely transparent to such applications, and some additional software is usually needed to set up a cluster. A web server is a typical cluster-unaware application: all servers in the cluster have the same content, and the client does not care which server provides the requested content.

3. Load Balancing Cluster

This type of cluster distributes incoming requests for resources or content among multiple nodes running the same programs or having the same content. Every node in the cluster is able to handle requests for the same content or application. If a node fails, requests are redistributed between the remaining available nodes. This type of distribution is typically seen in a web-hosting environment.


Fig. (Load Balancing Cluster)
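The distribution scheme above can be sketched as a minimal round-robin balancer that drops failed nodes (hypothetical names; real load balancers also weigh health checks, sessions, and load metrics):

```python
class RoundRobinBalancer:
    """Distribute incoming requests across identical nodes in turn."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0

    def mark_failed(self, node):
        """Remove a failed node; its share is redistributed automatically."""
        self.nodes.remove(node)

    def route(self):
        """Pick the next node for an incoming request."""
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node

lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
print([lb.route() for _ in range(4)])   # ['web-1', 'web-2', 'web-3', 'web-1']
lb.mark_failed("web-2")                 # remaining nodes absorb the traffic
```

Because every node serves the same content, any node is a valid target, which is what makes this simple rotation sufficient.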

Both the high availability and load-balancing cluster technologies can be combined to increase the reliability, availability, and scalability of application and data resources that are widely deployed for web, mail, news, or FTP services.


4. Parallel/Distributed Processing Clusters

Traditionally, parallel processing was performed by multiple processors in a specially designed parallel computer. These are systems in which multiple processors share a single memory and bus interface within a single computer. With the advent of high-speed, low-latency switching technology, computers can be interconnected to form a parallel-processing cluster. These types of cluster increase availability, performance, and scalability for applications, particularly computationally or data-intensive tasks. A parallel cluster is a system that uses a number of nodes to simultaneously solve a specific computational or data-mining task. Unlike load-balancing or high-availability clusters, which distribute requests/tasks to nodes where a node processes the entire request, a parallel environment divides the request into multiple sub-tasks that are distributed to multiple nodes within the cluster for processing. Parallel clusters are typically used for CPU-intensive analytical applications, such as mathematical computation, scientific analysis (weather forecasting, seismic analysis, etc.), and financial data analysis. One of the more common cluster operating systems is the Beowulf class of clusters. A Beowulf cluster can be defined as a number of systems whose collective processing capabilities are simultaneously applied to a specific technical, scientific, or business application. Each individual computer is referred to as a “node” and each node communicates with other nodes within a cluster across standard Ethernet technologies (10/100 Mbps, GbE, or 10GbE). Other high-speed interconnects such as Myrinet, InfiniBand, or Quadrics may also be used.
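The divide-scatter-gather pattern that a parallel cluster applies to a single request can be sketched with Python's standard multiprocessing module standing in for MPI ranks; the function names and the sum-of-squares "workload" are illustrative only.

```python
from multiprocessing import Pool

def subtask(chunk):
    """Stand-in for a CPU-intensive computation on one node (here: sum of squares)."""
    return sum(x * x for x in chunk)

def split(data, parts):
    """Divide one request's dataset into sub-tasks, one per node."""
    size = (len(data) + parts - 1) // parts
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = list(range(1000))
    pieces = split(data, 4)                       # divide the request
    with Pool(4) as pool:                         # "nodes" are worker processes
        partials = pool.map(subtask, pieces)      # scatter sub-tasks, gather results
    print(sum(partials) == sum(x * x for x in data))  # combined answer matches: True
```

On an actual cluster the workers would be separate machines and the scatter/gather would go over MPI or a similar message-passing layer rather than a local process pool.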



The basic building blocks of clusters are broken down into multiple categories: the cluster nodes, the cluster operating system, the network switching hardware, and the node/switch interconnect. Significant advances have been made over the past five years to improve the performance of both the compute nodes and the underlying switching infrastructure.


Fig. (Cluster Components)

Application: This layer includes the various applications running for a particular group. These applications run in parallel; they include, for example, the various queries running on different nodes of the cluster. It can be regarded as the input part of the cluster.

Middleware: These are software packages that mediate between the user's applications and the operating system for cluster computing. In other words, they are layers of software between applications and the operating system. Middleware provides various services required by an application to function correctly. Software packages used as middleware include:



  • Image-based installation.
  • Supported by Red Hat 9.0 and Mandrake 9.0.
  • Processors supported: x86, Itanium (in beta).
  • Interconnects: Ethernet, Myrinet.
  • Diskless support in development.
  • Opteron support in development.
  • High-availability support in alpha testing.



  • Commercial distribution.
  • Single system image design.
  • Processors: x86 and Opteron.
  • Interconnects: Ethernet and InfiniBand.
  • MPI and PVM.
  • Diskful and diskless support.



  • Processors: x86, Opteron, Itanium.
  • Interconnects: Ethernet and Myrinet.
  • Compute node management via Red Hat’s kickstart mechanism.
  • Diskful only.
  • Cluster on CD.

Operating System: Clusters can be supported by various operating systems, including Windows, Linux, etc.

Interconnect: Interconnection between the various nodes of the cluster system can be done using 10GbE, Myrinet, etc. In the case of a small cluster system, the nodes can be connected with the help of simple switches.

Nodes: The nodes of the cluster system are the different computers that are connected. Their processors can be, for example, Intel or AMD 64-bit parts.

A Cluster Computer and Its Architecture



A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource.
A computer node can be a single or multiprocessor system (PCs, workstations, or SMPs) with memory, I/O facilities, and an operating system. A cluster generally refers to two or more computers (nodes) connected together. The nodes can exist in a single cabinet or be physically separated and connected via a LAN. An interconnected (LAN-based) cluster of computers can appear as a single system to users and applications. Such a system can provide a cost-effective way to gain features and benefits (fast and reliable services) that have historically been found only on more expensive proprietary shared-memory systems. The typical architecture of a cluster is shown in the figure above.
The following are some prominent components of cluster computers:

  • Multiple High Performance Computers (PCs, Workstations, or SMPs)
  • State-of-the-art Operating Systems (layered or micro-kernel based)
  • High Performance Network Switches (such as Gigabit Ethernet and Myrinet)
  • Network Interface Cards (NICs)
  • Fast Communication Protocols and Services (such as Active and Fast Messages)
  • Cluster Middleware (Single System Image (SSI) and System Availability Infrastructure), including Hardware (such as Digital (DEC) Memory Channel, hardware DSM, and SMP techniques), an Operating System Kernel or Gluing Layer (such as Solaris MC and GLUnix), and Applications and Subsystems
  • Applications (such as system management tools and electronic forms)
  • Run-time Systems (such as software DSM and parallel file systems)
  • Resource Management and Scheduling software (such as LSF (Load Sharing Facility) and CODINE (COmputing in DIstributed Networked Environments))
  • Parallel Programming Environments and Tools (such as compilers, PVM (Parallel Virtual Machine), and MPI (Message Passing Interface))
  • Parallel or Distributed Applications
The network interface hardware acts as a communication processor and is responsible for transmitting and receiving packets of data between cluster nodes via a network switch.
Communication software offers a means of fast and reliable data communication among cluster nodes and to the outside world. Often, clusters with a special network switch like Myrinet use communication protocols such as active messages for fast communication among their nodes. These protocols potentially bypass the operating system and thus remove the critical communication overheads, providing direct user-level access to the network interface.
The cluster nodes can work collectively as an integrated computing resource, or they can operate as individual computers. The cluster middleware is responsible for offering an illusion of a unified system image (single system image) and availability out of a collection of independent but interconnected computers.
Programming environments can offer portable, efficient, and easy-to-use tools for development of applications. They include message passing libraries, debuggers, and profilers. It should not be forgotten that clusters could be used for the execution of sequential or parallel applications.




Data throughput begins with a calculation of a theoretical maximum throughput and concludes with effective throughput. The effective throughput available between nodes will always be less than the theoretical maximum. Throughput for cluster nodes is based on many factors, including the following:

  • Total number of nodes running
  • Switch architectures
  • Forwarding methodologies
  • Queuing methodologies
  • Buffering depth and allocations
  • Noise and errors on the cable plant
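The gap between theoretical and effective throughput is visible even before the factors above come into play: Ethernet framing alone caps TCP goodput below the line rate. A quick estimate using standard per-frame overheads (and ignoring queuing, buffering, and cable-plant errors):

```python
def max_tcp_goodput(link_bps, mtu=1500, ip_hdr=20, tcp_hdr=20):
    """Upper bound on TCP payload throughput over Ethernet.

    Per-frame wire overhead: preamble (8) + header/FCS (18) + inter-frame gap (12).
    """
    wire_overhead = 8 + 18 + 12
    payload = mtu - ip_hdr - tcp_hdr           # 1460 bytes of application data
    return link_bps * payload / (mtu + wire_overhead)

# Gigabit Ethernet: roughly 949 Mbps of application data at best.
print(round(max_tcp_goodput(1_000_000_000) / 1e6, 1))  # 949.3
```

The node-count, switching, and queuing factors listed above only subtract further from this ceiling.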

Slow Start

In the original implementation of TCP, as soon as a connection was established between two devices, they could each send segments as fast as they liked as long as there was room in the other device’s receive window. In a busy network, the sudden appearance of a large amount of new traffic could exacerbate any existing congestion. To alleviate this problem, modern TCP devices are restrained in the rate at which they initially send segments. Each sender is at first restricted to sending only an amount of data equal to one “full-sized” segment, that is, equal to the MSS value for the connection. Each time an acknowledgment is received, the amount of data the device can send is increased by the size of another full-sized segment. Thus, the device “starts slow” in terms of how much data it can send, with the amount it sends increasing until either the full window size is reached or congestion is detected on the link.
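The growth just described can be sketched per round trip. Simplifying assumption: every segment in a window is acknowledged, so adding one MSS per ACK doubles the congestion window each RTT until the receiver's advertised window is reached.

```python
def slow_start(mss, recv_window, rtts):
    """Congestion window per RTT under slow start, capped by the receive window."""
    cwnd, history = mss, []
    for _ in range(rtts):
        history.append(cwnd)
        cwnd = min(cwnd * 2, recv_window)   # one MSS per ACK => doubling per RTT
    return history

# A 1460-byte MSS against a 64 KB receive window:
print(slow_start(1460, 65535, 7))
# [1460, 2920, 5840, 11680, 23360, 46720, 65535]
```

The exponential ramp means even a "slow" start fills a typical receive window within a handful of round trips.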

Congestion Avoidance

When potential congestion is detected on a TCP link, a device responds by throttling back the rate at which it sends segments. A special algorithm is used that allows the device to drop the rate at which segments are sent quickly when congestion occurs. The device then uses the Slow Start algorithm, described above, to gradually increase the transmission rate back up again to try to maximize throughput without congestion occurring again.
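The interplay between the two algorithms can be sketched as one state update per round trip. This is simplified Tahoe-style behavior in units of MSS; real stacks (Reno, CUBIC, etc.) differ in detail, so treat it as an illustration rather than a specification.

```python
def next_window(cwnd, ssthresh, congested):
    """Return (cwnd, ssthresh) for the next RTT, in units of MSS."""
    if congested:
        return 1, max(cwnd // 2, 1)                 # throttle back sharply, remember half
    if cwnd < ssthresh:
        return min(cwnd * 2, ssthresh), ssthresh    # slow-start region: exponential
    return cwnd + 1, ssthresh                       # congestion avoidance: linear

cwnd, ssthresh = 1, 8
trace = []
for rtt in range(8):
    cwnd, ssthresh = next_window(cwnd, ssthresh, congested=(cwnd == 10))
    trace.append(cwnd)
print(trace)   # [2, 4, 8, 9, 10, 1, 2, 4] - ramp, probe, collapse, ramp again
```

The sawtooth in the trace is the characteristic shape of TCP throughput on a congested link.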

In the event of packet drops, TCP retransmission algorithms will engage. Retransmission timeouts can reach delays of up to 200 milliseconds, thereby significantly impacting throughput.


A few important cluster applications are:

  • Google Search Engine
  • Petroleum Reservoir Simulation
  • Image Rendering

Google Search Engine

Internet search engines enable Internet users to search for information on the Internet by entering specific keywords. A widely used search engine, Google uses cluster computing to meet the huge quantity of worldwide search requests, which peak at thousands of queries per second. A single Google query needs tens of billions of processing cycles and accesses a few hundred megabytes of data in order to return satisfactory search results. Google uses cluster computing as its solution to the high demand for system resources, since clusters have better price-performance ratios than alternative high-performance computing platforms and also use less electrical power. Google focuses on two important design factors: reliability and request throughput. Google achieves reliability at the software level, so that a reliable computing infrastructure can be constructed on clusters of 15,000 commodity PCs distributed worldwide. The services for Google are also replicated across multiple machines in the clusters to provide the necessary availability. Google maximizes overall request throughput by performing parallel execution of individual search requests. This means that more search requests can be completed within a specific time interval.


Fig. (Google query-serving architecture)
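The fan-out-and-merge pattern behind that parallel query execution can be sketched as follows. This is a toy sequential model with made-up shard data and a trivial term-count score; Google's actual index-serving system is far more elaborate.

```python
def search_shard(shard, term):
    """Each index shard scores only its own documents (toy score: term count)."""
    return [(doc, doc.count(term)) for doc in shard if term in doc]

def cluster_search(shards, term, k=2):
    """Fan the query out to all shards, then merge partial results by score."""
    hits = []
    for shard in shards:                   # executed in parallel on a real cluster
        hits.extend(search_shard(shard, term))
    return [doc for doc, _ in sorted(hits, key=lambda h: -h[1])[:k]]

shards = [
    ["beowulf cluster howto", "tcp slow start notes"],
    ["cluster cluster middleware faq", "myrinet pricing"],
]
print(cluster_search(shards, "cluster"))
# ['cluster cluster middleware faq', 'beowulf cluster howto']
```

Because each shard holds a disjoint slice of the index, adding nodes shrinks per-node work, which is how throughput scales with cluster size.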

Image Rendering

The Scientific Computing and Imaging (SCI) Institute at the University of Utah has explored cluster-based scientific visualization using a 32-node visualization cluster composed of commodity hardware components connected with a high-speed network. The OpenGL scientific visualization tool Simian has been modified to create a cluster-aware version of Simian that supports parallelization by making explicit use of remote cluster nodes through a message-passing interface (MPI). Simian is able to generate 3D images for fire-spread simulations that model scenarios such as when a missile located within a pool of jet fuel catches fire and explodes. Using image rendering for fire-spread simulations enables researchers to have a better visualization of the destructive effects. Normally, Simian uses a swapping mechanism to manage datasets that are too large to load into the available texture memory, resulting in low performance and interactivity. For the cluster-aware Simian, large datasets are divided into sub-volumes that can be distributed across multiple cluster nodes, thus achieving interactive performance. This “divide-and-conquer” technique first decomposes the dataset into sub-volumes before distributing the sub-volumes to multiple remote cluster nodes. Each node is then responsible for rendering its sub-volume using the locally available graphics hardware. The individual results are finally combined using a binary-swap compositing algorithm to generate the final image. The figure shows the visualization of two fire-spread datasets simulating a heptane pool fire, generated by the cluster-aware version of Simian using 8 cluster nodes. The top row shows two views (side and top views) of the h300_0075 dataset, while the bottom row shows the h300_0130 dataset.
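The divide-and-conquer flow can be shown in miniature. In this sketch a "volume" is just a list of z-slices, each "node" renders its slab as a maximum-intensity projection, and an elementwise max stands in for the binary-swap compositing step; all names and the toy renderer are illustrative, not Simian's actual pipeline.

```python
def split_volume(volume, nodes):
    """Decompose the dataset into sub-volumes (slabs), one per cluster node."""
    size = (len(volume) + nodes - 1) // nodes
    return [volume[i:i + size] for i in range(0, len(volume), size)]

def render_slab(slab):
    """Each node reduces its slices to one partial image (max projection)."""
    return [max(col) for col in zip(*slab)]

def composite(partials):
    """Combine the per-node partial images into the final image."""
    return [max(col) for col in zip(*partials)]

volume = [[1, 5], [9, 2], [3, 3], [4, 8]]           # 4 slices of 2 pixels each
partials = [render_slab(s) for s in split_volume(volume, 2)]
print(composite(partials) == render_slab(volume))   # True: same final image
```

The key property, visible even in this toy, is that rendering the slabs independently and compositing afterwards yields the same image as rendering the whole volume on one node.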






High-performance cluster computing is enabling a new class of computationally intensive applications that solve problems previously cost-prohibitive for many enterprises. The use of commodity computers collaborating to resolve highly complex, computationally intensive tasks has broad application across several industry verticals such as chemistry, biology, quantum physics, petroleum exploration, crash-test simulation, CG rendering, and financial risk analysis. However, cluster computing pushes the limits of server architectures, computing, and network performance. Due to the economics of cluster computing and the flexibility and high performance offered, cluster computing has made its way into mainstream enterprise data centers using clusters of various sizes. As clusters become more popular and more pervasive, careful consideration of the application requirements, and what they translate to in terms of network characteristics, becomes critical to the design and delivery of an optimal and reliably performing solution. Knowledge of how the application uses the cluster nodes, and how it is impacted by the underlying network, is critically important. As critical as the selection of the cluster nodes and operating system are the selection of the node interconnects and the underlying cluster network switching technologies. A scalable and modular networking solution is critical, not only to provide incremental connectivity but also to provide incremental bandwidth options as the cluster grows. The ability to use advanced technologies within the same networking platform, such as 10 Gigabit Ethernet, provides new connectivity options and increases bandwidth, whilst providing investment protection. The technologies associated with cluster computing, including host protocol stack processing and interconnect technologies, are rapidly evolving to meet the demands of current, new, and emerging applications. Much progress has been made in the development of low-latency switches, protocols, and standards that efficiently and effectively use network hardware components.



