Software is a strategic differentiator that can catalyze digital transformation. Organizations are investing in technology, such as modern cloud services, to drive efficiencies and improve the customer experience. To make this a reality, it’s essential that business leaders have a basic understanding of how business software and applications work and the opportunities they bring.
Distributed computing systems, or distributed applications, are a key enabler of digital transformation. This article explores the characteristics of these technologies, what they are designed to achieve, how they scale, and how they support cloud native technology.
The growth of distributed systems
Distributed systems were made possible by two advancements in technology during the 1980s. The first was the powerful microprocessor, eventually succeeded by even more powerful multi-core central processing units (CPUs), which can run multiple processes concurrently (a single-core processor can only execute one process at a time). As CPUs grew more capable, developers could build correspondingly more powerful applications.
The second development was high-speed computer networks. Local area networks (LAN) could connect thousands of machines and enable them to communicate with each other. Wide area networks (WAN) scaled that capability to hundreds of millions of machines.
Features of distributed systems
Distributed systems are characterized by a collection of autonomous computing elements that appear to users as a single system. These elements are known as “nodes” and can be hardware devices, such as a PC or smartphone, or software processes. The internet is a classic example of a distributed system: although it is composed of huge numbers of hardware and software elements, it appears as a single system. Data is stored transparently to users, servers are hidden from view, and information is served to a user’s browser or inbox seamlessly.
This is known as abstraction, a common term in IT. For example, a browser abstracts away the complexity of the distributed systems behind the internet. Applications such as Gmail, HubSpot, and Salesforce likewise abstract away the complexity of what powers them.
Nodes collaborate by exchanging messages. For example, your browser (node A) sends a request to a web server (node B), which serves up content in response. This message exchange is a key characteristic of distributed systems.
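To make the message exchange concrete, here is a minimal Python sketch that runs both nodes on one machine: a thread plays node B, listening on a local TCP socket, and the main program plays node A. The path `/index.html` and the reply format are illustrative assumptions, not a real web protocol; an actual browser and web server would exchange HTTP messages.

```python
import socket
import threading


def recv_all(sock: socket.socket) -> bytes:
    """Read until the other side closes its sending direction."""
    chunks = []
    while True:
        data = sock.recv(1024)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def node_b(server: socket.socket) -> None:
    """Node B: accept one request message and reply with some content."""
    conn, _addr = server.accept()
    with conn:
        path = recv_all(conn).decode()
        conn.sendall(f"content for {path}".encode())


def node_a(port: int, path: str) -> str:
    """Node A: send a request message and wait for the reply."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(path.encode())
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        return recv_all(sock).decode()


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

thread = threading.Thread(target=node_b, args=(server,))
thread.start()
reply = node_a(port, "/index.html")
thread.join()
server.close()
print(reply)  # prints: content for /index.html
```

Neither node needs to know how the other is implemented; they only agree on the format of the messages they exchange.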
Holding distributed systems together
Middleware is a distinct software layer that sits on top of each node’s operating system. It allows applications that would otherwise be incompatible to communicate, and it also provides additional application services such as security and recovery from failure.
Middleware is much less prevalent today, having been largely superseded by application programming interfaces (APIs). An API acts as a gateway that lets applications communicate securely even when their own interfaces are incompatible. It is another form of abstraction: it hides the details of an application’s internal interface behind a published one.
Open APIs have become commonplace as developers expose their application APIs to developers outside the organization, who can then build on that data. Yelp, for example, relies on the Google Maps API to populate its map feature.
Despite the popularity of APIs, middleware hasn’t become extinct. It still provides important security, management, and application coordination functions; APIs have simply taken over its communication role.
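As a rough illustration of that abstraction, the sketch below wraps a hypothetical legacy system, whose `lookup` method returns a pipe-delimited string, behind a hypothetical `get_item` function that returns JSON. Callers only see the JSON interface, so the legacy format could change without breaking them. All names and data here are made up for illustration; a real API would typically be exposed over a network protocol such as HTTP.

```python
import json


class LegacyInventory:
    """Hypothetical legacy system with its own, incompatible interface."""

    def lookup(self, code: str) -> str:
        # Returns a pipe-delimited record "code|name|quantity", or "" if unknown.
        records = {"A100": "A100|Widget|42", "B200": "B200|Gadget|7"}
        return records.get(code, "")


def get_item(sku: str, backend: LegacyInventory) -> str:
    """Hypothetical API endpoint: translates the legacy record into JSON."""
    record = backend.lookup(sku)
    if not record:
        return json.dumps({"error": "not found", "sku": sku})
    code, name, qty = record.split("|")
    return json.dumps({"sku": code, "name": name, "quantity": int(qty)})


print(get_item("A100", LegacyInventory()))
print(get_item("Z999", LegacyInventory()))
```

The consumer never parses the pipe-delimited format; only `get_item` knows about it.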
The design goals of distributed systems
Let’s take a look at the main goals of today’s distributed systems:
- Abstraction: We’ve already discussed this, but distributed systems mask the fact that processes and resources are distributed across multiple computing devices.
- Resource sharing: Many assets and capabilities, such as storage, data, networks, and services, are shared across applications. It’s much more cost-effective to provide a single, reliable storage facility that applications access through a distributed system than to purchase and maintain a separate facility for each application.
- Openness: Open distributed systems are composed of components that can be used by or integrated with other systems. Interoperability means that, for example, two implementations from different vendors can work together, while portability means that an application developed for system X will run on system Y without modification. Open distributed systems are also extensible: teams can add or replace components without affecting those that remain in place.
- Scalability: Systems must scale to meet user demand. Think Friday night Netflix viewing. Resources can be added dynamically and scaled back when no longer needed.
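The interoperability and extensibility ideas in the openness goal can be sketched with a shared interface. In this minimal Python example, `BlobStore` is a hypothetical interface (a `typing.Protocol`), and `VendorAStore` and `VendorBStore` are hypothetical vendor implementations with different internals. Application code written against the interface works with either, so one can replace the other without changes.

```python
from typing import Dict, Protocol


class BlobStore(Protocol):
    """Hypothetical shared interface any vendor's storage component can implement."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class VendorAStore:
    """Hypothetical vendor A: keeps raw bytes in a dictionary."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class VendorBStore:
    """Hypothetical vendor B: different internals (hex strings), same interface."""

    def __init__(self) -> None:
        self._hex: Dict[str, str] = {}

    def put(self, key: str, data: bytes) -> None:
        self._hex[key] = data.hex()

    def get(self, key: str) -> bytes:
        return bytes.fromhex(self._hex[key])


def save_report(store: BlobStore, name: str, body: str) -> bytes:
    """Application code depends only on the interface, not on a vendor."""
    store.put(name, body.encode("utf-8"))
    return store.get(name)


# Either implementation can be swapped in without changing save_report.
for store in (VendorAStore(), VendorBStore()):
    print(type(store).__name__, save_report(store, "q3.txt", "revenue up"))
```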
Let’s take a deeper dive into scalability. Scaling up (vertical scaling) means adding more memory, faster CPUs, and so on to existing machines without increasing the number of machines. Scaling out (horizontal scaling) means extending the distributed system by adding more machines. The two terms are often used interchangeably: when IT teams talk about scaling applications up, they likely mean scaling out, which cloud environments have made easy and affordable.
Scaling out is critical for apps that experience rapid spikes and need additional resources only for a limited time, as in the Netflix example. Other tools can require scaling out too, such as data analytics platforms that need more capacity while running compute-heavy algorithms.
Permanently building enough compute capacity into an application to handle its peak load is costly. With a distributed computing system, however, an application can scale out by adding resources and then scale in by returning them to the resource pool when they’re no longer needed. This increases efficiency and reduces cost.
But scalability is not without its challenges. Here are some common strategies for overcoming them:
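A scale-out or scale-in decision can be reduced to a small calculation. The sketch below assumes a per-replica load metric expressed as an integer percentage and computes a desired replica count using the same formula documented for the Kubernetes Horizontal Pod Autoscaler, `ceil(currentReplicas * currentMetric / targetMetric)`, clamped between hypothetical minimum and maximum limits. It illustrates the idea; it is not any particular cloud provider’s autoscaler.

```python
import math


def desired_replicas(current_replicas: int, current_load_pct: int,
                     target_load_pct: int, min_replicas: int = 1,
                     max_replicas: int = 20) -> int:
    """Return how many replicas to run so average load approaches the target.

    Load above the target yields more replicas (scale out); load below it
    yields fewer (scale in). The result is clamped to [min_replicas, max_replicas].
    """
    if current_replicas < 1 or target_load_pct <= 0:
        raise ValueError("need at least one replica and a positive target")
    desired = math.ceil(current_replicas * current_load_pct / target_load_pct)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, 90, 60))  # Friday night spike: 4 busy replicas -> 6
print(desired_replicas(6, 20, 60))  # quiet period: 6 idle replicas -> 2
```

The clamp keeps a sudden spike from requesting unlimited machines and keeps at least one replica running when traffic drops to nothing.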
- Asynchronous communication: When apps are scaled out to geographically dispersed machines in the cloud, network delays can slow response times. Asynchronous communication can help hide this latency from the user. Traditionally, much application communication was synchronous, like a phone call: both parties must be connected at the same time, and when the connection drops, the exchange stops. Asynchronous communication is more like email: a message can be sent to the network whether or not the receiver is online, and it will be delivered once the receiver is available.
- Partitioning and distribution: These techniques are typically used for large databases. When data is split into logical groups, each stored on a separate machine, processes that work on the data know exactly where to find each piece. The internet’s Domain Name System (DNS) works this way: its namespace is divided into zones managed by different servers, so resolving a name such as www.example.com means following the hierarchy (the root, then .com, then example.com) rather than searching the entire internet.
- Replication: To increase availability, components are replicated across a distributed system. For example, Netflix’s content is replicated around the world so that users can access it quickly wherever they are, which also reduces communication latency. Because requests can be spread across replicas, replication also balances load, improving performance at peak times and returning results faster.
- Caching: Caching is a special form of replication in which a copy of a resource is kept close to the client that uses it. Your browser, for example, caches resources for a limited time to help pages load faster.
The catch is consistency. Keeping every cached copy and replica up to date requires coordination that can itself limit scalability, so copies may temporarily be inconsistent with the original. How much inconsistency is tolerable depends on the app.
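The partitioning and replication strategies can be combined in a short sketch. Assuming a hypothetical layout of three partitions, each replicated on two nodes, `partition_for` uses a stable hash (SHA-256 rather than Python’s built-in `hash`, which is randomized per process for strings) so that every process computes the same location for a key, and `replica_for_read` spreads reads for a partition across its replicas in round-robin order. The node names and layout are made up.

```python
import hashlib
import itertools

# Hypothetical cluster layout: each partition (shard) lives on two replica nodes.
PARTITIONS = {
    0: ["node-a1", "node-a2"],
    1: ["node-b1", "node-b2"],
    2: ["node-c1", "node-c2"],
}


def partition_for(key: str, num_partitions: int = len(PARTITIONS)) -> int:
    """Partitioning: a stable hash tells every process which shard holds a key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Replication: one round-robin iterator per partition spreads reads over replicas.
_replica_cycles = {p: itertools.cycle(nodes) for p, nodes in PARTITIONS.items()}


def replica_for_read(key: str) -> str:
    """Pick the next replica of the key's partition to serve a read."""
    return next(_replica_cycles[partition_for(key)])


for key in ("user:42", "user:43", "order:7"):
    print(key, "-> partition", partition_for(key), "read from", replica_for_read(key))
```

One design caveat: plain modulo hashing moves most keys to new partitions whenever the number of partitions changes; schemes such as consistent hashing are commonly used to limit that reshuffling.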
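The consistency trade-off shows up even in a simple cache. This minimal sketch uses a hypothetical `TTLCache` class with an injectable clock (so the example is deterministic) that serves a stored copy until its time-to-live (TTL) expires. A change at the origin therefore stays invisible to readers until the cached entry ages out.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical cache: serves a stored copy until it is `ttl` seconds old."""

    def __init__(self, fetch: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # reads the authoritative (origin) value
        self._ttl = ttl
        self._clock = clock  # injectable so examples and tests are deterministic
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # cache hit: fast, but possibly stale
        value = self._fetch(key)  # miss or expired: go back to the origin
        self._entries[key] = (value, now)
        return value


origin = {"price": "10"}
fake_time = [0.0]
cache = TTLCache(origin.__getitem__, ttl=30, clock=lambda: fake_time[0])

print(cache.get("price"))  # prints 10 (fetched from the origin)
origin["price"] = "12"     # the origin changes...
print(cache.get("price"))  # ...but this still prints the stale 10
fake_time[0] = 31.0
print(cache.get("price"))  # prints 12 once the cached entry has expired
```

A shorter TTL narrows the window of inconsistency at the cost of more trips to the origin; choosing it is exactly the app-specific tolerance decision described above.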
3 categories of distributed computing systems: cluster, grid, and cloud
Distributed computing systems fall into three categories:
- Cluster computing: A cluster is a collection of similar machines connected by a high-speed LAN, with each node typically using the same hardware and operating system. Clusters are often used for parallel programming, in which a single compute-intensive program runs in parallel on several machines. Each cluster consists of compute (worker) nodes that are monitored and managed by one or more master nodes. The master node allocates worker nodes to a particular process, manages request queues, and provides the user-facing interface to the system. In summary, the master manages the cluster and the workers run the program.
- Grid computing: A grid is composed of nodes that can differ starkly in hardware and network technology and are used for specific tasks. Nothing is assumed to be uniform: the hardware, operating system, network, and security policies can all differ.
- Cloud computing: The cloud is composed of virtualized resources hosted in cloud providers’ data centers. Each customer can build a virtualized infrastructure and take advantage of a variety of cloud services. In a virtualized environment, each virtual machine (VM) looks like a dedicated physical computer but is actually software: a hypervisor emulates the hardware so that many VMs can share the same physical server. To the software engineer, it can appear that the company has rented its own private server, when it is really sharing the hardware with other customers. These dynamic environments allow virtualized resources to be configured for scalability: when more compute resources are needed, they can easily be added.
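The master/worker arrangement described for cluster computing can be sketched on a single machine. In this minimal Python example, threads stand in for worker nodes and a `queue.Queue` stands in for the master’s request queue; the job (summing squares), the number of workers, and the chunk size are arbitrary assumptions. Because of CPython’s global interpreter lock, threads won’t actually speed up this CPU-bound work; the point is the structure, in which the master splits and queues the job and the workers run it.

```python
import queue
import threading


def worker(tasks: queue.Queue, results: dict, lock: threading.Lock) -> None:
    """Worker node: pull chunks from the master's queue until told to stop."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel from the master: no more work
            break
        chunk_id, numbers = item
        partial = sum(n * n for n in numbers)  # the "compute-intensive" part
        with lock:
            results[chunk_id] = partial


def master(numbers: list, num_workers: int = 4, chunk_size: int = 1000) -> int:
    """Master node: split the job, queue the chunks, start workers, combine results."""
    tasks: queue.Queue = queue.Queue()
    results: dict = {}
    lock = threading.Lock()
    for chunk_id, start in enumerate(range(0, len(numbers), chunk_size)):
        tasks.put((chunk_id, numbers[start:start + chunk_size]))
    for _ in range(num_workers):
        tasks.put(None)  # one sentinel per worker, queued after all real work
    workers = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sum(results.values())


print(master(list(range(10_000))))  # prints 333283335000
```

On a real cluster, the queue would be a network service and the workers would be separate machines, but the division of labor is the same.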
Cloud computing is such a prevalent distributed computing system that it’s worth taking a quick look at how it’s organized.
The cloud consists of four different layers: hardware, infrastructure, platform, and application.
- The hardware includes processors, power, routers, and cooling systems – all hosted in the cloud provider’s data center and transparent to the end user.
- Infrastructure is the backbone of the cloud and includes virtual machines, storage, and other compute resources (examples of infrastructure-as-a-service include Amazon EC2 and Amazon S3).
- Platform is the layer that allows developers to build and deploy applications in the cloud (examples of this platform-as-a-service layer include Google App Engine and Microsoft Azure App Service).
- Finally, the application layer is where cloud-hosted applications reside (think of software-as-a-service solutions like Gmail, YouTube, Google Docs, and Salesforce). Each layer is abstracted from the user.