Facebook’s Tech Stack & Intuitive Database Architecture In 2023 Explained

Julian Wallis
Facebook's tech stack is a dynamic blend of tools, from LAMP Stack to HHVM, Apache Cassandra, and GraphQL, powering the company's dedication to delivering a seamless and unforgettable user experience.

Facebook remains committed to innovation in the face of this ever-evolving competitive tech landscape, constantly introducing fresh solutions to maintain a smooth and captivating experience for its extensive user community. As Facebook’s user numbers continue to soar, we can expect the company to continually remain at the forefront of technological innovation, pushing the boundaries of what’s possible in the world of social media.

With over 3 billion monthly active users, Facebook stands as one of the world’s most influential and largest technology companies. To support its massive social network, Facebook has developed a sophisticated and potent technology stack. This stack encompasses various components, each crucial in its own right.

In this exploration, we’ll break down the tech that drives Facebook’s global reach, from how it connects people to how it handles data and user experiences. We’re going behind the scenes to uncover the tools and platforms that make our digital world tick.

So without any further ado, these are the technologies that have played a pivotal role in Facebook’s journey, contributing to its success as a global social media giant.

Facebook’s Tech Stack Explored

Frontend Development

ReactJS: Facebook employs the ReactJS JavaScript library for building user interfaces. ReactJS facilitates the creation of intricate UI components with reusable code, simplifying the development and maintenance of large-scale web applications.

Facebook’s adoption of ReactJS has significantly enhanced the platform’s front-end development. With its component-based architecture, ReactJS enables the creation of interactive and responsive web interfaces. This approach allows developers to efficiently manage and update different parts of a web page, providing users with a seamless experience.

Optimising JavaScript Performance through Code Splitting

When it comes to JavaScript-based single-page applications, one of the major concerns is the size of the code, as it directly impacts the speed at which a page loads. Creating a client-side React app for Facebook meant tackling this issue head-on. To address it, Facebook introduced a range of new APIs that align with their philosophy of loading “as little as possible, as early as possible.”

Gradual Code Download: Serving Only What’s Necessary, When It’s Needed

When users are patiently waiting for a page to load, Facebook’s primary objective is to offer immediate feedback by displaying UI “skeletons” that provide a preview of the final page’s appearance. These skeletons require minimal resources. However, rendering them early is a challenge when the code is bundled into a single package. 

To overcome this, Facebook implemented code-splitting into bundles based on the sequence in which the page elements should appear. But they were also very cautious not to hinder performance in the process. This leads us to their innovative approach to JavaScript Loading Tiers – where they divided the JavaScript required for the initial load into three distinct tiers, all managed through a declarative, statically analysable API.
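
To make the tier idea concrete, here is a minimal sketch in Python rather than Facebook's actual JavaScript API. The tier names, the module names, and the plan_bundles function are all hypothetical. The sketch only shows how modules tagged with a tier could be grouped into bundles that are served in order.

```python
from collections import defaultdict

# Hypothetical tier names, ordered from "needed first" to "needed last".
TIERS = ("skeleton", "content", "after_display")


def plan_bundles(modules):
    """Group (module_name, tier) pairs into one bundle per tier, in load order."""
    by_tier = defaultdict(list)
    for name, tier in modules:
        if tier not in TIERS:
            raise ValueError(f"unknown tier: {tier}")
        by_tier[tier].append(name)
    # Emit bundles in tier order, skipping tiers with no modules.
    return [(tier, sorted(by_tier[tier])) for tier in TIERS if by_tier[tier]]


# Hypothetical modules, each declared with the tier it belongs to.
modules = [
    ("FeedStory", "content"),
    ("PageSkeleton", "skeleton"),
    ("Logging", "after_display"),
    ("NavBarSkeleton", "skeleton"),
]
bundles = plan_bundles(modules)
for tier, names in bundles:
    print(tier, names)
```

Because every module states its tier up front, the grouping can be worked out ahead of time instead of being discovered while the page is loading, which is the appeal of a declarative, statically analysable approach.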

React Native: For developing native mobile apps, Facebook’s tech stack harnesses React Native, which applies the React component model to mobile development. This approach enables developers to share much of their code across multiple platforms, including iOS and Android, rather than maintaining entirely separate codebases.

Atomic CSS: Atomic CSS is a methodology employed by Facebook to streamline and optimise its styling practices. It breaks down styles into small, reusable classes, making it easier to manage and maintain the platform’s extensive and ever-evolving design. This approach ensures consistency in design across Facebook’s web and mobile applications.
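
The idea behind atomic CSS can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Facebook's tooling: the atomize function and the a0, a1, ... class naming scheme are hypothetical. Each unique declaration becomes exactly one class, and components reuse those classes.

```python
def atomize(rules):
    """Turn {selector: {property: value}} into single-declaration classes.

    Returns (css, usage): css maps a generated class name to one declaration,
    usage maps each original selector to the atomic classes it needs.
    """
    atoms = {}  # "property:value" -> generated class name
    usage = {}
    for selector, declarations in rules.items():
        classes = []
        for prop, value in declarations.items():
            key = f"{prop}:{value}"
            if key not in atoms:
                atoms[key] = f"a{len(atoms)}"  # hypothetical naming scheme
            classes.append(atoms[key])
        usage[selector] = classes
    css = {name: declaration for declaration, name in atoms.items()}
    return css, usage


rules = {
    ".button": {"color": "white", "padding": "8px"},
    ".badge": {"color": "white", "margin": "4px"},
}
css, usage = atomize(rules)
print(css)
print(usage)
```

Both selectors share the class generated for color:white, so the stylesheet grows with the number of unique declarations rather than with the number of components.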

GraphQL: Developed by Facebook, GraphQL serves as a query language and runtime for APIs. It offers an efficient and flexible alternative to traditional REST APIs, enhancing performance and resource efficiency.

Backend Development

Hack Language: Facebook’s creation, the Hack Language, is the programming language that runs on the HipHop Virtual Machine (HHVM) – more on this later. It’s a gradually typed dialect of PHP, combining optional static type checking with the fast development cycle of a dynamic language.

Cassandra Database: In Facebook’s tech stack, the NoSQL Cassandra database takes centre stage, designed for high scalability and availability. It serves as the backbone for storing extensive volumes of data, supporting Facebook’s ever-expanding user base. It excels in handling high traffic levels and ensuring rapid data access. We’ll look more closely at the inner workings of Apache Cassandra later in the article, after we cover the LAMP stack.

Facebook’s Database Infrastructure

Facebook’s Datacenters: Facebook operates its data centres to maximise control and efficiency, reducing the company’s carbon footprint and operational costs. These data centres are designed to be highly energy-efficient.

Open Compute Project: Facebook initiated the Open Compute Project, sharing innovative data centre technologies with the industry. The project’s aim is to create more efficient and sustainable data centres, a goal adopted by many other technology companies.

Networking Technologies: Facebook’s tech stack includes custom-built networking equipment and software, ensuring efficient and scalable networks. These technologies enable Facebook to manage the massive traffic generated by its users, emphasising data security and privacy.

Other Vital Tools That Facebook Utilises

Thrift: Facebook employs Thrift, a cross-language framework that it developed in-house and later contributed to the Apache Software Foundation. It enables efficient communication between services written in different languages, streamlining cross-language development.

Varnish: Varnish serves as an HTTP accelerator, acting as a load balancer and caching content for lightning-fast delivery. Facebook uses it extensively to serve photos and profile pictures, handling billions of requests daily.

Python: Python is a super versatile and widely used programming language at Facebook. It’s employed in various areas, from web development to data analysis and machine learning. Facebook utilises Python to power internal tools, automate tasks, and support a range of applications. Its simplicity and readability make it a favourite among developers at the company.

Scaling Challenges and Facebook’s Open Source Commitment

Facebook’s scaling challenges are formidable, with a user base surpassing three billion monthly active users. The platform’s rapid growth constantly tests its performance limits, necessitating creative solutions. Facebook engineers employ iterative approaches and continually innovate to meet these challenges. For instance, Facebook’s photo storage system has been entirely rewritten multiple times as the platform has expanded.

A notable aspect of Facebook’s tech stack is its strong commitment to open-source initiatives. Facebook actively contributes to various open-source projects, including Linux, Memcached, MySQL, Hadoop, and many others. Moreover, Facebook has open-sourced much of its internally developed software, such as HipHop, Cassandra, Thrift, Scribe, React, GraphQL, PyTorch, Jest, and Docusaurus. This commitment to open source fosters innovation and benefits the broader tech community.

How Facebook Works – The LAMP Stack

The LAMP Stack, an acronym for Linux, Apache, MySQL, and PHP, represents a powerful software combination consisting of four distinct components. It enjoys widespread use for the creation of dynamic websites and web applications. While the specific elements within this stack can vary, the entire LAMP ensemble played a vital role in shaping the architecture of Facebook.

Let’s delve into each of these integral components:

Linux

Linux, a favoured operating system among software engineers and developers, commands a significant market share. Major organisations like IBM, Dell, Sun Microsystems (remember Java?), and Nokia entrust it with their operations. Linux supports a wide array of programming languages, including Java, PHP, Python, Go, Haskell, and more.

Apache

The Apache HTTP server stands as one of the most globally popular web servers, powering a staggering 29% of websites. It operates using multiprocessing modules such as process-based, process-thread-based, or event hybrid-based. Notably, Apache boasts features like fault tolerance (it continues working even if one server fails), load balancing (efficiently distributing equivalent workloads across multiple servers), web sockets (a key component in social media messaging), IPv6 compatibility, and high scalability. While NGINX has gained traction as an alternative, Apache remains a strong contender.

MySQL

MySQL, a widely recognised open-source relational database, is used by organisations of every size, from small businesses to large enterprises. Think back to the days of updating timelines, posting statuses, sending birthday wishes, and sharing memes. Behind the scenes, it was MySQL queries that powered these experiences, keeping users captivated.

PHP

PHP, a recursive acronym for “PHP: Hypertext Preprocessor”, is a server-side scripting language that adds dynamic functionality to otherwise static web pages. It handles tasks like fetching and inserting user data into databases, such as MySQL. It’s celebrated for its user-friendly syntax and boasts a vast community of developers contributing to its evolution.

An intriguing feature of PHP is its ability to seamlessly integrate with HTML code. While PHP’s runtime speed was a point of contention prior to PHP 7.0, the ingenious engineers at Facebook devised a solution to enhance its performance. We’ll explore this alternative shortly. 

Profiling of the Live System

Facebook meticulously monitors its systems, including the performance of every single PHP function in the live production environment. They achieve this by using the open-source tool XHProf. Profiling the live PHP environment helps Facebook identify performance issues, allowing for swift optimisation and enhancement.
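
XHProf is a PHP tool, so the following is only a Python sketch of the underlying idea: wrap each function, then record how often it is called and how long it takes. The profiled decorator, the STATS table, and render_feed are hypothetical names. A production profiler would typically sample requests rather than time every call.

```python
import time
from collections import defaultdict
from functools import wraps

# Function name -> [call count, total wall-clock seconds].
STATS = defaultdict(lambda: [0, 0.0])


def profiled(func):
    """Record call count and cumulative wall time for the wrapped function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are counted too.
            entry = STATS[func.__name__]
            entry[0] += 1
            entry[1] += time.perf_counter() - start
    return wrapper


@profiled
def render_feed(n):
    # Hypothetical stand-in for real page-rendering work.
    return [f"story {i}" for i in range(n)]


render_feed(3)
render_feed(5)
calls, seconds = STATS["render_feed"]
print(f"render_feed: {calls} calls, {seconds:.6f}s total")
```

Aggregating per-function totals like this is what makes it possible to spot which functions dominate request time.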

HipHop Virtual Machine (HHVM)

The HipHop Virtual Machine, or HHVM for short, may sound a bit funny at first, but it’s a remarkable accelerator. It compiles PHP and Hack code into an intermediate bytecode, which a just-in-time (JIT) compiler then translates into native machine code at runtime. This transformation results in a significant speed boost, reportedly over nine times faster than conventionally interpreted PHP code. It also reduces latency when loading data on Facebook’s website.

Now, let’s transition to the strategies Facebook uses to store your cherished timeline photos.

Haystack – How Facebook Stores & Manages Your Images

Haystack is a specialised object storage system tailored for Facebook’s Photos application. When Facebook described Haystack publicly in 2010, it was managing a colossal repository of more than 260 billion images, totalling an impressive 20 petabytes of data. At that time, users contributed an additional billion photos each week, approximately 60 terabytes, and Facebook handled over one million image requests per second during peak usage.

You might recall the “Haystack” from the Windows 7 wallpapers. Indeed, it’s the same concept. At Facebook, it serves as an object store for all the photos users upload. The name draws an analogy from “finding a needle in a haystack,” where the needle represents an image and the haystack signifies a vast cluster of hardware storing data. But how does one locate a specific needle in such an immense haystack, surrounded by billions of needless elements? The answer is surprisingly simple.

Each needle wraps the photo data with a header, which carries identifying fields such as the photo’s key, and a footer. Needles are appended to a large store file, and an index, much like a diary, records where each image lives across a cluster of nodes, even if they are geographically dispersed. Additionally, each photo is stored in four different sizes: thumbnail, small, medium, and large.

In comparison to Facebook’s previous approach to image storage, which relied on network-attached storage appliances using NFS, Haystack offers a more cost-effective and high-performance solution. Facebook identified a critical insight: the traditional design led to an excessive number of disk operations due to metadata lookups.

To address this challenge, Facebook took a meticulous approach to reduce the metadata associated with each photo. By shifting metadata lookups to the main memory of Haystack storage machines, they were able to significantly cut down on disk operations. This strategic choice not only optimises the retrieval of actual data but also substantially increases the overall system throughput.
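
The core idea, keeping a small in-memory index so that fetching a photo costs a single read, can be sketched as follows. This is a toy illustration in Python, not Haystack’s actual format: the TinyHaystack class is hypothetical, an in-memory buffer stands in for the large store file on disk, and the real needle header, footer, and checksums are omitted.

```python
import io


class TinyHaystack:
    """Append-only blob store with an in-memory index (illustrative only)."""

    def __init__(self):
        self._volume = io.BytesIO()  # stands in for one large file on disk
        self._index = {}             # photo_id -> (offset, size), kept in RAM

    def put(self, photo_id, data):
        # Always append to the end of the volume; existing bytes never move.
        offset = self._volume.seek(0, io.SEEK_END)
        self._volume.write(data)
        self._index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, size = self._index[photo_id]  # metadata lookup touches no disk
        self._volume.seek(offset)
        return self._volume.read(size)        # one read for the photo bytes


store = TinyHaystack()
store.put("cat.jpg", b"\xff\xd8cat-bytes")
store.put("dog.jpg", b"\xff\xd8dog-bytes-longer")
print(store.get("cat.jpg"))
```

Compared with one file per photo on a conventional filesystem, the location of every photo is resolved from memory, so the disk is only asked for the image data itself.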

Discovering Memcached – Facebook’s Data Scaling Technology

Memcached is a widely recognised and straightforward in-memory caching solution. Facebook utilises memcached as a foundational element to create and expand a distributed key-value store, enabling the support of the world’s largest social network.

Facebook’s system manages billions of requests every second and stores trillions of items, ensuring a dynamic user experience for more than a billion users worldwide. Memcached is an integral part of making this possible.

Memcached finds its place in the arsenal of not just Facebook but also other social media giants like Twitter, YouTube, Reddit, and Pinterest. But what exactly does it do?

What Is Caching?

Caching, in the realm of computer architecture, involves storing frequently accessed data in local memory (RAM) to expedite retrieval. Picture this: during the early days of the Covid-19 lockdown, we were all glued to our screens, binge-watching Netflix. When the latest season of “Money Heist” dropped, it was in high demand, with everyone eager to watch it. However, handling a sudden surge of users can burden a server. This is where caching comes into play.

Caching reduces the server’s load, speeds up loading times, and enhances the user experience. Over time, cached data gradually moves down the priority list as interest wanes, ultimately being removed by an eviction policy (a mechanism for discarding old data from the cache). The most common algorithm employed is “Least Recently Used” (LRU). This strategy significantly lightens the database’s load, which leads us to the next topic.
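
A minimal sketch of LRU eviction combined with a look-aside read might look like this in Python. It illustrates the concept only and is not how Memcached is implemented. The LRUCache class, the get_profile helper, and the dictionary standing in for a database are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


def get_profile(user_id, cache, database):
    """Look-aside read: try the cache, fall back to the database, fill the cache."""
    profile = cache.get(user_id)
    if profile is None:
        profile = database[user_id]
        cache.set(user_id, profile)
    return profile


database = {1: "Ada", 2: "Grace", 3: "Linus"}  # stand-in for a real database
cache = LRUCache(capacity=2)
for user_id in (1, 2, 1, 3):
    get_profile(user_id, cache, database)
print(cache.get(1), cache.get(2), cache.get(3))
```

With room for only two entries, user 2 is evicted when user 3 arrives, because user 1 was read more recently.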

Facebook’s Database Management System

Apache Cassandra

Facebook uses Cassandra, an open-source wide-column NoSQL database that it originally developed, to house user data. Its design combines elements from Amazon’s Dynamo and Google’s Bigtable, making it a powerful player in managing data across distributed systems.

Facebook initially employed Cassandra for its Inbox Search feature and user interactions. Cassandra offers several attractive features, including fault tolerance (the ability to keep serving data even when a server crashes), horizontal scalability (adding nodes to accommodate increasing user counts), and a flexible wide-column data model. To ensure smooth operations, Facebook uses Ganglia, a distributed performance monitoring tool, to keep track of nodes for any potential failures. Because Cassandra is a peer-to-peer system, data and requests are distributed across the nodes without a single master coordinating their activities.
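
Cassandra spreads data across nodes using consistent hashing, and the sketch below shows why that helps with horizontal scaling. It is a simplified illustration, not Cassandra’s partitioner: the HashRing class, the node names, the use of MD5, and the number of virtual nodes per server are assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring; each node owns several points (virtual nodes)."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    def node_for(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._tokens, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

Adding a fourth node only reassigns the keys that now fall to the new node. Everything else stays where it was, which is what makes growing the cluster comparatively cheap.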

GraphQL

GraphQL is an open-source query language that Facebook incorporates as a part of its tech stack for its advantages over REST APIs. With GraphQL, the client states exactly which fields it needs and the server returns only that data, eliminating the need to sift through unnecessary information.

Consider a scenario where a mom assigns her children specific grocery shopping tasks on the weekend. The mom’s list of exactly what she wants mirrors the client writing a query, while the children, who bring back only those items, embody the role of the GraphQL API.

In contrast, in this same example, a typical REST API hands over a fixed basket from each aisle, so the mom often carries home items she never asked for and has to make several trips to collect everything.
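
The field-selection idea can be sketched without any GraphQL library. The Python below is not the GraphQL runtime and does not parse real GraphQL syntax: the select function is hypothetical, and the query is written as a nested dictionary purely for illustration.

```python
def select(record, selection):
    """Return only the fields named in `selection`.

    `selection` maps a field name to True (a leaf field) or to a nested
    selection that is applied to a sub-object or to each item of a list.
    """
    result = {}
    for field, sub in selection.items():
        value = record[field]
        if sub is True:
            result[field] = value
        elif isinstance(value, list):
            result[field] = [select(item, sub) for item in value]
        else:
            result[field] = select(value, sub)
    return result


# Hypothetical record as a backend might hold it, with more fields than needed.
user = {
    "id": 7,
    "name": "Ada",
    "email": "ada@example.com",
    "friends": [{"id": 8, "name": "Grace", "email": "grace@example.com"}],
}
# The client asks only for its own name and the names of its friends.
selection = {"name": True, "friends": {"name": True}}
trimmed = select(user, selection)
print(trimmed)
```

The response mirrors the shape of the request, so ids and email addresses the client never asked for are not sent over the wire.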

Hive: Data Warehousing at Scale

Apache Hive, which originated at Facebook, is the company’s solution for handling big data. Built on top of Hadoop, it acts as a data warehousing system that lets engineers and analysts store, retrieve, and analyse vast amounts of data using a SQL-like query language. This technology is pivotal in managing the immense volumes of user-generated content, insights, and interactions on the platform. By using Hive, Facebook can derive valuable insights from data, improving its services and user experience.

The Unmentioned Hardware Side Of Facebook

While this discussion primarily focuses on software, Facebook’s hardware infrastructure is also a critical component of its success. For example, Facebook employs a content delivery network (CDN) to efficiently serve static content, enhancing performance. Additionally, Facebook operates numerous data centres worldwide, including substantial facilities in Sweden, Ireland, and Singapore. These data centres support the millions of servers required to maintain Facebook’s operations and provide redundancy and reliability.


How Facebook Implements QA – Gradual Releases and Dark Launches

Facebook employs a system called “Gatekeeper” that allows them to run different code for various sets of users, introducing different conditions into the codebase. This approach enables gradual releases of new features, A/B testing, and activation of specific features exclusively for Facebook employees. It also supports “dark launches,” where elements of a new feature are quietly activated behind the scenes before they go live. This serves as a real-world stress test, revealing bottlenecks and other issues before the feature’s official launch.
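
Gatekeeper itself is internal to Facebook, so the following Python is only a sketch of how a percentage-based feature gate can work. The gate configuration format, the feature names, and the bucket and is_enabled functions are all hypothetical.

```python
import hashlib


def bucket(user_id, feature):
    """Deterministically map a (feature, user) pair to a bucket from 0 to 99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature, user, gates):
    """Decide whether `user` should see `feature` under the given gates."""
    gate = gates.get(feature)
    if gate is None:
        return False  # unknown features stay off
    if gate.get("employees_only") and not user.get("is_employee"):
        return False
    return bucket(user["id"], feature) < gate.get("percent", 0)


# Hypothetical gate configuration.
gates = {
    "new_composer": {"percent": 10},                              # gradual release
    "internal_tool": {"employees_only": True, "percent": 100},    # staff only
}
employee = {"id": 42, "is_employee": True}
visitor = {"id": 43, "is_employee": False}
print(is_enabled("internal_tool", employee, gates))
print(is_enabled("internal_tool", visitor, gates))
```

Hashing the feature name together with the user id keeps each user’s experience stable between requests. Raising the percentage only ever adds users, so nobody loses a feature part-way through a rollout.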

Gradual Feature Disabling for Added Performance

When Facebook encounters performance challenges, they have a range of levers to gradually disable less critical features to improve the performance of core Facebook functions. This adaptive approach ensures the continuous delivery of essential services.
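
One way to picture these levers is as a list of non-critical features, each with the load level at which it gets switched off. The Python below is a sketch under that assumption. The feature names, the thresholds, and the features_to_disable function are hypothetical and not taken from Facebook’s systems.

```python
# Hypothetical levers: (load level at which the feature is switched off, name),
# ordered from least critical to most critical.
LEVERS = [
    (0.70, "friend_suggestions"),
    (0.80, "video_autoplay"),
    (0.90, "chat_typing_indicator"),
]


def features_to_disable(load):
    """Return the features to switch off at a given load (0.0 to 1.0)."""
    return [name for limit, name in LEVERS if load >= limit]


for load in (0.50, 0.75, 0.95):
    print(load, features_to_disable(load))
```

As load climbs, features are shed one at a time in order of importance, so core functions keep their resources for as long as possible.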

Conclusion – Tech Stack Of Facebook Explained

In summary, the technology stack at Facebook is a dynamic ecosystem of tools and solutions that have shaped the platform into what it is today. From the foundational LAMP Stack to the high-performance HHVM, and the advanced data management capabilities of Apache Cassandra and GraphQL, Facebook’s technological infrastructure is a testament to the company’s commitment to providing a seamless and engaging experience for its users.

Facebook’s impressive technology stack and innovative practices have enabled it to maintain its status as one of the world’s leading tech companies. As they continue to navigate the challenges of scaling a platform with more users than most countries, Facebook’s engineers remain at the forefront of technological advancement. They consistently push the boundaries of what’s possible in the dynamic realm of social media and technology.

The journey of Facebook’s tech evolution is far from over. With an unwavering commitment to open source, a dedication to innovation, and a user base that continues to expand, the future promises even more remarkable developments. Facebook will continue to shape the way we connect, communicate, and interact in the digital age.

If you’re keen on discovering how technology can turbocharge your CX to drive growth and scalability, we’re here to support your innovation journey. Let’s work together to craft influential customer experiences for your brand, enabling you to expand, and boost profitability. To get started, share some project details with us, and we’ll ensure we’re on the same page right from the beginning.

Published on October 26, 2023