Expedia Increases Agility and Resiliency by Going All In on AWS
Expedia is all in on AWS, with plans to migrate 80 percent of its mission-critical apps from its on-premises data centers to the cloud in the next two to three years. By using AWS, Expedia has become more resilient. Expedia’s developers have been able to innovate faster while saving the company millions of dollars. Expedia provides travel-booking services across its flagship site Expedia.com and about 200 other travel-booking sites around the world.
Expedia, Inc. is a leading online travel company, providing leisure and business travel to customers worldwide. Expedia’s extensive brand portfolio includes Expedia.com, one of the world’s largest full service online travel agency, with sites localized for more than 20 countries; Hotels.com, the hotel specialist with sites in more than 60 countries; Hotwire.com, the hotel specialist with sites in more than 60 countries, and other travel brands.
The company delivers consumer value in leisure and business travel, drives incremental demand and direct bookings to travel suppliers, and provides advertisers the opportunity to reach a highly valuable audience of in-market travel consumers through Expedia Media Solutions. Expedia also powers bookings for some of the world’s leading airlines and hotels, top consumer brands, high traffic websites, and thousands of active affiliates through Expedia Affiliate Network.
Expedia is committed to continuous innovation, technology, and platform improvements to create a great experience for its customers. The Expedia Worldwide Engineering (EWE) organization supports all websites under the Expedia brand. Expedia began using Amazon Web Services (AWS) in 2010 to launch Expedia Suggest Service (ESS), a typeahead suggestion service that helps customers enter travel, search, and location information correctly. According to the company’s metrics, an error page is the main reason for site abandonment. Expedia wanted global users to find what they were looking for quickly and without errors. At the time, Expedia operated all its services from data centers in Chandler, AZ. The engineering team realized that they had to run ESS in locations physically close to customers to enable a quick and responsive service with minimal network latency.
Why Amazon Web Services
Expedia considered on-premises virtualization solutions as well as other cloud providers, but ultimately chose Amazon Web Services (AWS) because it was the only solution with the global infrastructure in place to support Asia Pacific customers.
“From an architectural perspective, infrastructure, automation, and proximity to the customer were key factors,” explains Murari Gopalan, Technology Director. “There was no way for us to solve the problem without AWS.”
Launching ESS on AWS
“Using AWS, we were able to build and deliver the ESS service within three months,” says Magesh Chandramouli, Principal Architect.
ESS uses algorithms based on customer location and aggregated shopping and booking data from past customers to display suggestions when a customer starts typing. For example, if a customer in Seattle entered sea when booking a flight, the service would display Seattle, SeaTac, and other relevant destinations.
Expedia launched ESS instances initially in the Asia Pacific (Singapore) Region and then quickly replicated the service in the US West (Northern California) and EU (Ireland) Regions. Expedia engineers initially used Apache Lucene and other open source tools to build the service, but eventually developed powerful tools in-house to store indexes and queries.
By deploying ESS on AWS, Expedia was able to improve service to customers in the Asia Pacific region as well as Europe.
“Latency was our biggest issue,” says Chandramouli. “Using AWS, we decreased average network latency from 700 milliseconds to less than 50 milliseconds.”
Running Critical Applications on AWS
By 2011, Expedia was running several critical, high-volumes applications on AWS, such as the Global Deals Engine (GDE). GDE delivers deals to its online partners and allows them to create custom websites and applications using Expedia APIs and product inventory tools.
Expedia provisions Hadoop clusters using Amazon Elastic Map Reduce (Amazon EMR) to analyze and process streams of data coming from Expedia’s global network of websites, primarily clickstream, user interaction, and supply data, which is stored on Amazon Simple Storage Service (Amazon S3). Expedia processes approximately 240 requests per second. “The advantage of AWS is that we can use Auto Scaling to match load demand instead of having to maintain capacity for peak load in traditional datacenters,” comments Gopalan. Expedia uses AWS CloudFormation with Chef to deploy its entire front and backend stack into its Amazon Virtual Private Cloud (Amazon VPC) environment. Expedia uses a multi-region, multi-availability zone architecture with a proprietary DNS service to add resiliency to the applications. Figure 2 demonstrates the architecture of the GDE service on AWS.
Expedia can add a new cluster to manage GDE and other high volume applications without worrying about the infrastructure.
“If we had to host the same applications on our on-premises data center, we wouldn’t have the same level of CPU efficiency,” says Chandramouli. “If an application processes 3,000 requests per second, we would have to configure our physical servers to run at about 30 percent capacity to avoid boxes running hot. On AWS, we can push CPU consumption close to 70 percent because we can always scale out. Fundamentally, running in AWS enables a 230 percent CPU consumption efficiency in data processing. We run our critical applications on AWS because we can scale and use the infrastructure efficiently.”
Using IAM to Manage Security
To simplify the management of GDE, Expedia developed an identity federation broker that uses AWS Identity and Access Management (AWS IAM) and the AWS Security Token Service (AWS STS). The federation broker allows systems administrators and developers to use their existing Windows Active Directory (AD) accounts to single sign-on (SSO) to the AWS Management Console. In doing so, Expedia eliminates the need to create IAM users and maintain multiple environments where user identities are stored. Federation broker users sign into their Windows machines with their existing Active Directory credentials, browse to the federation broker, and transparently log into the AWS Management Console. This allows Expedia to enforce password and permissions management within their existing directory and to enforce group policies and other governance rules. Additionally, if an employee ever leaves the company or takes a different role, Expedia simply make changes to Active Directory to revoke or changes AWS permissions for the user instead of inside of AWS.
Standardizing Application Deployment
The success of the ESS and GDE services sparked interest from other Expedia development teams, who began to use AWS for regional initiatives. By 2012, Expedia was hosting applications in the US East (Northern Virginia), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), and US West (Northern California) Regions. Expedia Worldwide Engineering culled best practices from these initiatives to create a standardized deployment setup across all Regions. As Jun-Dai Bates-Kobashigawa, Principal Software Engineer explains,
“We’re using Chef to automate the configuration of the Amazon Elastic Compute Cloud (Amazon EC2) servers. We can take any AWS image and use scripts stored in Chef to build a machine and spin up an instance customized for a team in just in a few minutes.”
The team consolidated all AWS accounts under one AWS account and provisioned one Amazon VPC network in each Region. This allows each Region to have an isolated infrastructure with a separate firewall, application layer, and database layer. Expedia applies Amazon EC2 Security Group firewall settings to safeguard applications and services. Amazon VPC is completely integrated into Expedia’s lab and production environments.
“The Amazon VPC experience for the developer is totally seamless,” says Bates-Kobashigawa. “Developers use the same Active Directory service for authentication and may not even know that some of the servers that they log onto are running on AWS. It feels like a physical infrastructure with its own subnets and multiple layers, and it’s also easy to connect to our on-premises infrastructure using VPN.”
Expedia uses a blue-green deployment approach to create parallel production environments on AWS, enabling continuous deployment and faster time-to-market.
“One of our metrics for success is the reduction of time to deploy within our teams,” says Gopalan. “We use this method to launch applications pretty quickly compared to a traditional deployment. Moreover, reducing the cost of a rollback to zero means we can be fearless with deployments.”
Expedia uses AWS to develop applications faster, scale to process large volumes of data, and troubleshoot issues quickly. By using AWS to build a standard deployment model, development teams can quickly create the infrastructure for new initiatives. Critical applications run in multiple Availability Zones in different Regions to ensure data is always available and to enable disaster recovery. Expedia Worldwide Engineering is working on building a monitoring infrastructure in all Regions and moving to a single infrastructure.
Generally, teams have more control over development and operations on AWS. When Expedia experienced conversion issues for its Client Logging service, engineers were able to track and identify critical issues within two days. Expedia estimates that it would have taken six weeks to find the script errors if the service ran in a physical environment.
Previously, Expedia had to provision servers for a full-load scenario in its data centers.
“To deploy an application using our on-site facility, you have to think about the physical infrastructure,” Bates-Kobashigawa explains. “If there are 100 boxes running, you might have to take 20 boxes out to apply new code. Using AWS, we don’t have to take capacity out; we just add new capacity and send traffic to it.”
Chandramouli comments, “When I was developer, you didn’t want to invest in architecture if you didn’t know how the application would turn out. I had to plan upfront and build a proof of concept to present to stakeholders. By using AWS, I’m not bound by throughput limitations or CPU capacity. When I think of AWS, freedom is the first word that comes to mind.”