How Paytm built a petabyte-scale analytics application using Apache Druid with Imply

Paytm, a leading financial services company in India, switched to Imply to support a powerful, cost-efficient application that enables hundreds of internal users to analyze customer behavioral data in real time.

Ravi Maurya, Technical Lead at Paytm

Paytm is one of India’s largest mobile payments, commerce and financial services platforms. With a user base of more than 300 million, Paytm is on a mission to bring half a billion underbanked individuals and businesses into the mainstream economy using its digital banking and payments products.

Paytm’s platform powers transactions for more than 20 million merchants, which translates to a large volume of data – roughly five billion events per day.

The company’s Growth Team uses this data to create new revenue opportunities through targeted customer acquisition, building credit score models and cross-selling services.

To facilitate this analysis, the Data Team set out to build a more robust analytics application. And to support a modern, highly interactive analytics application, Paytm needed a high-powered database built to deliver real-time insights at scale.

Challenge: Complex and limited legacy application 

Initially, Paytm’s Data Team focused on replacing their off-the-shelf analytics service by building a custom analytics application. The legacy solution created unnecessary complexity for engineers trying to build out new verticals for data ingestion.

On top of that, query results would sometimes take minutes, or even hours, to load, hindering the Growth Team’s ability to act on insights.

But once the team decided to build its own application, they faced another decision: which database would be cost-effective and easy to scale, while also providing the sub-second query speed users expected.

Ultimately, Paytm chose Apache Druid for its flexible, distributed architecture and always-on operability.

“We realized we needed to build a solution that would meet our current requirements and allow us to scale,” explained Ravi Maurya, Technical Lead at Paytm. “We chose Druid because it’s open source, which enabled us to start building quickly, but also provided the performance, flexibility and ease of development we were looking for.” 

After deploying Druid in a self-hosted AWS environment, the team saw an immediate improvement in query performance. At first, this was enough to power the interactive experience Paytm’s Growth Team needed to better understand customer trends. With sub-second, ad hoc queries, members of the Growth Team were able to have live conversations with their data.

However, as Paytm’s customer base expanded, so did user interactions across its various apps, which went from generating 3.5 billion to more than 5 billion events daily.

“As Paytm adoption grew with more customers across new verticals, our Druid deployment was required to handle billions of events every day,” Maurya said. “But in order to keep up with our Growth Team’s more sophisticated analytic queries, DevOps continuously optimized and tuned the Druid cluster, limiting their ability to build new code.”

Solution: Reliable and worry-free operations with Imply

A key aspect of any open-source software’s performance is maintenance. Paytm quickly found the process of operating and tuning its Druid cluster consumed a significant number of engineering hours each week.

In addition to reducing infrastructure cost, the Paytm team needed to minimize the maintenance and work required to manage a distributed system. Maurya said making the move from an open-source Druid deployment to Imply helped optimize Paytm’s Druid system, enabling the team to redirect valuable engineering resources toward other business initiatives. 

“[Our engineers] can concentrate more on the other pipelines and spend more time thinking about where data will be flowing, rather than how data will be stored and how to optimize performance on the data store end,” Maurya said.

Paytm now runs the enterprise distribution of Apache Druid, delivered by Imply, which is consistently fast and reliable and requires zero DevOps resources to manage. Backed by the deep Druid expertise and world-class support of Imply, the Paytm team is able to focus on scaling other aspects of its data pipeline.

“We don’t spend a single minute fixing anything related to Druid,” Maurya added. “No DevOps is involved in making any changes to the cluster, upgrading a node, or adding a property to a specific node. That has reduced a lot of effort from our side.” 

Results: Increased performance, effortless scalability

With Druid and Imply, Paytm has been able to drastically reduce its operational costs while enabling a dynamic, interactive data experience for the Growth Team across petabytes of raw data.

Along with implementing Imply, Paytm engaged Imply Professional Services to recommend the best ways to maintain consistent performance on Druid while controlling costs. The Professional Services team quickly determined that Paytm’s AWS Virtual Private Cloud (VPC) resources were overprovisioned. 

“With Imply’s expertise, we quickly resized our AWS VPC and reduced our infrastructure costs by half, more than covering the cost of Imply Enterprise,” Maurya said. “Even better, with Imply, cluster operations became point and click, requiring zero DevOps involvement, freeing up 12 hours of engineering hours a week.”

Paytm also gained critical performance improvements: it can now query data sets 10 times faster (and sometimes 35 times faster) than before, with latency at 100 milliseconds or less. This allows the Growth Team to explore data in real time and helps propel Paytm’s presence in new verticals and use cases.
