Walmart’s Element: A machine learning platform like no other
Staff writer
March 14, 2024 | 5 min read
The following is based on insights from Walmart Global Tech associates Thomas Vengal, Pamidi Pradeep, Bagavath Subramaniam, Hema Rajesh, Girish Ramachandran Pillai, Ravishankar K S, Anirban Chatterjee, Kunal Banerjee, Rahul Rawat and Anil Madan, the team behind Element.
Each week, approximately 240 million customers and members shop at our 10,500 Walmart stores and e-commerce websites in 19 countries. Ensuring they get what they want, where they want and when they want requires precise coordination across a complex supply chain comprising over 100,000 global suppliers, 150 distribution centers, ocean freighters, air cargo and one of the largest private fleets of trucks in the world. As technology has evolved, we’ve evolved how we manage our supply chain and the solutions used to do so.
The growing role of AI in inventory management
Artificial Intelligence (AI) is key to determining product demand, how inventory moves through our supply chain and how we personalize and improve experiences for our customers, members and associates. Its ability to analyze thousands of data points enables greater efficiency and accuracy across the business. In turn, we can make more precise decisions and go to market faster, while maintaining everyday low costs.
However, there are still a number of hurdles to clear before deploying AI models. Cloud providers offer Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), along with the tools and services needed to develop AI solutions. When scaling those solutions, though, businesses often grapple with vendor lock-in, exorbitant license costs and fees, limited availability and reliability, and customization issues, and the list goes on. What’s more, no single platform has all the answers!
To overcome these challenges, Walmart introduced Element, its machine learning (ML) platform, to revolutionize platform capabilities and simplify the adoption of AI/ML at scale for the data scientists, data engineers, ML engineers and application developers engaged throughout the AI solution lifecycle.
Element solves for a massive, fundamental need
Teams built Element’s tech stack from the ground up, incorporating Walmart’s guiding principles of leveraging best-of-breed technologies and prioritizing speed and scale, while also considering cost and governance. Its foundation includes Kubernetes for container orchestration and a Machine Learning Operations (MLOps) deployment framework, which supports deploying adaptable, auto-scalable models and monitoring them effectively across multiple clouds and regions.
But that’s not all; its unique structure also enables integration with existing enterprise services to improve productivity, speed and innovation—users can switch clouds without having to spend hours reconfiguring and tuning for the new cloud environment.
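The source does not say how Element scales its deployed models. As a rough illustration of what "auto-scalable" can mean on Kubernetes, the sketch below reproduces the replica calculation documented for the standard Kubernetes Horizontal Pod Autoscaler. Whether Element uses that autoscaler is an assumption, and the function name, the replica bounds and the example numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count suggested by the Kubernetes HPA scaling formula.

    Assumption: the model server is scaled on one metric (for example,
    average CPU utilization per pod). Bounds and tolerance are illustrative.
    """
    ratio = current_metric / target_metric
    # Inside the tolerance band the autoscaler leaves the deployment alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # desired = ceil(current_replicas * current_metric / target_metric)
    wanted = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured minimum and maximum replica counts.
    return max(min_replicas, min(max_replicas, wanted))


# Four pods running at 90% utilization against a 60% target scale out to six.
print(desired_replicas(4, 90, 60))  # 6
```

Because the calculation depends only on a metric ratio, the same rule can apply in any cloud or region where the cluster runs.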
Powering use cases around Walmart and retail innovation
Element has quickly evolved into a platform equipped with a rich set of capabilities and features. Teams have successfully deployed Element across multiple clouds and regions with around two dozen services, spawning workloads distributed among thousands of CPU cores and hundreds of GPUs.
- Channel performance: Walmart provides its channel partners with an AI-powered tool that evaluates sales data, promotions and shelf assortments, and shares actionable insights and recommendations. The sheer volume of items sold across our network of stores can quickly become an analysis and feature engineering headache.
Element helps data scientists easily perform feature engineering and ML modeling on subsets of data for individual items. By enabling seamless connectivity to various data systems, distributed runtimes, and efficient data analysis and modeling for large-scale retail operations, Element reduces both training time and costs per supplier.
- Search: When customers search for ‘hats’ on Walmart.com in winter and summer, they expect to see different results. Search is both contextual and temporal. To close the gap between the results we provide and what customers are actually searching for, data scientists must keep training models on complicated hypotheses and work on large non-linear models and ever-growing feature sets. This Herculean task goes beyond understanding data science algorithms and regularly slows overall experimentation velocity and increases iteration costs.
Element speeds up hyperparameter tuning by running multiple iterations in parallel, allowing Walmart.com Search Team data scientists to rapidly experiment on new hypotheses, visually compare various values and identify the best parameters.
- Market Intelligence: Market Intelligence is a business intelligence solution that helps Walmart make better decisions by providing insights into competitor pricing, assortment and marketing strategies. At its core, it is a product matcher that uses a variety of methods, algorithms and machine learning techniques for competitive price determination.
GPU-enabled notebooks and an inference service from Element enable teams to quickly build and deploy the ML models required for Market Intelligence.
- Last mile delivery: The ‘intelligent driver dispatch system’ developed by our last mile delivery team helps reduce the cost per customer order and delivery lead times, while maintaining high on-time delivery rates.
The system, built on Element, uses a combination of ML, optimization and heuristic models to time driver searches and match the best drivers to trips, so that customer orders are delivered on time.
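The parallel hyperparameter tuning described in the Search use case can be illustrated with a minimal sketch. This is not Element’s API, which the source does not show. The names `validation_loss` and `parallel_grid_search` and the grid values are hypothetical, and the toy objective stands in for a real training run.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def validation_loss(params):
    """Hypothetical stand-in for training a model and scoring it."""
    lr, depth = params["learning_rate"], params["max_depth"]
    # Toy objective whose minimum sits at learning_rate=0.1, max_depth=6.
    return (lr - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def parallel_grid_search(grid, objective, max_workers=4):
    """Score every combination in `grid` concurrently.

    Returns the best parameter set and the full (params, loss) list so
    that runs can be compared side by side.
    """
    names = list(grid)
    candidates = [dict(zip(names, values)) for values in product(*grid.values())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as `candidates`.
        losses = list(pool.map(objective, candidates))
    results = list(zip(candidates, losses))
    best_params, _ = min(results, key=lambda pair: pair[1])
    return best_params, results


grid = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [4, 6, 8]}
best_params, results = parallel_grid_search(grid, validation_loss)
print(best_params)
```

Threads keep the sketch simple. Real training jobs are compute-bound, so a platform would more plausibly fan the candidates out to separate processes or cluster workers; the fan-out and compare pattern stays the same.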
Accelerating innovation one team at a time
As highlighted in the above use cases, Element has been a game changer for data scientists and ML engineers at Walmart. Teams using it observe:
- Less time spent in evaluation: By adopting best-of-breed tools and technologies, Element has significantly helped individuals and teams reduce the time spent with external vendors or in evaluating multiple commercial and open-source tools.
- Lower overall start-up time: Teams can immediately access any of the multi-cloud resources they need, including ready-to-use development and deployment tools.
- Speedier development, deployment and operationalization: With inner-sourcing methods, teams can access the latest offerings to accelerate the development of high-quality AI solutions. With standardized MLOps processes and integration for deployments in a multi-cloud environment, teams can now deploy solutions faster. The time needed to operationalize models has dropped from a couple of weeks to under an hour.
By utilizing a scalable and flexible ML platform like Element, our teams can continue to experiment with capabilities that enhance cost savings and efficiencies, promote the development of best-in-class tools and reduce time to market for AI solutions. A win-win for everyone!