The latest issue of "Communications of the ACM", published by the Association for Computing Machinery (ACM), features an article on the open-source machine learning database project OpenMLDB, which received unanimous recognition from the magazine's editorial board.
Article Link: https://cacm.acm.org/magazines/2023/7/274061-principles-and-practices-of-real-time-feature-computing-platforms-for-ml/fulltext
ACM's flagship magazine, "Communications of the ACM", is the premier chronicler of computing technologies, covering the latest discoveries, innovations, and research that inspire and influence the field. It brings readers in-depth stories of emerging areas of computer science, new trends in IT, practical research applications, debates on technology implications, public policies, engineering challenges, and market trends. Read by over 85,000 computing researchers and practitioners worldwide, it is recognized as the most trusted and knowledgeable source of industry information for today's computing professionals.
Several editorial board members recognized the contributions of OpenMLDB in promoting the application of artificial intelligence in enterprise-level settings, including Professor Flora Salim from the University of New South Wales, Professor Ken-ichi Kawarabayashi from the National Institute of Informatics in Japan, and Dr. Bingsheng He, a 2020 ACM Distinguished Member and Vice Dean and Professor at the School of Computing, National University of Singapore. Dr. He, in particular, refers to it as "a very efficient feature engineering tool to help AI tasks."
"Communications of the ACM" July Interview
The featured article titled "Principles and Practices of Real-Time Feature Computing Platforms for ML" introduces OpenMLDB from various aspects, including business challenges, design principles, core features, and best practices related to real-time feature computation for machine learning.
The article highlights the strong demand for real-time features to obtain highly effective models in machine learning applications such as real-time personalized recommendation, risk control, and anti-fraud. However, traditional feature scripts built by data scientists (typically developed in Python or SparkSQL) often fall short of production-level performance requirements such as low latency, high throughput, and high availability. Meeting real-time performance requirements often requires engineering teams to reconstruct and optimize the code (using high-performance databases, C++, etc.). Because two teams and two systems are then involved in the process from offline development to online deployment, a step of online-offline consistency verification becomes essential, and it incurs significant communication, development, and testing costs.
OpenMLDB bridges the gap between offline development and online real-time performance through SQL development capabilities, providing a production-level feature computing platform with consistency between online and offline computations, high real-time performance, and low barriers to entry.
The figure below (left) illustrates the traditional process for real-time feature computing deployment. Data scientists first develop offline feature scripts, which are then reconstructed by the engineering teams into real-time services to meet production requirements. Data scientists and engineers must invest significant effort in iterative development and cooperative consistency verification to align the results. OpenMLDB relieves the headache of verification by offering a unified execution engine and the same SQL APIs for both offline training and online serving. With OpenMLDB, data scientists define features in SQL, the unified execution plan generator ensures consistency between online and offline computations, and the real-time SQL engine delivers low latency, high throughput, and high availability for online services. As a result, data scientists can develop feature scripts in SQL and, once the performance requirements are met, deploy them to online services with just one command, significantly saving time and resources.
Abstract functional blocks that enable a seamless development-to-deployment process in OpenMLDB.
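To make the consistency idea concrete, here is a minimal plain-Python sketch. It does not use OpenMLDB or its SQL API; the names `Txn`, `window_features`, `offline_features`, and `online_features` are hypothetical and exist only for illustration. It shows the underlying principle: when the batch path and the serving path both call one shared feature definition, their results agree by construction and no separate verification step is needed.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction record used only for this illustration."""
    user: str
    ts: int          # event time in seconds
    amount: float


def window_features(current: Txn, history: Iterable[Txn], window_s: int = 60) -> dict:
    """Single feature definition shared by the offline and online paths.

    Aggregates the current row plus the same user's earlier rows whose
    timestamps fall within the last `window_s` seconds.
    """
    in_window = [
        t.amount
        for t in history
        if t.user == current.user and current.ts - window_s <= t.ts <= current.ts
    ]
    amounts = in_window + [current.amount]
    return {"txn_count": len(amounts), "amount_sum": sum(amounts)}


def offline_features(table: List[Txn], window_s: int = 60) -> List[dict]:
    """Batch path: replay the historical table row by row to build training data."""
    rows = sorted(table, key=lambda t: t.ts)
    return [window_features(row, rows[:i], window_s) for i, row in enumerate(rows)]


def online_features(request: Txn, store: List[Txn], window_s: int = 60) -> dict:
    """Serving path: compute features for one incoming request against stored rows."""
    return window_features(request, store, window_s)
```

Calling `online_features` for a request, with the earlier rows as the store, returns the same dictionary that `offline_features` produced for that row during training. In the traditional two-system workflow described above, the two paths are separate implementations, which is why their outputs have to be compared and reconciled by hand.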
OpenMLDB has gained widespread adoption among enterprise users in its community, with prominent examples being Akulaku and Vipshop. Here are the specific use cases:
- Akulaku (fintech unicorn in Indonesia): By implementing OpenMLDB in its fintech scenarios, Akulaku has saved more than US$500K per year in server and personnel costs. Notably, among solutions including Spark, Flink, and other MPP options, OpenMLDB is the only one with linear scaling capability.
- Vipshop (China's top e-commerce platform): Vipshop has used OpenMLDB in its overseas business for personalized product and brand recommendation, achieving recommendation latency within 10 milliseconds and a 60% increase in feature development speed.
The Akulaku and Vipshop use cases demonstrate the effectiveness and versatility of OpenMLDB in diverse real-world applications, offering significant advantages in efficiency, cost savings, and performance.
OpenMLDB Official Website: https://openmldb.ai/
GitHub: https://github.com/4paradigm/OpenMLDB
Documentation: https://openmldb.ai/docs/zh/