ML for Systems Papers (Last updated: Fall 2021)

This list is incomplete. If we are missing a paper, please email [email protected] and we will include it. If you would like to be informed about new research papers, subscribe here.

Acknowledgement: Parts of this list were sourced from this repository.

Tutorials / Surveys
Learned Range Indexes
New Learned Index Applications
Learned Multi-Dimensional Indexing & Storage Layouts
Learned Bloom Filters
Hash Maps / Hashing
Partitioning
Data Compression
Systems and General Optimizations
Index Recommendation
Configuration Tuning
Cardinality / Selectivity Estimation
Data-based Cardinality Estimation
Query-based Cardinality Estimation
Cost Estimation
Query Optimization
Query Processing
Scheduling
Caching
Sorting
Garbage Collection
Sketches
Compilation / Compilers
SQL-Related
Workload Related
Data Cleaning and Exploration

Tutorials / Surveys

From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems. Stratos Idreos, Tim Kraska. SIGMOD 2019.
A Tutorial on Learned Multi-dimensional Indexes. Abdullah Al-Mamun, Hao Wu, Walid G. Aref. SIGSPATIAL 2020.
Database Meets Artificial Intelligence: A Survey. Xuanhe Zhou, Chengliang Chai, Guoliang Li, Ji Sun. TKDE 2020.
Learned data structures. Paolo Ferragina and Giorgio Vinciguerra. Recent Trends in Learning From Data. Studies in Computational Intelligence, vol 896.
AI Meets Database: AI4DB and DB4AI. Guoliang Li, Xuanhe Zhou, Lei Cao. SIGMOD 2021.
Machine Learning for Cloud Data Systems: the Promise, the Progress, and the Path Forward. Alekh Jindal, Matteo Interlandi. VLDB 2021.

Learned Range Indexes

Pavo: A RNN-Based Learned Inverted Index, Supervised or Unsupervised?. Wenkun Xiang, Hao Zhang, Rui Cui, Xing Chu, Keqin Li, Wei Zhou. IEEE 2018.
The Potential of Learned Index Structures for Index Compression. Harrie Oosterhuis, J. Shane Culpepper, Maarten de Rijke. ADCS 2018.
A-Tree: A Bounded Approximate Index Structure. Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, Tim Kraska. SIGMOD 2018.
FITing-Tree: A Data-aware Index Structure. Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, Tim Kraska. SIGMOD 2019.
Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations. Yingjun Wu, Jia Yu, Yuanyuan Tian, Richard Sidle, Ronald Barber. SIGMOD 2019.
Efficiently Searching In-Memory Sorted Arrays: Revenge of the Interpolation Search?. Peter Van Sandt, Yannis Chronis, Jignesh Manubhai Patel. SIGMOD 2019.
Considerations for handling updates in learned index structures. Ali Hadian, Thomas Heinis. aiDM 2019.
Accelerating B+tree Search by Using Simple Machine Learning Techniques. Anisa Llaveshi, Utku Sirin, Anastasia Ailamaki, Robert West. AIDB 2019.
Interpolation-friendly B-trees: Bridging the Gap Between Algorithmic and Learned Indexes. Ali Hadian, Thomas Heinis. EDBT 2019.
Learned Indexes for Dynamic Workloads. Chuzhe Tang, Zhiyuan Dong, Minjie Wang, Zhaoguo Wang, Haibo Chen. CoRR 2019.
ASLM: Adaptive single layer model for learned index. X Li, J Li, X Wang. International Conference on Database Systems for Advanced Applications 2019.
Hybrid indexes by exploring traditional B-tree and linear regression. Wenwen Qu, Xiaoling Wang, Jingdong Li, and Xin Li. International Conference on Web Information Systems and Applications 2019.
A Scalable Learned Index Scheme in Storage Systems. Pengfei Li, Yu Hua, Pengfei Zuo, Jingnan Jia. arXiv.
SOSD: A Benchmark for Learned Indexes. Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, Thomas Neumann. NeurIPS 2019.
ALEX: An Updatable Adaptive Learned Index. Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, Tim Kraska. SIGMOD 2020.
CDFShop: Exploring and Optimizing Learned Index Structures. Ryan Marcus, Emily Zhang, Tim Kraska. SIGMOD 2020.
LISA: A Learned Index Structure for Spatial Data. Pengfei Li, Hua Lu, Qian Zheng, Long Yang, Gang Pan. SIGMOD 2020.
RadixSpline: A Single-Pass Learned Index. Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, Thomas Neumann. aiDM 2020.
Why Are Learned Indexes So Effective?. Paolo Ferragina, Fabrizio Lillo, Giorgio Vinciguerra. ICML 2020.
The Case for Learned Spatial Indexes. Varun Pandey, Alexander van Renen, Andreas Kipf, Ibrahim Sabek, Jialin Ding, Alfons Kemper. AIDB 2020.
The PGM-Index: A Fully-Dynamic Compressed Learned Index with Provable Worst-Case Bounds. Paolo Ferragina, Giorgio Vinciguerra. VLDB 2020.
XIndex: A Scalable Learned Index for Multicore Data Storage. Chuzhe Tang, Youyun Wang, Zhiyuan Dong, Gansen Hu, Zhaoguo Wang, Minjie Wang, Haibo Chen. PPoPP 2020.
SIndex: A Scalable Learned Index for String Keys. Youyun Wang, Chuzhe Tang, Zhaoguo Wang, Haibo Chen. APSys 2020.
Learned Indexes for a Google-scale Disk-based Database. Hussam Abu-Libdeh, Deniz Altınbüken, Alex Beutel, Ed H. Chi, Lyric Doshi, Tim Kraska, Xiaozhou (Steve) Li, Andy Ly, Christopher Olston. ML for Systems Workshop 2020.
Effectively Learning Spatial Indices. Jianzhong Qi, Guanli Liu, Christian S. Jensen, Lars Kulik. VLDB 2020.
START — Self-Tuning Adaptive Radix Tree. Philipp Fent, Michael Jungmair, Andreas Kipf, Thomas Neumann. ICDEW 2020.
MADEX: Learning-augmented Algorithmic Index Structures. Ali Hadian and Thomas Heinis. AIDB@VLDB 2020.
From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees. Yifan Dai, Yien Xu, Aishwarya Ganesan, Ramnatthan Alagappan, Brian Kroth, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau. OSDI 2020.
Fast RDMA-based Ordered Key-Value Store using Remote Learned Cache. Xingda Wei, Rong Chen, Haibo Chen. OSDI 2020.
Benchmarking Learned Indexes. Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, Alfons Kemper, Thomas Neumann, Tim Kraska. VLDB 2021.
RUSLI: Real-time Updatable Spline Learned Index. Mayank Mishra, Rekha Singhal. aiDM 2021.
Updatable Learned Index with Precise Positions. Jiacheng Wu, Yong Zhang, Shimin Chen, Jin Wang, Yu Chen, Chunxiao Xing. VLDB 2021.
APEX: A High-Performance Learned Index on Persistent Memory. Baotong Lu, Jialin Ding, Eric Lo, Umar Farooq Minhas, Tianzheng Wang. arXiv 2021.
A Lazy Approach for Efficient Index Learning. Guanli Liu, Lars Kulik, Xingjun Ma, Jianzhong Qi. arXiv 2021.
The RLR-Tree: A Reinforcement Learning Based R-Tree for Spatial Data. Tu Gu, Kaiyu Feng, Gao Cong, Cheng Long, Zheng Wang, Sheng Wang. arXiv 2021.
Spatial Interpolation-based Learned Index for Range and kNN Queries. Songnian Zhang, Suprio Ray, Rongxing Lu, Yandong Zheng. arXiv 2021.
CARMI: A Cache-Aware Learned Index with a Cost-based Construction Algorithm. Jiaoyi Zhang, Yihan Gao. arXiv 2021.
The Price of Tailoring the Index to Your Data: Poisoning Attacks on Learned Index Structures. Evgenios M. Kornaropoulos, Silei Ren, Roberto Tamassia. arXiv 2021.
Shift-Table: A Low-latency Learned Index for Range Queries using Model Correction. Ali Hadian, Thomas Heinis. arXiv 2021.
A Pluggable Learned Index Method via Sampling and Gap Insertion. Yaliang Li, Daoyuan Chen, Bolin Ding, Kai Zeng, Jingren Zhou. arXiv 2021.
PolyFit: Polynomial-based Indexing Approach for Fast Approximate Range Aggregate Queries. Zhe Li, Tsz Nam Chan, Man Lung Yiu, Christian S. Jensen. EDBT 2021.
Learned Metric Index. Matej Antol, Jaroslav Ol’ha, Terézia Slanináková, Vlastislav Dohnal. Information Systems. 2021.
On the performance of learned data structures. Paolo Ferragina, Fabrizio Lillo, Giorgio Vinciguerra. Theor. Comput. Sci. 2021.
A Tailored Regression for Learned Indexes: Logarithmic Error Regression. Martin Eppert, Philipp Fent, Thomas Neumann. aiDM 2021.
Towards Practical Learned Indexing. Mihail M Stoian, Andreas Kipf, Ryan C Marcus, Tim Kraska. AIDB 2021.
Bounding the Last Mile: Efficient Learned String Indexing. Benjamin Spector, Andreas Kipf, Kapil Vaidya, Chi Wang, Umar Farooq Minhas, Tim Kraska. AIDB 2021.
Micro-architectural Analysis of a Learned Index. Mikkel Møller Andersen, Pınar Tözün. arXiv 2021.
Standard versus uniform binary search and their variants in learned static indexing: The case of the searching on sorted data benchmarking software platform. Domenico Amato, Giosué Lo Bosco, Raffaele Giancarlo. Software: Practice and Experience 2022.
Learned Sorted Table Search and Static Indexes in Small Model Space. Domenico Amato, Giosué Lo Bosco, Raffaele Giancarlo. AIxIA 2021.
On the Suitability of Neural Networks as Building Blocks for the Design of Efficient Learned Indexes. Domenico Amato, Giosué Lo Bosco, Raffaele Giancarlo. EANN 2022.

New Learned Index Applications

A Computational Approach to Packet Classification. Alon Rashelbach, Ori Rottenstreich, Mark Silberstein. SIGCOMM 2020.
LISA: Towards Learned DNA Sequence Search. Darryl Ho, Jialin Ding, Sanchit Misra, Nesime Tatbul, Vikram Nathan, Vasimuddin Md, Tim Kraska. Workshop on Systems for ML at NeurIPS 2019.
A "Learned" Approach to Quicken and Compress Rank/Select Dictionaries. Antonio Boffa, Paolo Ferragina, Giorgio Vinciguerra. ALENEX 2021.
Feasibility of Longest Prefix Matching using Learned Index Structures. Shunsuke Higuchi, Junji Takemasa, Yuki Koizumi, Atsushi Tagami, Toru Hasegawa. SIGMETRICS Perform. Eval. Rev. 2021.
BWA-MEME: BWA-MEM emulated with a machine learning approach. Youngmok Jung, Dongsu Han. bioRxiv 2021.
Learned Sorted Table Search and Static Indexes in Small-Space Data Models. Domenico Amato, Giosuè Lo Bosco, Raffaele Giancarlo. Data. 2023, Vol. 8, no. 3, 56.
LIFOSS: a learned index scheme for streaming scenarios. Tong Yu, Guanfeng Liu, An Liu, Zhixu Li, Lei Zhao. World Wide Web 2022.

Learned Multi-Dimensional Indexing & Storage Layouts

Qd-tree: Learning Data Layouts for Big Data Analytics. Zongheng Yang, Badrish Chandramouli, Chi Wang, Johannes Gehrke, Yinan Li, Umar Farooq Minhas, Per-Åke Larson, Donald Kossmann, Rajeev Acharya. SIGMOD 2020.
Hands-off Model Integration in Spatial Index Structures. Ali Hadian, Ankit Kumar, and Thomas Heinis. Proceedings of the 2nd International Workshop on Applied AI for Database Systems and Applications 2020.
Leveraging Soft Functional Dependencies for Indexing Multi-dimensional Data. Behzad Ghaffari, Ali Hadian, and Thomas Heinis. arXiv 2020.
Effectively learning spatial indices. Jianzhong Qi, Guanli Liu, Christian S Jensen, and Lars Kulik. VLDB 2020.
A Study of Learned KD Tree Based on Learned Index. P. Yongxin, Z. Wei, Z. Lin and D. Hongle. 2020 International Conference on Networking and Network Applications (NaNA).
The ML-Index: A Multidimensional, Learned Index for Point, Range, and Nearest-Neighbor Queries. Angjela Davitkova, Evica Milchevski, Sebastian Michel. EDBT 2020.
Learning Multi-dimensional Indexes. Vikram Nathan, Jialin Ding, Mohammad Alizadeh, Tim Kraska. SIGMOD 2020.
Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads. Jialin Ding, Vikram Nathan, Mohammad Alizadeh, Tim Kraska. VLDB 2021.
Instance-Optimized Data Layouts for Cloud Analytics Workloads. Jialin Ding, Umar Farooq Minhas, Badrish Chandramouli, Chi Wang, Yinan Li, Ying Li, Donald Kossmann, Johannes Gehrke and Tim Kraska. SIGMOD 2021.
The Case for ML-Enhanced High-Dimensional Indexes. Rong Kang, Wentao Wu, Chen Wang, Ce Zhang, Jianmin Wang. AIDB 2021.
The "AI+R"-Tree: An Instance-Optimized R-tree. Abdullah-Al-Mamun, Ch. Md. Rakin Haider, Jianguo Wang, Walid G. Aref. MDM 2022.

Learned Bloom Filters

The Case for Learned Index Structures. Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis. SIGMOD 2018.
A Model for Learned Bloom Filters and Optimizing by Sandwiching. Michael Mitzenmacher. NeurIPS 2018.
Lifting the Curse of Multidimensional Data with Learned Existence Indexes. Stephen Macke, Alex Beutel, Tim Kraska, Maheswaran Sathiamoorthy, Derek Zhiyuan Cheng, Ed H. Chi. ML for Systems Workshop 2018.
Meta-Learning Neural Bloom Filters. Jack W Rae, Sergey Bartunov, Timothy P Lillicrap. ICML 2019.
Stable Learned Bloom Filters for Data Streams. Qiyu Liu, Libin Zheng, Yanyan Shen, Lei Chen. VLDB 2020.
Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier with Application to Real-Time Information Filtering on the Web. Zhenwei Dai, Anshumali Shrivastava. NeurIPS 2020.
Partitioned Learned Bloom Filters. Kapil Vaidya, Eric Knorr, Tim Kraska, Michael Mitzenmacher. ICLR 2021.
deepBF: Malicious URL detection using Learned Bloom Filter and Evolutionary Deep Learning. Ripon Patgiri, Anupam Biswas, Sabuzima Nayak. In submission.
Stacked Filters: Learning to Filter by Structure. Kyle Deeds, Brian Hentschel, and Stratos Idreos. VLDB 2021.
On the Choice of General Purpose Classifiers in Learned Bloom Filters: An Initial Analysis Within Basic Filters. Giacomo Fumagalli, Davide Raimondi, Raffaele Giancarlo, Dario Malchiodi, Marco Frasca. ICPRAM 2022.
Optimizing Learned Bloom Filters: How Much Should Be Learned?. Zhenwei Dai, Anshumali Shrivastava, Pedro Reviriego, José Alberto Hernández. IEEE Embedded Systems Letters 2022.

Hash Maps / Hashing

The Case for Learned Index Structures. Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis. SIGMOD 2018.
When Are Learned Models Better Than Hash Functions?. Ibrahim Sabek, Kapil Vaidya, Dominik Horn, Andreas Kipf, Tim Kraska. AIDB 2021.

Partitioning

Schism: a Workload-Driven Approach to Database Replication and Partitioning. Carlo Curino, Evan Philip Charles Jones, Yang Zhang, and Samuel R. Madden. VLDB 2010.
Automated Data Partitioning for Highly Scalable and Strongly Consistent Transactions. Alexandru Turcu, Roberto Palmieri, Binoy Ravindran, and Sachin Hirve. IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 1, 2016.
GridFormation: Towards Self-Driven Online Data Partitioning using Reinforcement Learning. Gabriel Campero Durand, Marcus Pinnecke, Rufat Piriyev, Mahmoud Mohsen, David Broneske, Gunter Saake, Maya S. Sekeran, Fabián Rodriguez, and Laxmi Balami. aiDM 2018.
Learning a Partitioning Advisor with Deep Reinforcement Learning. Benjamin Hilprecht, Carsten Binnig, and Uwe Roehm. arXiv 2019.
A Genetic Optimization Physical Planner for Big Data Warehouses. Soumia Benkrid, Yacine Mestoui, Ladjel Bellatreche, and Carlos Ordonez. 2020 IEEE International Conference on Big Data (Big Data).
Lachesis: Automated Partitioning for UDF-Centric Analytics. Jia Zou, Amitabh Das, Pratik Barhate, Arun Iyengar, Binhang Yuan, Dimitrije Jankov, and Chris Jermaine. VLDB 2021.
Learning a Partitioning Advisor for Cloud Databases. Benjamin Hilprecht, Carsten Binnig, Uwe Röhm. SIGMOD 2020.
Automated vertical partitioning with deep reinforcement learning. Gabriel Campero Durand, Rufat Piriyev, Marcus Pinnecke, David Broneske, Balasubramanian Gurumurthy, Gunter Saake. ADBIS 2019.

Data Compression

LEA: A Learned Encoding Advisor for Column Stores. Lujing Cen, Andreas Kipf, Ryan Marcus, Tim Kraska. aiDM 2021.
DeepSqueeze: Deep Semantic Compression for Tabular Data. Amir Ilkhechi, Andrew Crotty, Alex Galakatos, Yicong Mao, Grace Fan, Xiran Shi, Ugur Cetintemel. SIGMOD 2020.
Repetition- and Linearity-Aware Rank/Select Dictionaries. Paolo Ferragina, Giovanni Manzini, Giorgio Vinciguerra. ISAAC 2021.

Systems and General Optimizations

Self-Driving Database Management Systems. Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C. Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, Tieying Zhang. CIDR 2017.
SageDB: A Learned Database System. Tim Kraska, Mohammad Alizadeh, Alex Beutel, Ed H. Chi, Jialin Ding, Ani Kristo, Guillaume Leclerc, Samuel Madden, Hongzi Mao, Vikram Nathan. CIDR 2019.
Active Learning for ML Enhanced Database Systems. Lin Ma, Bailu Ding, Sudipto Das, Adith Swaminathan. SIGMOD 2020.
Self-driving database systems: a conceptual approach. Jan Kossmann, Rainer Schlosser. Distributed and Parallel Databases 2020.
One Model to Rule them All: Towards Zero-Shot Learning for Databases. Benjamin Hilprecht, Carsten Binnig. arXiv 2021.
UDO: Universal Database Optimization using Reinforcement Learning. Junxiong Wang, Immanuel Trummer, Debabrota Basu. arXiv 2021.
Towards a Benchmark for Learned Systems. Laurent Bindschaedler, Andreas Kipf, Tim Kraska, Ryan Marcus, Umar Farooq Minhas. SMDB workshop 2021.
A Unified Transferable Model for ML-Enhanced DBMS. Ziniu Wu, Peilun Yang, Pei Yu, Rong Zhu, Yuxing Han, Yaliang Li, Defu Lian, Kai Zeng, Jingren Zhou. arXiv 2021.
Expand your Training Limits! Generating Training Data for ML-based Data Management. Francesco Ventura, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Volker Markl. SIGMOD 2021.
MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems. Lin Ma, William Zhang, Jie Jiao, Wuwen Wang, Matthew Butrovich, Wan Shen Lim, Prashanth Menon, Andrew Pavlo. SIGMOD 2021.
Machine Learning-based Selection of Graph Partitioning Strategy Using the Characteristics of Graph Data and Algorithm. YoungJoon Park, Dongkyu Lee, Tien-Cuong Bui. AIDB 2021.
openGauss: An Autonomous Database System. Guoliang Li, Xuanhe Zhou, Ji Sun, Xiang Yu, Yue Han, Lianyuan Jin, Wenbo Li, Tianqing Wang, Shifu Li. VLDB 2021.
DBMind: A Self-Driving Platform in openGauss. Xuanhe Zhou, Lianyuan Jin, Ji Sun, Xinyang Zhao, Xiang Yu, Jianhua Feng, Shifu Li, Tianqing Wang, Kun Li, Luyang Liu. VLDB 2021.
Griffon: Reasoning about Job Anomalies with Unlabeled Data in Cloud-based Platforms. Liqun Shao, Yiwen Zhu, Siqi Liu, Abhiram Eswaran, Kristin Lieber, Janhavi Suresh Mahajan, Minsoo Thigpen, Sudhir Darbha, Subru Krishnan, Soundar Srinivasan, Carlo Curino, Konstantinos Karanasos. SoCC 2019.
Moneyball: Proactive Auto-Scaling in Microsoft Azure SQL Database Serverless. Olga Poppe, Qun Guo, Willis Lang, Pankaj Arora, Morgan Oslake, Shize Xu, Ajay Kalhan. VLDB 2022.
PerfGuard: Deploying ML-for-Systems without Performance Regressions, Almost!. H. M. Sajjad Hossain, Marc T. Friedman, Hiren Patel, Shi Qiao, Soundar Srinivasan, Markus Weimer, Remmelt Ammerlaan, Lucas Rosenblatt, Gilbert Antonius, Peter Orenberg, Vijay Ramani, Abhishek Roy, Irene Shaffer, Alekh Jindal. VLDB 2021.

Index Recommendation

Index Selection in a Self-Adaptive Data Base Management System. Michael Hammer, Arvola Chan. SIGMOD 1976.
AutoAdmin “What-if” Index Analysis Utility. Surajit Chaudhuri, Vivek Narasayya. SIGMOD 1998.
Self-Tuning Database Systems: A Decade of Progress. Surajit Chaudhuri, Vivek Narasayya. VLDB 2007.
AI Meets AI: Leveraging Query Executions to Improve Index Recommendations. Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, Vivek R. Narasayya. SIGMOD 2019.
Automated Database Indexing using Model-free Reinforcement Learning. Gabriel Paludo Licks, Felipe Meneguzzi. arXiv 2020.
DRLindex: deep reinforcement learning index advisor for a cluster database. Zahra Sadri, Le Gruenwald, Eleazar Leal. Symposium on International Database Engineering & Applications 2020.
Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms. Jan Kossmann, Stefan Halfpap, Marcel Jankrift, Rainer Schlosser. VLDB 2020.
An Index Advisor Using Deep Reinforcement Learning. Hai Lan, Zhifeng Bao, Yuwei Peng. CIKM 2020.
DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees. R. Malinga Perera, Bastian Oetomo, Benjamin I. P. Rubinstein, Renata Borovica-Gajic. ICDE 2021.
MANTIS: Multiple Type and Attribute Index Selection using Deep Reinforcement Learning. Vishal Sharma, Curtis Dyreson. IDEAS 2021.
Indexer++: workload-aware online index tuning with transformers and reinforcement learning. Vishal Sharma, Curtis Dyreson. ACM SIGAPP SAC 2022.
AutoIndex: An Incremental Index Management System for Dynamic Workloads. Xuanhe Zhou, Luyang Liu, Wenbo Li, Lianyuan Jin, Shifu Li, Qingtian Wang, Jianhua Feng. ICDE 2022.

Configuration Tuning

SARD: A statistical approach for ranking database tuning parameters. Biplob K. Debnath, David J. Lilja, Mohamed F. Mokbel. ICDEW 2008.
Regularized Cost-Model Oblivious Database Tuning with Reinforcement Learning. Debabrota Basu, Qian Lin, Weidong Chen, Hoang Tam Vo, Zihong Yuan, Pierre Senellart, Stéphane Bressan. Springer 2016.
Automatic Database Management System Tuning Through Large-scale Machine Learning. Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, Bohan Zhang. SIGMOD 2017.
The Case for Automatic Database Administration using Deep Reinforcement Learning. Ankur Sharma, Felix Martin Schuhknecht, Jens Dittrich. arXiv 2018.
An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning. Ji Zhang, Yu Liu, Ke Zhou, Guoliang Li, Zhili Xiao, Bin Cheng, Jiashu Xing, Yangtao Wang, Tianheng Cheng, Li Liu, Minwei Ran, Zekang Li. SIGMOD 2019.
External vs. Internal: An Essay on Machine Learning Agents for Autonomous Database Management Systems. Andrew Pavlo, Matthew Butrovich, Ananya Joshi, Lin Ma, Prashanth Menon, Dana Van Aken, Lisa Lee, Ruslan Salakhutdinov. IEEE 2019.
QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning. Guoliang Li, Xuanhe Zhou, Shifu Li, Bo Gao. VLDB 2019.
Optimizing Databases by Learning Hidden Parameters of Solid State Drives. Aarati Kakaraparthy, Jignesh M. Patel, Kwanghyun Park, Brian P. Kroth. VLDB 2019.
iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases. Jian Tan, Tieying Zhang, Feifei Li, Jie Chen, Qixing Zheng, Ping Zhang, Honglin Qiao, Yue Shi, Wei Cao, Rui Zhang. VLDB 2019.
Black or White? How to Develop an AutoTuner for Memory-based Analytics. Mayuresh Kunjir, Shivnath Babu. SIGMOD 2020.
Learning Efficient Parameter Server Synchronization Policies for Distributed SGD. Rong Zhu, Sheng Yang, Andreas Pfadler, Zhengping Qian, Jingren Zhou. ICLR 2020.
Too Many Knobs to Tune? Towards Faster Database Tuning by Pre-selecting Important Knobs. Konstantinos Kanellis, Ramnatthan Alagappan, Shivaram Venkataraman. HotStorage 2020.
Dynamic Configuration Tuning of Working Database Management Systems. Yoshiteru Ishihara, Masahito Shiba. LifeTech 2020.
Adaptive Multi-Model Reinforcement Learning for Online Database Tuning. Yaniv Gur, Dongsheng Yang, Frederik Stalschus, Berthold Reinwald. EDBT 2021.
An inquiry into machine learning-based automatic configuration tuning services on real-world database management systems. Dana Van Aken, Dongsheng Yang, Sebastien Brillard, Ari Fiorino, Bohan Zhang, Christian Bilien, Andrew Pavlo. VLDB 2021.
KEA: Tuning an Exabyte-Scale Data Infrastructure. Yiwen Zhu, Subru Krishnan, Konstantinos Karanasos, Isha Tarte, Conor Power, Abhishek Modi, Manoj Kumar, Deli Zhang, Kartheek Muthyala, Nick Jurgens, Sarvesh Sakalanaga, Sudhir Darbha, Minu Iyer, Ankita Agarwal, Carlo Curino. SIGMOD 2021.
Predictive Price-Performance Optimization for Serverless Query Processing. Rathijit Sen, Abhishek Roy, Alekh Jindal. EDBT 2023.
AutoToken: Predicting Peak Parallelism for Big Data Analytics at Microsoft. Rathijit Sen, Alekh Jindal, Hiren Patel, Shi Qiao. VLDB 2020.
Towards Optimal Resource Allocation for Big Data Analytics. Anish Pimpley, Shuo Li, Rathijit Sen, Soundararajan Srinivasan, Alekh Jindal. EDBT 2022.

Cardinality / Selectivity Estimation

Cardinality Estimation Using Neural Networks. Henry Liu, Mingbin Xu, Ziting Yu, Vincent Corvinelli, Calisto Zuzarte. CASCON 2015.
Adaptive Cardinality Estimation. Oleg Ivanov, Sergey Bartunov. arXiv 2017.
Learned Cardinalities: Estimating Correlated Joins with Deep Learning. Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, Alfons Kemper. CoRR 2018.
Towards a learning optimizer for shared clouds. Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao. VLDB 2018.
Towards a Hands-Free Query Optimizer through Deep Learning. Ryan Marcus, Olga Papaemmanouil. CIDR 2019.
Neo: A Learned Query Optimizer. Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, Nesime Tatbul. VLDB 2019.
Selectivity Estimation for Range Predicates using Lightweight Models. Anshuman Dutt, Chi Wang, Azade Nazi, Srikanth Kandula, Vivek Narasayya, Surajit Chaudhuri. VLDB 2019.
Cardinality Estimation with Local Deep Learning Models. Lucas Woltmann, Claudio Hartmann, Maik Thiele, Dirk Habich, Wolfgang Lehner. aiDM 2019.
An empirical analysis of deep learning for cardinality estimation. Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, S. Sathiya Keerthi. arXiv 2019.
Deep Unsupervised Cardinality Estimation. Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, Ion Stoica. VLDB 2019.
Improved Cardinality Estimation by Learning Queries Containment Rates. Rojeh Hayek, Oded Shmueli. EDBT 2020.
Estimating Cardinalities with Deep Sketches. Andreas Kipf, Dimitri Vorona, Jonas Müller, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, Thomas Neumann, and Alfons Kemper. SIGMOD Demo 2019.
Estimating Filtered Group-By Queries is Hard: Deep Learning to the Rescue. Andreas Kipf, Michael Freitag, Dimitri Vorona, Peter Boncz, Thomas Neumann, Alfons Kemper. AIDB 2019.
Quicksel: Quick selectivity learning with mixture models. Yongjoo Park, Shucheng Zhong, Barzan Mozafari. SIGMOD 2020.
Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach. Yaoshu Wang, Chuan Xiao, Jianbin Qin, Xin Cao, Yifang Sun, Wei Wang, Makoto Onizuka. SIGMOD 2020.
Cost-Guided Cardinality Estimation: Focus Where it Matters. Parimarjan Negi, Ryan Marcus, Hongzi Mao, Nesime Tatbul, Tim Kraska, Mohammad Alizadeh. SMDB 2020.
DeepDB: learn from data, not from queries!. Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, Carsten Binnig. VLDB 2020.
Machine Learning-based Cardinality Estimation in DBMS on Pre-Aggregated Data. Lucas Woltmann, Claudio Hartmann, Dirk Habich, Wolfgang Lehner. arXiv 2020.
Best of Both Worlds: Combining Traditional and Machine Learning Models for Cardinality Estimation. Lucas Woltmann, Claudio Hartmann, Dirk Habich, Wolfgang Lehner. aiDM 2020.
Are We Ready For Learned Cardinality Estimation?. Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, Qingqing Zhou. arXiv 2020.
Flow-Loss: Learning Cardinality Estimates That Matter. Parimarjan Negi, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, Mohammad Alizadeh. CoRR 2021.
A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation. Peizhi Wu, Gao Cong. SIGMOD 2021.
LATEST: Learning-Assisted Selectivity Estimation Over Spatio-Textual Streams. Mayur Patil, Amr Magdy. ICDE 2021.
NeuroCard: One Cardinality Estimator for All Tables. Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, Ion Stoica. VLDB 2021.
Cardinality Estimation: Is Machine Learning a Silver Bullet?. Beibin Li, Yao Lu, Chi Wang, Srikanth Kandula. AIDB 2021.

Data-based Cardinality Estimation

Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation. Max Heimel, Martin Kiefer, and Volker Markl. SIGMOD 2015.
Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models. Martin Kiefer, Max Heimel, Sebastian Breß, and Volker Markl. VLDB 2017.
Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries. Shohedul Hasan, Saravanan Thirumuruganathan, Jees Augustine, Nick Koudas, and Gautam Das. SIGMOD 2020.
Learning to Sample: Counting with Complex Queries. Brett Walenz, Stavros Sintos, Sudeepa Roy, and Jun Yang. VLDB 2020.
Selectivity estimation using probabilistic models. Lise Getoor, Benjamin Taskar, and Daphne Koller. SIGMOD 2001.
Lightweight graphical models for selectivity estimation without independence assumptions. Kostas Tzoumas, Amol Deshpande, and Christian S. Jensen. VLDB 2011.
Efficiently adapting graphical models for selectivity estimation. Kostas Tzoumas, Amol Deshpande, and Christian S. Jensen. VLDB 2013.
An Approach Based on Bayesian Networks for Query Selectivity Estimation. Max Halford, Philippe Saint-Pierre, and Franck Morvan. DASFAA 2019.
FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. Rong Zhu, Ziniu Wu, Yuxing Han, Kai Zeng, Andreas Pfadler, Zhengping Qian, Jingren Zhou, and Bin Cui. arXiv 2020.
Astrid: Accurate Selectivity Estimation for String Predicates using Deep Learning. Suraj Shetiya, Saravanan Thirumuruganathan, Nick Koudas, and Gautam Das. VLDB 2021.
BayesCard: A Unified Bayesian Framework for Cardinality Estimation. Ziniu Wu and Amir Shaikhha. arXiv 2020.
LMKG: Learned Models for Cardinality Estimation in Knowledge Graphs. Angjela Davitkova, Damjan Gjurovski, and Sebastian Michel. arXiv 2021.
LHist: Towards Learning Multi-dimensional Histogram for Massive Spatial Data. Qiyu Liu, Yanyan Shen, and Lei Chen. ICDE 2021.

Query-based Cardinality Estimation

Adaptive selectivity estimation using query feedback. Chungmin Melvin Chen and Nick Roussopoulos. SIGMOD 1994.
Selectivity Estimation in Extensible Databases - A Neural Network Approach. Seetha Lakshmi and Shaoyu Zhou. VLDB 1998.
Effective query size estimation using neural networks. Hongjun Lu and Rudy Setiono. Applied Intelligence 2002.
LEO - DB2's LEarning optimizer. Michael Stillger, Guy M. Lohman, Volker Markl, and Mokhtar Kandil. VLDB 2001.
A Black-Box Approach to Query Cardinality Estimation. Tanu Malik, Randal C. Burns, and Nitesh V. Chawla. CIDR 2007.
Learning State Representations for Query Optimization with Deep Reinforcement Learning. Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, and S. Sathiya Keerthi. DEEM@SIGMOD 2018.
Flexible Operator Embeddings via Deep Learning. Ryan Marcus and Olga Papaemmanouil. arXiv 2019.
NN-based Transformation of Any SQL Cardinality Estimator for Handling DISTINCT, AND, OR and NOT. Rojeh Hayek and Oded Shmueli. arXiv 2020.
Efficiently Approximating Selectivity Functions using Low Overhead Regression Models. Anshuman Dutt, Chi Wang, Vivek Narasayya, and Surajit Chaudhuri. VLDB 2020.
Learned Cardinality Estimation for Similarity Queries. Ji Sun, Guoliang Li, and Nan Tang. SIGMOD 2021.

Cost Estimation

Statistical learning techniques for costing XML queries. Ning Zhang, Peter J. Haas, Vanja Josifovski, Guy M. Lohman, Chun Zhang. VLDB 2005.
PQR: Predicting Query Execution Times for Autonomous Workload Management. Chetan Gupta, Abhay Mehta, Umeshwar Dayal. ICAC 2008.
Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Janet L. Wiener, Armando Fox, Michael Jordan, David Patterson. ICDE 2009.
The Case for Predictive Database Systems: Opportunities and Challenges. Mert Akdere, Ugur Cetintemel, Matteo Riondato, Eli Upfal, Stan Zdonik. CIDR 2011.
Performance Prediction for Concurrent Database Workloads. Jennie Duggan, Ugur Cetintemel, Olga Papaemmanouil, Eli Upfal. SIGMOD 2011.
Predicting Completion Times of Batch Query Workloads Using Interaction-aware Models and Simulation. Mumtaz Ahmad, Songyun Duan, Ashraf Aboulnaga, Shivnath Babu. EDBT 2011.
Interaction-Aware Scheduling of Report Generation Workloads. Mumtaz Ahmad, Ashraf Aboulnaga, Shivnath Babu, Kamesh Munagala. VLDB 2011.
Learning-based Query Performance Modeling and Prediction. Mert Akdere, Ugur Çetintemel, Matteo Riondato, Eli Upfal, Stanley B. Zdonik. ICDE 2012.
Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques. Jiexing Li, Arnd Christian Konig, Vivek Narasayya, Surajit Chaudhuri. VLDB 2012.
Towards predicting query execution time for concurrent and dynamic database workloads. Wentao Wu, Yun Chi, Hakan Hacıgümüş, Jeffrey F. Naughton. VLDB 2014.
Contender: A Resource Modeling Approach for Concurrent Query Performance Prediction. Jennie Duggan, Olga Papaemmanouil, Ugur Cetintemel, Eli Upfal. EDBT 2014.
Plan-Structured Deep Neural Network Models for Query Performance Prediction. Ryan Marcus, Olga Papaemmanouil. VLDB 2019.
Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings. Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, Wangchao Le. SIGMOD 2020.
DBMS Fitting: Why should we learn what we already know?. Benjamin Hilprecht, Tiemo Bang, Muhammad El-Hindi, Benjamin Hättasch, Aditya Khanna, Robin Rehrmann, Uwe Röhm, Andreas Schmidt, Lasse Thostrup, Tobias Ziegler, Carsten Binnig. CIDR 2020.
A Note On Operator-Level Query Execution Cost Modeling. Wentao Wu. arXiv 2020.
Query Performance Prediction for Concurrent Queries using Graph Embedding. Xuanhe Zhou, Ji Sun, Guoliang Li, Jianhua Feng. VLDB 2020.
Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload. Johan Kok Zhi Kang, Gaurav, Sien Yi Tan, Feng Cheng, Shixuan Sun, Bingsheng He. arXiv 2021.

Query Optimization

Plan Selection Based on Query Clustering. Antara Ghosh, Jignashu Parikh, Vibhuti S. Sengar, Jayant R. Haritsa. VLDB 2002.
Cost-Based Query Optimization via AI Planning. Nathan Robinson, Sheila A. McIlraith, David Toman. AAAI 2014.
Sampling-Based Query Re-Optimization. Wentao Wu, Jeffrey F. Naughton, Harneet Singh. SIGMOD 2016.
Learning State Representations for Query Optimization with Deep Reinforcement Learning. Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, S. Sathiya Keerthi. DEEM 2018.
Learning to optimize join queries with deep reinforcement learning. Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, Ion Stoica. CoRR 2018.
Adaptive Optimization of Very Large Join Queries. Thomas Neumann, Bernhard Radke. SIGMOD 2018.
Deep Reinforcement Learning for Join Order Enumeration. Ryan Marcus, Olga Papaemmanouil. aiDM 2018.
SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning. Immanuel Trummer, Junxiong Wang, Deepak Maram, Samuel Moseley, Saehan Jo, Joseph Antonakakis. VLDB 2018.
Neo: A Learned Query Optimizer. Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, Nesime Tatbul. VLDB 2019.
An End-to-End Learning-based Cost Estimator. Ji Sun, Guoliang Li. VLDB 2019.
ML-based Cross-Platform Query Optimization. Zoi Kaoudi, Jorge-Arnulfo Quiane-Ruiz, Bertty Contreras-Rojas, Rodrigo Pardo-Meza, Anis Troudi, Sanjay Chawla. ICDE 2020.
Reinforcement Learning with Tree-LSTM for Join Order Selection. Xiang Yu, Guoliang Li, Chengliang Chai, Nan Tang. ICDE 2020.
Research Challenges in Deep Reinforcement Learning-based Join Query Optimization. Runsheng Benson Guo, Khuzaima Daudjee. aiDM 2020.
Bao: Making Learned Query Optimizers Practical. Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, Tim Kraska. SIGMOD 2021.
Sia: Optimizing Queries using Learned Predicates. Qi Zhou, Joy Arulraj, Shamkant Navathe, William Harris, Jinpeng Wu. SIGMOD 2021.
Learning-based Declarative Query Optimization. Shivani Tripathi. CODS COMAD 2021.
Microlearner: A fine-grained Learning Optimizer for Big Data Workloads at Microsoft. Alekh Jindal, Shi Qiao, Rathijit Sen, Hiren Patel. ICDE 2021.
Steering Query Optimizers: A Practical Take on Big Data Workloads. Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, Alekh Jindal. SIGMOD 2021.
COMPASS: Online Sketch-based Query Optimization for In-Memory Databases. Yesdaulet Izenov, Asoke Datta, Florin Rusu, Jun Hyung Shin. SIGMOD 2021.
A Learned Query Rewrite System using Monte Carlo Tree Search. Xuanhe Zhou, Guoliang Li, Chengliang Chai, and Jianhua Feng. VLDB 2022.
Deploying a Steered Query Optimizer in Production at Microsoft. Wangda Zhang, Matteo Interlandi, Paul Mineiro, Shi Qiao, Nasim Ghazanfari, Karlen Lie, Marc T. Friedman, Rafah Hosn, Hiren Patel, Alekh Jindal. SIGMOD 2022.

Query Processing

Eddies: Continuously adaptive query processing. Ron Avnur, Joseph M. Hellerstein. SIGMOD 2000.
Micro adaptivity in Vectorwise. Bogdan Răducanu, Peter Boncz, Marcin Zukowski. SIGMOD 2013.
Cuttlefish: A Lightweight Primitive for Adaptive Query Processing. Tomer Kaftan, Magdalena Balazinska, Alvin Cheung, Johannes Gehrke. arXiv 2018.
ML-AQP: Query-Driven Approximate Query Processing based on Machine Learning. Fotis Savva, Christos Anagnostopoulos, Peter Triantafillou. arXiv 2020.
DBEST: Revisiting approximate query processing engines with machine learning models. Qingzhi Ma, Peter Triantafillou. SIGMOD 2019.
DeepSPACE: Approximate Geospatial Query Processing with Deep Learning. Dimitri Vorona, Andreas Kipf, Thomas Neumann, Alfons Kemper. SIGSPATIAL 2019.
LAQP: Learning-based Approximate Query Processing. Meifan Zhang, Hongzhi Wang. arXiv 2020.
Approximate Query Processing for Data Exploration using Deep Generative Models. Saravanan Thirumuruganathan, Shohedul Hasan, Nick Koudas, Gautam Das. ICDE 2020.
Approximate Query Processing for Group-By Queries based on Conditional Generative Models. Meifan Zhang, Hongzhi Wang. arXiv 2021.
Learned Approximate Query Processing: Make it Light, Accurate and Fast. Qingzhi Ma, Ali M. Shanghooshabad, Mehrdad Almasi, Meghdad Kurmanji, Peter Triantafillou. CIDR 2021.
PGMJoins: Random Join Sampling with Graphical Models. Ali Mohammadi Shanghooshabad, Meghdad Kurmanji, Qingzhi Ma, Michael Shekelyan, Mehrdad Almasi, and Peter Triantafillou. SIGMOD 2021.
XLJoins. Ali Mohammadi Shanghooshabad. SIGMOD 2021 SRC.
Bandit Join: Preliminary Results. Vahid Ghadakchi, Mian Xie, Arash Termehchy. aiDM 2020.
Conditional Generative Model based Predicate-Aware Query Approximation. Nikhil Sheoran, Subrata Mitra, Vibhor Porwal, Siddharth Ghetia, Jatin Varshney, Tung Mai, Anup Rao, Vikas Maddukuri. AAAI 2022.
Phoebe: A Learning-based Checkpoint Optimizer. Yiwen Zhu, Matteo Interlandi, Abhishek Roy, Krishnadhan Das, Hiren Patel, Malay Bag, Hitesh Sharma, Alekh Jindal. VLDB 2021.

Scheduling

Resource Management with Deep Reinforcement Learning. Hongzi Mao, Mohammad Alizadeh, Ishai Menache, Srikanth Kandula. HotNets 2016.
WiSeDB: A Learning-based Workload Management Advisor for Cloud Databases. Ryan Marcus, Olga Papaemmanouil. VLDB 2016.
Model-free Control for Distributed Stream Data Processing using Deep Reinforcement Learning. Teng Li, Zhiyuan Xu, Jian Tang, Yanzhi Wang. VLDB 2018.
Learning Scheduling Algorithms for Data Processing Clusters. Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, Mohammad Alizadeh. SIGCOMM 2019.
Scheduling OLTP Transactions via Learned Abort Prediction. Yangjun Sheng, Anthony Tomasic, Tieying Zhang, Andrew Pavlo. aiDM 2019.
Scheduling OLTP Transactions via Machine Learning. Yangjun Sheng, Anthony Tomasic, Tieying Zhang, Andrew Pavlo. arXiv 2019.
CrocodileDB: Efficient Database Execution through Intelligent Deferment. Zechao Shang, Xi Liang, Dixin Tang, Cong Ding, Aaron J. Elmore, Sanjay Krishnan, Michael J. Franklin. CIDR 2020.
Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning. Chi Zhang, Ryan Marcus, Anat Kleiman, Olga Papaemmanouil. arXiv 2020.
Polyjuice: High-Performance Transactions via Learned Concurrency Control. Jiachen Wang, Ding Ding, Huan Wang, Conrad Christensen, Zhaoguo Wang, Haibo Chen, Jinyang Li. arXiv 2021.
Scheduling of Time-Varying Workloads Using Reinforcement Learning. Shanka Subhra Mondal, Nikhil Sheoran, Subrata Mitra. AAAI 2021.
LSched: A Workload-Aware Learned Query Scheduler for Analytical Database Systems. Ibrahim Sabek, Tenzin Samten Ukyab, Tim Kraska. SIGMOD 2022.

Caching

Competitive Caching with Machine Learned Advice. Thodoris Lykouris, Sergei Vassilvitskii. ICML 2018.
Learning Caching Policies with Subsampling. Haonan Wang, Hao He, Mohammad Alizadeh, Hongzi Mao. NeurIPS 2019.
RL-Cache: Learning-Based Cache Admission for Content Delivery. Vadim Kirilin, Aditya Sundarrajan, Sergey Gorinsky, Ramesh K. Sitaraman. IEEE 2020.
Leaper: A Learned Prefetcher for Cache Invalidation in LSM-tree based Storage Engines. Lei Yang, Hong Wu, Tieying Zhang, Xuntao Cheng, Feifei Li, Lei Zou, Yujie Wang, Rongyao Chen, Jianying Wang, Gui Huang. VLDB 2020.

Sorting

The Case for a Learned Sorting Algorithm. Ani Kristo, Kapil Vaidya, Ugur Çetintemel, Sanchit Misra, Tim Kraska. SIGMOD 2020.
Defeating duplicates: A re-design of the LearnedSort algorithm. Ani Kristo, Kapil Vaidya, Tim Kraska. AIDB 2021.
Centroid Sort: a Clustering-based Technique for Accelerating Sorting Algorithms. Peter Olukanmi, Peter Popoola, Michael Olusanya. IMITEC 2020.

Garbage Collection

To Collect or Not to Collect? Machine Learning for Memory Management. Eva Andreasson, Frank Hoffmann, Olof Lindholm. JVM 2002.
Garbage Collection Auto-Tuning for Java MapReduce on Multi-Cores. Jeremy Singer, George Kovoor, Gavin Brown, Mikel Luján. ACM SIGPLAN Notices 2011.
A method for reducing garbage collection overhead of SSD using machine learning algorithms. Jung Kyu Park, Jaeho Kim. ICTC 2017.
Reducing Garbage Collection Overhead in SSD Based on Workload Prediction. Pan Yang, Ni Xue, Yuqi Zhang, Yangxu Zhou, Li Sun, Wenwen Chen, Zhonggang Chen, Wei Xia, Junke Li, Kihyoun Kwon. HotStorage 2019.
Optimal Choice of When to Garbage Collect. Nicholas Jacek, Meng-Chieh Chiu, Benjamin M. Marlin, J. Eliot B. Moss. ACM Transactions on Programming Languages and Systems 2019.
Learning When to Garbage Collect with Random Forests. Nicholas Jacek, J. Eliot B. Moss. ISMM 2019.
Learned Garbage Collection. Lujing Cen, Ryan Marcus, Hongzi Mao, Justin Gottschlich, Mohammad Alizadeh, Tim Kraska. MAPL 2020.

Sketches

Learning-Based Frequency Estimation Algorithms. Chen-Yu Hsu, Piotr Indyk, Dina Katabi, Ali Vakilian. ICLR 2019.
Composable Sketches for Functions of Frequencies: Beyond the Worst Case. Edith Cohen, Ofir Geri, Rasmus Pagh. ICML 2020.

Compilation / Compilers

Compiler Auto-Vectorization with Imitation Learning. Charith Mendis, Cambridge Yang, Yewen Pu, Saman Amarasinghe, Michael Carbin. NeurIPS 2019.
Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks. Charith Mendis, Alex Renda, Saman Amarasinghe, Michael Carbin. ICML 2019.
A Learned Performance Model for Tensor Processing Units. Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows. MLSys 2021.
Learning to Optimize Tensor Programs. Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy. NeurIPS 2018.
Learning to optimize halide with tree search and random programs. Andrew Adams, Karima Ma, Luke Anderson, Riyadh Baghdadi, Tzu-Mao Li, Michael Gharbi, Benoit Steiner, Steven Johnson, Kayvon Fatahalian, Fredo Durand, Jonathan Ragan-Kelley. ACM Transactions on Graphics Vol. 38, No. 4, Article 121, 2019.

SQL-Related

SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning. Xiaojun Xu, Chang Liu, Dawn Song. arXiv 2017.
Query2Vec: An Evaluation of NLP Techniques for Generalized Workload Analytics. Shrainik Jain, Bill Howe, Jiaqi Yan, Thierry Cruanes. VLDB 2018.
Bootstrapping an End-to-End Natural Language Interface for Databases. Nathaniel Weir, Prasetya Utama. SIGMOD 2019.
Facilitating SQL Query Composition and Analysis. Zainab Zolaktaf, Mostafa Milani, Rachel Pottinger. arXiv 2020.
Natural language to SQL: Where are we today?. Hyeonji Kim, Byeong-Hoon So, Wook-Shin Han, Hongrae Lee. VLDB 2020.
From Natural Language Processing to Neural Databases. James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, Alon Halevy. VLDB 2021.
BERT Meets Relational DB: Contextual Representations of Relational Databases. Siddhant Arora, Vinayak Gupta, Garima Gaur, Srikanta Bedathur. arXiv 2021.

Workload Related

Workload Models for Autonomic Database Management Systems. Patrick Martin, Said Elnaffar, and Ted Wasserman. International Conference on Autonomic and Autonomous Systems 2006.
Towards workload shift detection and prediction for autonomic databases. Marc Holze and Norbert Ritter. CIKM 2007.
Query-based Workload Forecasting for Self-Driving Database Management Systems. Lin Ma, Dana Van Aken, Ahmed Hefny, Gustavo Mezerhane, Andrew Pavlo, and Geoffrey J. Gordon. SIGMOD 2018.
Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases. Minghua Ma, Zheng Yin, Shenglin Zhang, Sheng Wang, Christopher Zheng, Xinhao Jiang, Hanwen Hu, Cheng Luo, Yilin Li, Nengjun Qiu, Feifei Li, Changcheng Chen, and Dan Pei. VLDB 2020.
Seagull: An Infrastructure for Load Prediction and Optimized Resource Allocation. Olga Poppe, Tayo Amuneke, Dalitso Banda, Aritra De, Ari Green, Manon Knoertzer, Ehi Nosakhare, Karthik Rajendran, Deepak Shankargouda, Meina Wang, Alan Au, Carlo Curino, Qun Guo, Alekh Jindal, Ajay Kalhan, Morgan Oslake, Sonia Parchani, Vijay Ramani, Raj Sellappan, Saikat Sen, Sheetal Shrotri, Soundararajan Srinivasan, Ping Xia, Shize Xu, Alicia Yang, Yiwen Zhu. VLDB 2020.
Database Workload Characterization with Query Plan Encoders. Debjyoti Paul, Jie Cao, Feifei Li, and Vivek Srikumar. arXiv 2021.
Workload-Aware Performance Tuning for Autonomous DBMSs. Zhengtong Yan, Jiaheng Lu, Naresh Chainani, and Chunbin Lin. ICDE 2021.
FIRM: An Intelligent Fine-grained Resource Management Framework for SLO-Oriented Microservices. Haoran Qiu, Subho S. Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, and Ravishankar K. Iyer. OSDI 2020.
Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices. Yu Gan, Yanqi Zhang, Kelvin Hu, Dailun Cheng, Yuan He, Meghna Pancholi, and Christina Delimitrou. ASPLOS 2019.
Sage: Practical and Scalable ML-Driven Performance Debugging in Microservices. Yu Gan, Mingyu Liang, Sundar Dev, David Lo, and Christina Delimitrou. ASPLOS 2021.
Sinan: ML-Based and QoS-Aware Resource Management for Cloud Microservices. Yanqi Zhang, Weizhe Hua, Zhuangzhuang Zhou, G. Edward Suh, and Christina Delimitrou. ASPLOS 2021.

Data Cleaning and Exploration

Finding Label and Model Errors in Perception Data With Learned Observation Assertions. Daniel Kang, Nikos Arechiga, Sudeep Pillai, Peter D Bailis, Matei Zaharia. AIDB 2021.
ASET: Ad-hoc Structured Exploration of Text Collections. Benjamin Hättasch, Jan-Micha Rainer Bodensohn, Carsten Binnig. AIDB 2021.