Posts by Collection

publications

Local Search for AI Planning

Published in IPS-RCRA@AI*IA, 2020

Recent breakthroughs in the field of AI planning such as the Identidem and Marvin planners support the creation of more advanced and realistic representations of real-world domains. It is well known that an adequate local search strategy can help to solve increasingly complicated Planning Domain Definition Language problems. Contemporary planners, however, strive to find a balance between the traditional greedy search and a certain degree of randomness. The aim of this work is thus to introduce a new planner that combines applicable local search techniques in a novel way to enhance the performance of the existing JavaFF planner. The newly proposed planner is based on the principle of local beam search combining different successor selection methods, macros and restarts. Experimental results show that the new planner can solve considerably more problems, and often within a shorter time, compared to its predecessor JavaFF. Our planner could find practical use in domains such as urban traffic modelling or autonomous robot control.

Citation: Mraz, Oto. "Local Search for AI Planning." IPS-RCRA@AI*IA. 2020.
Download Paper

Explore, Compare, and Predict Investment Opportunities through What-If Analysis: US Housing Market Investigation

Published in Proceedings of the 16th International Symposium on Visual Information Communication and Interaction, 2023

A key challenge in data analysis tools for domain-specific applications with high-dimensional time series data is to provide an intuitive way for users to explore their datasets, analyze trends and understand the models developed for these applications through human-computer interaction. To address this challenge, we propose a three-stage workflow that allows domain experts to explore their data, compare the different entities’ features, and predict the variable’s long-term trend using what-if analyses. Based on this workflow, we created a data visualization workspace for real estate investment using data from the US housing market at state and city level. The underlying machine learning model ARIMAX uses house price data together with socio-economic data from 2000 to 2021 to learn the dependencies of the house prices on the socio-economic factors and make informative and robust predictions for future years.

Citation: Chen, Hongruyu, et al. "Explore, Compare, and Predict Investment Opportunities through What-If Analysis: US Housing Market Investigation." Proceedings of the 16th International Symposium on Visual Information Communication and Interaction. 2023.
Download Paper

Enhanced prediction of vegetation responses to extreme drought using deep learning and Earth observation data

Published in Ecological Informatics, 2024

The advent of abundant Earth observation data enables the development of novel predictive methods for forecasting climate impacts on the state and health of terrestrial ecosystems. Here, we predict the spatial and temporal variations of land surface reflectance and vegetation greenness, measuring the density of green vegetation and active foliage area, conditioned on current and past weather and the local topography. We train two alternative recurrent deep learning models that combine Long Short-Term Memory cells with convolutional layers (ConvLSTM) for forecasting the spatially resolved deviation of surface reflectance across a heterogeneous landscape from a specified initial state. Using data from diverse ecosystems and land cover types across Europe and following a standardized model evaluation framework (EarthNet2021 Challenge), our results indicate that the models presented here achieve increased performance in predicting surface greenness during extreme drought events compared to currently published benchmarks. This demonstrates how deep learning methods for optical Earth observation time series enable early warning of vegetation responses to the impacts of climatic extreme events, such as the drought-related loss of green foliage.

Citation: Kladny, Klaus-Rudolf, et al. "Enhanced prediction of vegetation responses to extreme drought using deep learning and Earth observation data." Ecological Informatics 80 (2024): 102474.
Download Paper

Pecan: Cost-Efficient ML Data Preprocessing with Automatic Transformation Ordering and Hybrid Placement

Published in 2024 USENIX Annual Technical Conference (USENIX ATC 24), 2024

Input data preprocessing is a common bottleneck in machine learning (ML) jobs that can significantly increase training time and cost as expensive GPUs or TPUs idle waiting for input data. Previous work has shown that offloading data preprocessing to remote CPU servers successfully alleviates data stalls and improves training time. However, remote CPU workers in disaggregated data processing systems comprise a significant fraction of total training costs. Meanwhile, current disaggregated solutions often underutilize CPU and DRAM resources available on ML accelerator nodes. We propose two approaches to alleviate ML input data stalls while minimizing costs. First, we dynamically schedule data preprocessing workers on ML accelerator host resources to minimize the number of remote CPU workers needed to achieve peak data ingestion bandwidth. Second, we analyze the characteristics of input pipelines and automatically reorder transformations to increase data preprocessing worker throughput. We observe that relaxing commutativity increases throughput while maintaining high model accuracy for a variety of ML data pipelines. We build Pecan, an ML data preprocessing service that automates data preprocessing worker placement and transformation reordering decisions. Pecan reduces preprocessing costs by 87% on average and total training costs by up to 60% compared to training with state-of-the-art disaggregated data preprocessing, and reduces total training costs by 55% on average compared to collocated data preprocessing.

Citation: Graur, Dan, et al. "Pecan: Cost-Efficient ML Data Preprocessing with Automatic Transformation Ordering and Hybrid Placement." 2024 USENIX Annual Technical Conference (USENIX ATC 24). 2024.
Download Paper

Styx in Action: Transactional Cloud Applications Made Easy

Published in 51st International Conference on Very Large Data Bases, 2025

Developing and deploying transactional cloud applications such as banking and e-commerce systems is a daunting task for developers. The reason for this difficulty is twofold. First, developing such applications shifts the developers’ focus from the application logic to considerations of distributed transactions, fault-tolerance, consistency, and scalability. Second, deploying such applications involves multiple systems, such as databases, load balancers, or containerized services, impeding efficient resource management. This demonstration presents Styx, a scalable application runtime that allows developers to build scalable and transactional cloud applications with minimal effort. Styx supports serializability and exactly-once guarantees. The demonstration focuses on the ease of development and deployment, as well as Styx’s fault-tolerance mechanisms.

Citation: Psarakis, Kyriakos, et al. "Styx in Action: Transactional Cloud Applications Made Easy." 51st International Conference on Very Large Data Bases (VLDB 2025). 2025.
Download Paper

The Missing Dimensions in Geo-Distributed Database Evaluation

Published in ArXiv, 2026

Geo-distributed OLTP databases are widely deployed across cloud regions, yet current evaluation practices do not cover the challenges of geo-distribution. Existing benchmarks assume stable network conditions; they lack explicit settings for data and client locality, and they largely ignore data transfer costs across regions. In addition, most evaluations rely on a limited set of geo-distribution patterns. In this paper, we propose Gaia, a comprehensive evaluation framework that addresses these gaps. We use Gaia to perform a comprehensive evaluation of existing geo-distributed OLTP systems. We deploy them across multiple cloud regions, using different geo-distribution patterns and variable cross-region network conditions. Among other interesting findings, our framework reveals that: i) most systems are sensitive to network instabilities, ii) network costs dominate cloud deployment expenses, and iii) multi-region fault-tolerance mechanisms incur measurable critical-path overhead that is often overlooked in prior evaluations. We argue that for the design of future geo-distributed databases, we must rethink the trade-offs between performance, fault-tolerance, and cost.

Citation: Mraz, Oto, et al. "The Missing Dimensions in Geo-Distributed Database Evaluation." arXiv preprint arXiv:2605.30156 (2026).
Download Paper

supervision

MSc Thesis: Autoscaling for Transactional Stateful Functions

MSc Thesis, TU Delft, EEMCS Faculty, 2025

In this project I supervised an MSc student (Dorian Erhan) in his thesis on autoscaling transactional stateful functions. The student will profile the performance of Styx under varying user demand and resource supply, devise policies for automatically scaling Styx for optimal performance, and evaluate the performance of the policies in practice.

MSc Thesis: Efficient Remastering for Geo-Distributed Databases

MSc Thesis, TU Delft, EEMCS Faculty, 2025

In this project I supervised an MSc student (Emiel de Graaf) in his thesis on efficient remastering for geo-distributed databases. The student will profile existing data remastering policies and their impact on transactional throughput, latency and monetary costs. He will then implement a new data remastering policy that adapts to dynamic and shifting transactional workloads, e.g., diurnal cycles.

MSc Thesis: Transactional guarantees for Agentic AI Systems

MSc Thesis, TU Delft, EEMCS Faculty, 2026

In this project I will supervise an MSc student (Daniel Rachev) in his thesis on transactional guarantees for multi-agent systems. The student will investigate how we can provide agentic systems with favorable guarantees such as atomicity, consistency, reversibility, etc. while maintaining their high performance.

MSc Thesis: Dynamic Data Remastering

MSc Thesis, TU Delft, EEMCS Faculty, 2026

In this project I will supervise an MSc student (Zofia Rogacka) in her thesis on dynamic data remastering. The student will build on earlier work on data remastering, this time focusing on designing a policy that also works effectively for a cluster consisting of several regions and that can work around data privacy restrictions.

talks

teaching

Practical Experiences of Programming

Undergraduate course, King's College London, Department of Informatics, 2019

This course aims to provide the students with extensive practical experience of programming; to draw on, integrate, and build upon the theoretical and practical teaching of other modules in the programme.

Cloud Computing Architecture

Graduate course, ETH Zurich, Department of Computer Science, 2022

The course covers topics including server design, cluster management, large-scale storage systems, serverless computing, data analytics frameworks, and performance analysis.