Publications

You can also find my articles on my Google Scholar profile.

Styx in Action: Transactional Cloud Applications Made Easy

Published in 51st International Conference on Very Large Data Bases, 2025

Developing and deploying transactional cloud applications such as banking and e-commerce systems is a daunting task for developers. The reason for this difficulty is twofold. First, developing such applications shifts the developers’ focus from the application logic to considerations of distributed transactions, fault-tolerance, consistency, and scalability. Second, deploying such applications involves multiple systems, such as databases, load balancers, or containerized services, impeding efficient resource management. This demonstration presents Styx, a scalable application runtime that allows developers to build scalable and transactional cloud applications with minimal effort. It supports serializability and exactly-once guarantees and focuses on the ease of development and deployment, as well as Styx’s fault-tolerance mechanisms.

Citation: Psarakis, Kyriakos, et al. "Styx in Action: Transactional Cloud Applications Made Easy." 51st International Conference on Very Large Data Bases (VLDB 2025). 2025.
Download Paper

Pecan: Cost-Efficient ML Data Preprocessing with Automatic Transformation Ordering and Hybrid Placement

Published in 2024 USENIX Annual Technical Conference (USENIX ATC 24), 2024

Input data preprocessing is a common bottleneck in machine learning (ML) jobs, that can significantly increase training time and cost as expensive GPUs or TPUs idle waiting for input data. Previous work has shown that offloading data preprocessing to remote CPU servers successfully alleviates data stalls and improves training time. However, remote CPU workers in disaggregated data processing systems comprise a significant fraction of total training costs. Meanwhile, current disaggregated solutions often underutilize CPU and DRAM resources available on ML accelerator nodes. We propose two approaches to alleviate ML input data stalls while minimizing costs. First, we dynamically schedule data preprocessing workers on ML accelerator host resources to minimize the number of remote CPU workers needed to achieve peak data ingestion bandwidth. Second, we analyze the characteristics of input pipelines and automatically reorder transformations to increase data preprocessing worker throughput. We observe that relaxing commutativity increases throughput while maintaining high model accuracy for a variety of ML data pipelines. We build Pecan, an ML data preprocessing service that automates data preprocessing worker placement and transformation reordering decisions. Pecan reduces preprocessing costs by 87% on average and total training costs by up to 60% compared to training with state-of-the-art disaggregated data preprocessing and total training costs by 55% on average compared to collocated data preprocessing.

Citation: Graur, Dan, et al. "Pecan: Cost-Efficient ML Data Preprocessing with Automatic Transformation Ordering and Hybrid Placement." 2024 USENIX Annual Technical Conference (USENIX ATC 24). 2024.
Download Paper

Enhanced prediction of vegetation responses to extreme drought using deep learning and Earth observation data

Published in Ecological Informatics, 2024

The advent of abundant Earth observation data enables the development of novel predictive methods for forecasting climate impacts on the state and health of terrestrial ecosystems. Here, we predict the spatial and temporal variations of land surface reflectance and vegetation greenness, measuring the density of green vegetation and active foliage area, conditioned on current and past weather and the local topography. We train two alternative recurrent deep learning models that combine Long Short-Term Memory cells with convolutional layers (ConvLSTM) for forecasting the spatially resolved deviation of surface reflectance across a heterogeneous landscape from a specified initial state. Using data from diverse ecosystems and land cover types across Europe and following a standardized model evaluation framework (EarthNet2021 Challenge), our results indicate increased performance in predicting surface greenness during extreme drought events of the models presented here, compared to currently published benchmarks. This demonstrates how deep learning methods for optical Earth observation time series enable an early-warning of vegetation responses to the impacts of climatic extreme events, such as the drought-related loss of green foliage.

Citation: Kladny, Klaus-Rudolf, et al. "Enhanced prediction of vegetation responses to extreme drought using deep learning and Earth observation data." Ecological Informatics 80 (2024): 102474.
Download Paper

Explore, Compare, and Predict Investment Opportunities through What-If Analysis: US Housing Market Investigation

Published in Proceedings of the 16th International Symposium on Visual Information Communication and Interaction, 2023

A key challenge in data analysis tools for domain-specific applications with high-dimensional time series data is to provide an intuitive way for users to explore their datasets, analyze trends and understand the models developed for these applications through human-computer interaction. To address this challenge, we propose a three-stage workflow that allows domain experts to explore their data, compare the different entities’ features, and predict the variable’s long-term trend using what-if analyses. Based on this workflow, we created a data visualization workspace for real estate investment using data from the US housing market at state and city level. The underlying machine learning model ARIMAX uses house price data together with socio-economic data from 2000 to 2021 to learn the dependencies of the house prices on the socio-economic factors and make informative and robust predictions for future years.

Citation: Chen, Hongruyu, et al. "Explore, Compare, and Predict Investment Opportunities through What-If Analysis: US Housing Market Investigation." Proceedings of the 16th International Symposium on Visual Information Communication and Interaction. 2023.
Download Paper

Local Search for AI Planning

Published in IPS-RCRA@ AI* IA, 2020

Recent breakthroughs in the field of AI planning such as the Identidem and Marvin planners support the creation of more advanced and realistic representations of real-world domains. It is well-known that an adequate local search strategy can help to solve increasingly complicated Planning Domain Definition Language problems. Contemporary planners, however, strive to find a balance between the traditional greedy search and a certain degree of randomness. The aim of this work is thus to introduce a new planner that combines applicable local search techniques in a novel way not explored before to enhance the performance of the existing JavaFF planner. The new proposed planner is based on the principle of local beam search combining different successor selection methods, macros and restarts. Experimental results show that the new planner can solve considerably more problems and often within a shorter time compared to its predecessor JavaFF. Our planner could find its practical utilization in domains such as urban traffic modelling or autonomous robot control.

Citation: Mraz, Oto. "Local Search for AI Planning." IPS-RCRA@ AI* IA. 2020.
Download Paper

Oto Mraz