Mathematical Statistics and Data Science
The research group in Mathematical Statistics and Data Science studies advanced methods and models for analysing and representing data. We employ probability theory and stochastic processes to rigorously model uncertainty and randomness, and abstract and linear algebra to understand the structure of statistical models and the relationships between their parameters.
News and events
Join stochastics@list.aalto.fi to stay updated on probability and statistics at Aalto University.
Join stochastics-finland@list.aalto.fi for announcements on probability and statistics in Finland.
Members
- Pauliina Ilmonen, Professor: Multivariate extreme values, functional data analysis, cancer epidemiology
- Lasse Leskelä, Professor: Mathematical statistics, probability theory, network analysis
- Kaie Kubjas, Associate Professor: Algebraic statistics
- Vanni Noferini, Associate Professor: Network analysis, random matrix theory
- Jukka Kohonen, University Lecturer: Statistics, combinatorics
- Pekka Pere, University Lecturer: Statistics
- Jonas Tölle, Senior University Lecturer: Stochastic processes, probability theory
Publications
Individual publication records, with links to full articles when available, can be found on the Aalto research page, where you can also find an overview of research output for the Mathematical Statistics and Data Science area.
Selected publications
- F Arrigo, DJ Higham, V Noferini, R Wood. Weighted enumeration of nonbacktracking walks on weighted graphs. SIAM Journal on Matrix Analysis and Applications 2024.
- M Bloznelis, L Leskelä. Clustering and percolation on superpositions of Bernoulli random graphs. Random Structures & Algorithms 2023.
- M Hinz, JM Tölle, L Viitasaari. Variability of paths and differential equations with BV-coefficients. Annales de l’Institut Henri Poincaré - Probabilités et Statistiques 2023.
- A Belyaeva, K Kubjas, LJ Sun, C Uhler. Identifying 3D genome organization in diploid organisms via Euclidean distance geometry. SIAM Journal on Mathematics of Data Science 2022.
- J Alho, E Arjas, J Karvanen, L Leskelä, E Läärä, P Pere. Tilastotieteen sanasto. Suomen Tilastoseura 2023.
Teaching
We teach courses in probability and statistics at all levels. Some of the offered courses are eligible as a basis for an SHV degree in insurance mathematics. Doctoral education in probability and statistics is coordinated by the Finnish Doctoral Education Network in Stochastics and Statistics (FDNSS).
Seminars
Upcoming seminars
- 27.4. 15:15 MSc Ian Välimaa (Aalto): Mid-term review: Consistent clustering in tensor block models – Y405
- 5.5. 10:15 BSc Kerkko Konola: Latent efficient price recovery in ETFs: A simulation study (MSc presentation) – M3 (M234)
Recovering the latent efficient price from limit order book data is a fundamental challenge in econometrics, since the price that price discovery aims to reveal is never directly observed. This thesis studies whether statistical methods can reliably recover the latent price, and under what conditions one approach outperforms another. The analysis is carried out in a Monte Carlo simulation framework in which the true efficient price is known by construction, allowing a direct comparison of estimation accuracy.
The data-generating process incorporates the empirically relevant features of market microstructure. A driftless Brownian motion drives the latent price, order flow follows a persistent AR(1) process with linear price impact, and microstructure noise is state-dependent and heteroskedastic. Two estimators are compared across three microstructure regimes: a misspecified linear Kalman filter and a nonlinear XGBoost model. The Kalman filter performs better when distortions are mild, reacting to directional price changes roughly 0.3 seconds faster, whereas XGBoost achieves up to 48% lower mean squared error in the high-noise regime by capturing nonlinear order flow patterns the filter cannot represent.
The framework is then extended to a multi-ETF setting in which three funds with different liquidity profiles track the same underlying index. Cross-asset information improves XGBoost uniformly across all ETFs, while the Kalman filter responds asymmetrically, improving for the less liquid ETFs but deteriorating for the most liquid one. The two estimators converge to near-identical accuracy once cross-asset information is available.
Taken together, the results suggest that efficient price recoverability is regime- and specification-dependent, with implications for estimator selection and the design of cross-ETF statistical arbitrage strategies.
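The simulation setup described in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function names, all parameter values, the way price impact and noise enter the observed price, and the local-level form of the Kalman filter are assumptions chosen only to make the idea concrete. The XGBoost estimator and the multi-ETF extension are not reproduced here.

```python
import numpy as np


def simulate_market(n=5000, sigma=0.01, phi=0.9, impact=0.02,
                    base_noise=0.02, noise_slope=0.03, seed=0):
    """Simulate a latent price, order flow and a noisy observed price.

    All parameter values are illustrative assumptions, not thesis values.
    """
    rng = np.random.default_rng(seed)
    # Latent efficient price: discretised driftless Brownian motion.
    latent = 100.0 + np.cumsum(sigma * rng.standard_normal(n))
    # Order flow: persistent AR(1) process.
    shocks = rng.standard_normal(n)
    flow = np.zeros(n)
    for t in range(1, n):
        flow[t] = phi * flow[t - 1] + shocks[t]
    # Assumed noise model: standard deviation grows with |order flow|,
    # which makes the noise state-dependent and heteroskedastic.
    noise_sd = base_noise + noise_slope * np.abs(flow)
    # Assumed observation: latent price + linear price impact + noise.
    observed = latent + impact * flow + noise_sd * rng.standard_normal(n)
    return latent, flow, observed


def local_level_kalman(observed, q, r):
    """Linear Kalman filter for a random-walk state with constant noise.

    The filter is deliberately misspecified for the simulation above:
    it ignores order flow and treats the noise variance r as constant.
    """
    estimates = np.empty(len(observed))
    x, p = observed[0], r          # initial state and its variance
    estimates[0] = x
    for t in range(1, len(observed)):
        p = p + q                  # predict: random-walk state
        k = p / (p + r)            # Kalman gain
        x = x + k * (observed[t] - x)
        p = (1.0 - k) * p          # update state variance
        estimates[t] = x
    return estimates


latent, flow, observed = simulate_market()
filtered = local_level_kalman(observed, q=0.01 ** 2, r=0.1 ** 2)

# Because the latent price is known by construction, estimation
# accuracy can be measured directly.
mse_raw = np.mean((observed - latent) ** 2)
mse_kf = np.mean((filtered - latent) ** 2)
print(f"MSE of observed price: {mse_raw:.5f}")
print(f"MSE of Kalman estimate: {mse_kf:.5f}")
```

Because the true latent price is available in the simulation, the mean squared error of any estimator can be computed against it, which is the comparison the abstract refers to. Changing `impact` and `noise_slope` gives a rough analogue of moving between mild and high-noise regimes.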
Projects and networks