ShockLab Seminar Series - Thomas J Ringstrom
The next ShockLab Seminar will be online on Wednesday, 15 October 2025, at 16:00 (GMT+2). Thomas J Ringstrom will be presenting "A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes". Please come along if you can; it's sure to be an engaging talk!
Title: A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes
Speaker: Thomas J Ringstrom
Date: Wednesday, 15 October 2025
Time: 16:00-17:00 (GMT +2)
Zoom Meeting Link: https://uct-za.zoom.us/j/92750361177?pwd=QzNiRzBJRjRITVlwa2k5SVNkVmx5UT09
Abstract: In this talk, I present Option Kernel Bellman Equations (OKBEs) for a new reward-free Markov Decision Process. Rather than a value function, OKBEs directly construct and optimize a predictive map called a state-time option kernel (STOK) to maximize the probability of completing a goal while avoiding constraint violations. STOKs are compositional, modular, and interpretable initiation-to-termination transition kernels for policies in the Options Framework of Reinforcement Learning. This means: 1) STOKs can be composed using Chapman-Kolmogorov equations to make spatiotemporal predictions for multiple policies over long horizons, 2) high-dimensional STOKs can be represented and computed efficiently in a factorized and reconfigurable form, and 3) STOKs record the probabilities of semantically interpretable goal-success and constraint-violation events, needed for formal verification. Given a high-dimensional state-transition model for an intractable planning problem, one can decompose it with local STOKs and goal-conditioned policies that are aggregated into a factorized goal kernel, making it possible to forward-plan at the level of goals in high dimensions to solve the problem. These properties lead to highly flexible agents that can rapidly synthesize meta-policies, reuse planning representations across many tasks, and justify goals using empowerment, an intrinsic motivation function. I argue that reward-maximization is in conflict with the properties of compositionality, modularity, and interpretability. In contrast, OKBEs facilitate these properties to support verifiable long-horizon planning and intrinsic motivation that scales to dynamic high-dimensional world-models.
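If you'd like a feel for the composition property mentioned in the abstract before the talk, here is a minimal sketch (not from the talk; the matrices and names are illustrative assumptions) of composing two toy transition kernels via the Chapman-Kolmogorov equation, which for finite state spaces reduces to a matrix product:

```python
import numpy as np

# Two toy stochastic transition kernels over 3 states (rows = current
# state, columns = next state; each row sums to 1). Sizes and values
# are illustrative, not from the talk.
K_ab = np.array([[0.8, 0.2, 0.0],
                 [0.1, 0.7, 0.2],
                 [0.0, 0.3, 0.7]])
K_bc = np.array([[0.9, 0.1, 0.0],
                 [0.0, 0.6, 0.4],
                 [0.2, 0.0, 0.8]])

# Chapman-Kolmogorov: the composed kernel marginalizes over the
# intermediate state, which here is just a matrix product.
K_ac = K_ab @ K_bc

# The composition is again a valid stochastic kernel.
assert np.allclose(K_ac.sum(axis=1), 1.0)
```

The talk's STOKs are richer objects (state-time kernels tied to goal-conditioned policies), but this is the basic mechanism that lets kernel-based predictions chain over long horizons.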
Bio: I develop theory for non-stationary non-Markovian (time- and history-dependent) planning and intrinsic motivation in order to understand autonomous agency. My interests are in the algorithmic and representational principles that allow for high-dimensional compositional planning and goal setting that do not rely on reward functions or reward-maximization objectives.
See you there!
Housekeeping: