Thiago D. Simão

Office MF 4.120

MetaForum

I am an Assistant Professor in the Department of Mathematics and Computer Science at TU/e. Previously, I was a postdoctoral researcher in the Department of Software Science (SWS) at Radboud University Nijmegen, advised by Dr. Nils Jansen. Before that, I was a Ph.D. candidate in the Algorithmics Group at Delft University of Technology, advised by Dr. Matthijs Spaan. For more details, check out my biography or my CV.

Research Interests: The motivation for my research revolves around making AI techniques more reliable, to enable their deployment in real-world applications. I focus on developing AI algorithms for scenarios with constrained interactions with an unknown environment. I am currently interested in safe reinforcement learning, a research topic concerned with problems where a minimum performance must be guaranteed and catastrophic events must be avoided.

Academic Service:

Organizing committee of the BeNeRL Workshop 2018.
Local organizing committee of the 28th ICAPS.
PC for NeurIPS22, ICML22, ICAPS22, AAAI21.
Reviewer for JAAMAS, ICRA, AAAI and BRACIS.

Besides my professional activities, I like to run, play board games, listen to music, and read.

news

2023

December

Our papers “Robust Active Measuring under Model Uncertainty” and “Factored Online Planning in Many-Agent POMDPs” have been accepted at AAAI-24.

October

New job! I am now an assistant professor in the Data and AI cluster at Eindhoven University of Technology.

September

The ORLEANS project on Offline Reinforcement Learning for Sustainable Transportation at Sea has received an IPR voucher.

September

I am serving as an SPC member for AAMAS-24.

September

I am serving as a PC member for AAAI-24.

April

Presenting our work on SPI in factored environments at the TiCSA 2023 workshop.

April

Invited talk at the LiVe 2023 workshop.

March

I am serving as a PC member for NeurIPS-23.

February

Our paper “Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring” has been accepted at ICAPS-23.

February

I am serving as a PC member for ICML 2023.

January

Our paper “Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation” has been accepted at ICLR-23.

January

I successfully defended my PhD thesis. A big thanks to my promotor team and the thesis committee.

2022

December

Invited to teach three lectures in the Reinforcement Learning course at University of Verona.

December

Our paper “Targeted Adversarial Attacks on Deep Reinforcement Learning Policies via Model Checking” has been accepted at ICAART-23.

November

Our paper “Safe Policy Improvement for POMDPs via Finite-State Controllers” has been accepted at AAAI-23.

November

Two talks at the AAAI 2022 Fall Symposium.

October

I am serving as a PC member for AISTATS 2023.

September

Our paper “Robust Anytime Learning of Markov Decision Processes” has been accepted at NeurIPS-22.

August

I am serving as a PC member for ICAPS 2023.

July

I am serving as a PC member for NeurIPS 2022.

June

Our paper “Safety-constrained reinforcement learning with a distributional safety critic” has been published in the Machine Learning journal.

May

Two papers presented at the ALA 2022 workshop: one on Safe Transfer in RL and one on Solving Hidden Parameter MDPs with Hindsight.

April

Invited talk for the Oden Institute seminar at UT Austin.

April

Talk at the LiVe-22 workshop about Safe Transfer in Reinforcement Learning.

March

Talk at the ADML meetup about Ensuring Safety for Reinforcement Learning.

January

I am serving as a PC member for ICML 2022.

2021

December

Talk at the iVerif workshop on Safety Abstractions.

October

I am serving as a PC member for the Planning and Learning track at ICAPS 2022.

August

Talk at the PRL workshop.

August

At ICAPS-21 attending the mentoring program.

June

Invited talk at the Center for Artificial Intelligence.

May

At AAMAS-21 presenting the AlwaysSafe paper.

March

Talk at the LiVe-21 workshop about AlwaysSafe.

March

Guest lecture on Safe RL at the Algorithms for Intelligent Decision Making course.

February

Invited talk at the SWS-seminar about our AAMAS paper.

2020

December

Our paper “AlwaysSafe: Reinforcement Learning Without Safety Constraint Violations During Training” has been accepted at AAMAS-21.

December

Our paper “WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning” has been accepted at AAAI-21.

September

I am serving as a PC member for AAAI-21.

May

At AAMAS-20 presenting the paper “Safe Policy Improvement with an Estimated Baseline Policy.”

May

Released gym-factored, a collection of factored environments that are OpenAI Gym compliant.

2019

August

At IJCAI-19 presenting our paper on structure learning for safe RL.

August

At IJCAI-19 participating in the doctoral consortium.

May

Attending the conference RLDM-19.

May

Starting my internship at MSR Montreal with Romain Laroche and Remi Tachet des Combes.

May

I got the prize for Best Poster in our department’s poster session.

March

In Hilversum, presenting our work on reinforcement learning at the ICT.Open-19.

January

At AAAI-19 presenting our paper on safe policy improvement in factored environments.

2018

November

I am co-organizing the Belgium Netherlands Workshop on Reinforcement Learning (BeNeRL-18).

October

I am attending the 14th European Workshop on Reinforcement Learning (EWRL-18).

July

I gave a contributed talk at the ICML-18 Workshop on Planning and Learning.

June

I presented a poster at ICAPS-18.

June

I am helping the local organizing committee of ICAPS-18 in Delft.

June

Attending the ICAPS-18 summer school at Noordwijk.

2017

November

I presented a poster at the Energy Event promoted by the PowerWeb Institute.

October

Presenting a poster at the EEMCS’s PhD Event.

October

I attended the ACAI Summer School on Reinforcement Learning.

August

I attended the 19th European Agent Systems Summer School.

selected publications

AAAI
Safe Policy Improvement for POMDPs via Finite-State Controllers

Simão, Thiago D., Suilen, Marnix, and Jansen, Nils

In Proceedings of the AAAI Conference on Artificial Intelligence 2023

We study safe policy improvement (SPI) for partially observable Markov decision processes (POMDPs). SPI is an offline reinforcement learning (RL) problem that assumes access to (1) historical data about an environment, and (2) the so-called behavior policy that previously generated this data by interacting with the environment. SPI methods neither require access to a model nor the environment itself, and aim to reliably improve the behavior policy in an offline manner. Existing methods make the strong assumption that the environment is fully observable. In our novel approach to the SPI problem for POMDPs, we assume that a finite-state controller (FSC) represents the behavior policy and that finite memory is sufficient to derive optimal policies. This assumption allows us to map the POMDP to a finite-state fully observable MDP, the history MDP. We estimate this MDP by combining the historical data and the memory of the FSC, and compute an improved policy using an off-the-shelf SPI algorithm. The underlying SPI method constrains the policy-space according to the available data, such that the newly computed policy only differs from the behavior policy when sufficient data was available. We show that this new policy, converted into a new FSC for the (unknown) POMDP, outperforms the behavior policy with high probability. Experimental results on several well-established benchmarks show the applicability of the approach, even in cases where finite memory is not sufficient.
@inproceedings{Simao2023safe,
  title = {Safe Policy Improvement for POMDPs via Finite-State Controllers},
  author = {Sim{\~a}o, Thiago D. and Suilen, Marnix and Jansen, Nils},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year = {2023},
  publisher = {{AAAI} Press},
  pages = {15109--15117}
}
AAMAS
AlwaysSafe: Reinforcement Learning Without Safety Constraint Violations During Training

Simão, Thiago D., Jansen, Nils, and Spaan, Matthijs T. J.

In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS) 2021

Deploying reinforcement learning (RL) involves major concerns around safety. Engineering a reward signal that allows the agent to maximize its performance while remaining safe is not trivial. Safe RL studies how to mitigate such problems. For instance, we can decouple safety from reward using constrained Markov decision processes (CMDPs), where an independent signal models the safety aspects. In this setting, an RL agent can autonomously find tradeoffs between performance and safety. Unfortunately, most RL agents designed for CMDPs only guarantee safety after the learning phase, which might prevent their direct deployment. In this work, we investigate settings where a concise abstract model of the safety aspects is given, a reasonable assumption since a thorough understanding of safety-related matters is a prerequisite for deploying RL in typical applications. Factored CMDPs provide such compact models when a small subset of features describe the dynamics relevant for the safety constraints. We propose an RL algorithm that uses this abstract model to learn policies for CMDPs safely, that is without violating the constraints. During the training process, this algorithm can seamlessly switch from a conservative policy to a greedy policy without violating the safety constraints. We prove that this algorithm is safe under the given assumptions. Empirically, we show that even if safety and reward signals are contradictory, this algorithm always operates safely and, when they are aligned, this approach also improves the agent’s performance.
@inproceedings{Simao2021alwayssafe,
  author = {Sim{\~a}o, Thiago D. and Jansen, Nils and Spaan, Matthijs T. J.},
  title = {AlwaysSafe: Reinforcement Learning Without Safety Constraint Violations During Training},
  year = {2021},
  booktitle = {Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS)},
  publisher = {IFAAMAS},
  location = {Online},
  pages = {1226--1235}
}
AAAI
Safe Policy Improvement with Baseline Bootstrapping in Factored Environments

Simão, Thiago D., and Spaan, Matthijs T. J.

In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence 2019

We present a novel safe reinforcement learning algorithm that exploits the factored dynamics of the environment to become less conservative. We focus on problem settings in which a policy is already running and the interaction with the environment is limited. In order to safely deploy an updated policy, it is necessary to provide a confidence level regarding its expected performance. However, algorithms for safe policy improvement might require a large number of past experiences to become confident enough to change the agent’s behavior. Factored reinforcement learning can achieve a better sample complexity by exploiting independence between features of the environment, but it lacks a confidence level. We study how to improve the sample efficiency of the safe policy improvement with baseline bootstrapping algorithm by exploiting the factored structure of the environment. Our main result is a theoretical bound that is linear in the number of parameters of the factored representation instead of the number of states. The empirical analysis shows that our method can improve the policy using a number of samples potentially one order of magnitude smaller than the flat algorithm.
@inproceedings{Simao2019safe,
  author = {Sim{\~a}o, Thiago D. and Spaan, Matthijs T. J.},
  title = {{Safe Policy Improvement with Baseline Bootstrapping in Factored Environments}},
  booktitle = {Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence},
  pages = {4967--4974},
  publisher = {{AAAI} Press},
  year = {2019}
}