Safe reinforcement learning

We have two projects related to safe reinforcement learning, plus a collaboration on damaged object detection:

  1. Multi-modal reinforcement learning: the goal is to improve upon the class of maximum-entropy RL algorithms by increasing the expressivity of the policy. In particular, we are studying policies modeled as parameterized samplers (SVGD, diffusion models). For this family of policies we were able to derive closed-form expressions for the entropy, which we can incorporate directly into the loss function. (A minimal SVGD sketch follows the list.)
  2. Safe offline RL: we are interested in improving the safety of offline RL algorithms in applications with low data coverage. Our approach consists of (1) extracting constraints from episodes with negative outcomes and (2) incorporating these constraints into the search for an optimal policy under uncertainty (see the second sketch after the list). This project is motivated by the problem of finding optimal treatment strategies for sepsis in the ICU.
  3. Damaged object detectors: I am collaborating with Dr. Ferda, Dr. Imran, and Dr. Rizwan Sadiq on building detectors for damaged objects. We are exploring two directions: (1) synthetic data generation from video games, and (2) zero-shot detection by leveraging CLIP embeddings. For the second direction, my immediate task is to run a state-of-the-art zero-shot object segmentation and detection model on a benchmark of damaged-object scenes and check whether the out-of-the-box CLIP embeddings need further fine-tuning (see the third sketch after the list). If they do, my next step will be to find a dataset with aligned or weakly aligned text-image pairs.
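
To make the sampler-as-policy idea in item 1 concrete, here is a minimal sketch of the standard SVGD update: a set of particles is moved toward a target density using only its score function, and the resulting population can cover several modes at once. This is vanilla SVGD, not our policy parameterization or our entropy derivation; the RBF bandwidth `h`, the double-well target, and the step sizes are illustrative choices.

```python
import numpy as np

def rbf_kernel(x, h):
    # x: (n, d) particles; returns the kernel matrix K and, for each pair
    # (j, i), the gradient of k(x_j, x_i) with respect to x_j.
    diffs = x[:, None, :] - x[None, :, :]         # diffs[j, i] = x_j - x_i
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / h)  # (n, n)
    grad_K = -2.0 / h * diffs * K[..., None]      # (n, n, d)
    return K, grad_K

def svgd_step(x, score_fn, h=1.0, step_size=0.1):
    """One SVGD update; score_fn(x) returns grad_x log p(x) row-wise.
    phi(x_i) = (1/n) sum_j [k(x_j, x_i) score(x_j) + grad_{x_j} k(x_j, x_i)]."""
    n = x.shape[0]
    K, grad_K = rbf_kernel(x, h)
    phi = (K.T @ score_fn(x) + grad_K.sum(axis=0)) / n
    return x + step_size * phi

# Toy usage: bimodal double-well density p(x) proportional to exp(-(x^2 - 1)^2).
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))                # initial particles
score = lambda x: -4.0 * x * (x ** 2 - 1.0)  # grad of log p
for _ in range(500):
    x = svgd_step(x, score, h=0.5, step_size=0.05)
# Particles end up spread over both modes near x = -1 and x = +1, which is
# why sampler-based policies can represent multimodal behavior.
```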
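For item 2, the second sketch shows the two steps in toy form. Everything here is a hypothetical placeholder rather than our actual method: episodes are assumed to carry a scalar `outcome` label, the unsafe set is just the tail of each negative episode, the cost is a nearest-neighbor indicator, and the constraint enters through a simple Lagrangian penalty.

```python
import numpy as np

def extract_constraints(episodes, horizon=5):
    """Step (1): collect (state, action) pairs from the tail of episodes
    with negative outcomes and treat them as an unsafe set. The fixed
    tail window `horizon` is an illustrative heuristic."""
    unsafe = []
    for ep in episodes:
        if ep["outcome"] < 0:
            unsafe.extend(zip(ep["states"][-horizon:], ep["actions"][-horizon:]))
    return unsafe

def constraint_cost(state, action, unsafe, radius=0.5):
    """Soft indicator: 1 if (state, action) lies near any unsafe pair.
    A learned cost model would replace this lookup in practice."""
    for s_u, a_u in unsafe:
        if (np.linalg.norm(state - s_u) < radius
                and np.linalg.norm(action - a_u) < radius):
            return 1.0
    return 0.0

def penalized_objective(returns, costs, lam, budget=0.0):
    """Step (2): maximize return while keeping the expected constraint
    cost below `budget`, enforced via a Lagrangian multiplier `lam`."""
    return np.mean(returns) - lam * (np.mean(costs) - budget)
```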
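For item 3, the third sketch shows the check on out-of-the-box CLIP embeddings, assuming the cropped object regions come from a separate detector or segmenter and using the Hugging Face transformers CLIP interface. The checkpoint name and the prompt templates are illustrative, not the benchmark setup.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def damaged_probability(crop: Image.Image, obj: str) -> float:
    """Zero-shot check: does CLIP assign more mass to the 'damaged'
    prompt than to the 'intact' one for a cropped object region?"""
    prompts = [f"a photo of a damaged {obj}", f"a photo of an intact {obj}"]
    inputs = processor(text=prompts, images=crop,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    return probs[0, 0].item()

# Example with a hypothetical crop produced by the detection stage:
# crop = Image.open("car_crop.png")
# damaged_probability(crop, "car") > 0.5 would suggest the embeddings
# already separate damaged from intact objects without fine-tuning.
```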