I'm Amirhossein Kazemnejad, a researcher at Mila working on RL algorithms tailored to LLMs.
In one of my recent projects, we introduced VinePPO, which addresses the credit assignment problem that silently undermines RL tuning of LLMs. Developing fundamental RL algorithms for LLMs is one of my main areas of interest, and I will continue working on it.
Previously, during my graduate studies at McGill University & Mila, I spent a lot of time studying positional encoding and the generalization behavior of Transformers, and I was fortunate to be supervised by Siva Reddy during this time. The result of that work, NoPE, is now used in recent LLM architectures such as Llama 4 and Cohere Command A.
Earlier, during my undergraduate studies at IUST back in 2019, I worked on knowledge acquisition in LLMs and retrieval-augmented generation.
I've also built the open-source RL-for-LLM library nano-aha-moment, along with a full YouTube lecture series. Feel free to check them out. Here you can find some of my (old) blog posts.