Maitreya Patel

Ph.D. Student, School of Computing & AI, Arizona State University.

I am a Ph.D. candidate at Arizona State University (ASU), working with Yezhou Yang and Chitta Baral.

My research focuses on the theoretical foundations of visual generative models and their applications in conditional sampling, including image/video editing, inverse problems, and personalization. I am also interested in representation learning, large-scale multimodal foundation models, and inference-time steering to enhance the controllability and reliability of generative models. I believe true World Models must be generalizable, efficient, controllable, responsible, and grounded in physical laws.

🚀 🚀 Alongside my research, I am writing The Stochastic Journey — a blog series that delves into the mathematical foundations of generative models, tracing their roots in stochastic calculus, probability theory, and differential equations.

I have extensive experience developing large-scale text-to-image diffusion/flow and unified multimodal models, working across the full development lifecycle, including mathematical foundations, model architecture design, pre-training, and RL alignment. I focus both on advancing fundamental understanding and on building end-to-end systems that push the boundaries of what’s possible with diffusion and multimodal models.

Note: I am currently not taking on any new students for supervision. However, if you have a well-defined research proposal and can clearly articulate how we might collaborate, I welcome you to reach out.

News

Sep 18, 2025 EraseFlow accepted at NeurIPS’25 as Spotlight. 🔥 🔥
Jul 25, 2025 🚀🚀 FlowChef and RefEdit are accepted at ICCV 2025! We’ll also host a tutorial. See you in Hawaii!
Jun 5, 2025 📝 Released RefEdit - a referring expression based image editing framework. Check out our paper and page! ✨
May 19, 2025 🎨 Joined Adobe Firefly team as Research Intern; exploring some cool stuff in generative models. 🚀
Mar 11, 2025 🖼️ Joined SonyAI (Vision Foundation Model and Generative AI) as Research Intern; working on multimodal generative models. ✨

Selected Publications

  1. EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment
    Abhiram Kusumba *, Maitreya Patel *, Kyle Min, Changhoon Kim, Chitta Baral, and Yezhou Yang

    In NeurIPS (Spotlight) 2025

  2. RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model for Referring Expression
    Bimsara Pathiraja *, Maitreya Patel *, Shivam Singh, Yezhou Yang, and Chitta Baral

    In ICCV 2025

  3. Steering Rectified Flow Models in the Vector Field for Controlled Image Generation
    Maitreya Patel, Song Wen, Dimitris N. Metaxas, and Yezhou Yang

    In ICCV 2025

  4. Voilà: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
    Nilay Yilmaz, Maitreya Patel, Yiran Luo, Tejas Gokhale, Chitta Baral, Suren Jayasuriya, and 1 more author

    In ICLR (Main Conference) – 2025

  5. TripletCLIP: Improving Compositional Reasoning of CLIP via Vision-Language Negatives
    Maitreya Patel, Abhiram Kusumba, Sheng Cheng, Changhoon Kim, Chitta Baral, and Yezhou Yang

    In NeurIPS (Main Conference) – 2024

  6. λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
    Maitreya Patel, Sangmin Jung, Chitta Baral, and Yezhou Yang
    Media Coverage: AK, MarkTechPost

    In Transactions on Machine Learning Research (TMLR) 2024

  7. Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
    Sheng Cheng, Maitreya Patel, and Yezhou Yang

    In EMNLP (Findings) – 2024

  8. ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
    Maitreya Patel, Changhoon Kim, Sheng Cheng, Chitta Baral, and Yezhou Yang

    In CVPR – 2024

  9. WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
    Changhoon Kim *, Kyle Min *, Maitreya Patel, Sheng Cheng, and Yezhou Yang
    Media Coverage: AK

    In CVPR – 2024

  10. ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
    Maitreya Patel, Tejas Gokhale, Chitta Baral, and Yezhou Yang

    In AAAI’24 | Diffusion Workshop at NeurIPS – 2023

  11. CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering
    Maitreya Patel, Tejas Gokhale, Chitta Baral, and Yezhou Yang

    In EMNLP, Main Conference – 2022

  12. Benchmarking generalization via in-context instructions on 1,600+ language tasks
    Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, and others

    In EMNLP, Main Conference – 2022