Maitreya Patel

Ph.D. Student, School of Computing & AI, Arizona State University.

I am a senior Ph.D. student at Arizona State University (ASU). I am working alongside Yezhou Yang and Chitta Baral. I closely collaborate with Tejas Gokhale and Changhoon Kim.

My research focuses on the theoretical foundations of visual generative models and their applications in conditional sampling, including image/video editing, inverse problems, and personalization. I am also interested in representation learning, large-scale multimodal foundational models, and inference-time steering to enhance the controllability and reliability of generative models. I believe true World Models must be generalizable, efficient, controllable, responsible, and grounded in physical laws.

🚀 🚀 Alongside my research, I am writing The Stochastic Journey — a blog series that delves into the mathematical foundations of generative models, tracing their roots in stochastic calculus, probability theory, and differential equations.

I have extensive experience in developing large-scale text-to-image diffusion/flow and unified multimodal models, working across the full development lifecycle, including mathematical foundations, model architecture design, pre-training, and RL alignment. I focus both on advancing fundamental understanding and on building end-to-end systems that push the boundaries of what’s possible with diffusion and multimodal models.

Note: I am currently not taking on any new students for supervision. However, if you have a well-defined research proposal and can clearly articulate how we might collaborate, I welcome you to reach out.

News

Jul 25, 2025 🚀🚀 FlowChef and RefEdit are accepted at ICCV 2025! We’ll also host a tutorial. See you in Hawaii!
Jun 5, 2025 📝 Released RefEdit, a referring-expression-based image editing framework. Check out our paper and page! ✨
May 19, 2025 🎨 Joined Adobe Firefly team as Research Intern; exploring some cool stuff in generative models. 🚀
Mar 11, 2025 🖼️ Joined SonyAI (Vision Foundation Model and Generative AI) as Research Intern; working on multimodal generative models. ✨
Jan 22, 2025 Voilà has been accepted at ICLR’25. 🔥

Selected Publications

  1. RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model for Referring Expression
    Bimsara Pathiraja*, Maitreya Patel*, Shivam Singh, Yezhou Yang, and Chitta Baral

    In ICCV 2025

  2. Steering Rectified Flow Models in the Vector Field for Controlled Image Generation
    Maitreya Patel, Song Wen, Dimitris N. Metaxas, and Yezhou Yang

    In ICCV 2025

  3. Voilà: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
    Nilay Yilmaz, Maitreya Patel, Yiran Luo, Tejas Gokhale, Chitta Baral, Suren Jayasuriya, and 1 more author

    In ICLR (Main Conference) – 2025

  4. TripletCLIP: Improving Compositional Reasoning of CLIP via Vision-Language Negatives
    Maitreya Patel, Abhiram Kusumba, Sheng Cheng, Changhoon Kim, Chitta Baral, and Yezhou Yang

    In NeurIPS (Main Conference) – 2024

  5. λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
    Maitreya Patel, Sangmin Jung, Chitta Baral, and Yezhou Yang
    Media Coverage: AK, MarkTechPost

    In Transactions on Machine Learning Research (TMLR) 2024

  6. Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
    Sheng Cheng, Maitreya Patel, and Yezhou Yang

    In EMNLP (Findings) – 2024

  7. ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
    Maitreya Patel, Changhoon Kim, Sheng Cheng, Chitta Baral, and Yezhou Yang

    In CVPR – 2024

  8. WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
    Changhoon Kim*, Kyle Min*, Maitreya Patel, Sheng Cheng, and Yezhou Yang
    Media Coverage: AK

    In CVPR – 2024

  9. ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
    Maitreya Patel, Tejas Gokhale, Chitta Baral, and Yezhou Yang

    In AAAI’24 | Diffusion Workshop at NeurIPS – 2023

  10. CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering
    Maitreya Patel, Tejas Gokhale, Chitta Baral, and Yezhou Yang

    In EMNLP, Main Conference – 2022

  11. Benchmarking generalization via in-context instructions on 1,600+ language tasks
    Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, and others

    In EMNLP, Main Conference – 2022

  12. MSpeC-Net: Multi-Domain Speech Conversion Network
    Harshit Malaviya, Jui Shah, Maitreya Patel, Jalansh Munshi, and Hemant A Patil

    In 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020

  13. CinC-GAN for Effective F0 prediction for Whisper-to-Normal Speech Conversion
    Maitreya Patel, Mirali Purohit, Jui Shah, and Hemant A Patil

    In 28th European Signal Processing Conference (EUSIPCO) 2020

  14. Weak Speech Supervision: A case study of Dysarthria Severity Classification
    Mirali Purohit, Mihir Parmar, Maitreya Patel, Harshit Malaviya, and Hemant A Patil

    In 28th European Signal Processing Conference (EUSIPCO) 2020

  15. Novel adaptive generative adversarial network for voice conversion
    Maitreya Patel, Mihir Parmar, Savan Doshi, Nirmesh J Shah, and Hemant A Patil

    In 11th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2019

  16. Effectiveness of cross-domain architectures for whisper-to-normal speech conversion
    Mihir Parmar, Savan Doshi, Nirmesh J Shah, Maitreya Patel, and Hemant A Patil

    In 27th European Signal Processing Conference (EUSIPCO) 2019