About Me
I am currently a Remote Research Intern at Microsoft Research, collaborating with CSU-JPG Lab, Central South University, under the supervision of Linjie Li and Prof. Alex Jinpeng Wang. Previously, I was a Research Intern at Show Lab, National University of Singapore (NUS), supervised by Prof. Mike Shou. I received my M.Sc. from NUS and my B.Eng. from Nanjing University of Science and Technology.
My research primarily focuses on unified multimodal models, especially multimodal generation and multimodal large language models. I am also interested in extending these techniques and insights to world models and embodied AI.
I am actively looking for PhD positions starting in Spring/Fall 2027.
News
- 2026.04: TextAtlas5M accepted by ICML 2026.
- 2026.02: Residual Decoder Adapter accepted by CVPR 2026.
- 2025.11: TextGround4M accepted by AAAI 2026.
- 2024.10: Started a remote research internship at Microsoft Research, Seattle, collaborating with CSU-JPG Lab, Central South University.
Selected Publications
Check out the full publication list at my Google Scholar profile.
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation
ICML 2026
Dongxing Mao†, Alex Jinpeng Wang†, Jiawei Zhang, Weiming Han, Zhuobai Dong, Linjie Li, Yiqi Lin, Zhengyuan Yang, Libo Qin, Fuwei Zhang, Lijuan Wang, Min Li. († equal contribution)
[Project Page][Datasets][arXiv][GitHub]
100K+ downloads on Hugging Face
Residual Decoder Adapter: ID-Preserving Tokenizer Adaption for Autoregressive Text Rendering
CVPR 2026
Dongxing Mao†, Alex Jinpeng Wang†, Jiahao Tang, Kevin Qinghong Lin, Linjie Li, Zhengyuan Yang, Lijuan Wang, Min Li, Jingru Tan.
[Project Page][arXiv][GitHub]
TextGround4M: A Prompt-Aligned Dataset for Layout-Aware Text Rendering
AAAI 2026
Dongxing Mao, Yilin Wang, Linjie Li, Zhengyuan Yang, Alex Jinpeng Wang.
[Project Page][Datasets][arXiv][GitHub]
VCode: A Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
CVPR 2026, Visual Concepts Workshop Oral
Kevin Qinghong Lin†, Yuhao Zheng†, Hangyu Ran†, Dongxing Mao, Linjie Li, Philip Torr, Alex Jinpeng Wang. († equal contribution)
[Project Page][arXiv][GitHub]
- AssistGUI: Task-oriented Desktop Graphical User Interface Automation, Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou, CVPR 2024.
- VideoLLM-online: Towards Large Video Language Model for Streaming Video, Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, JiaWei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou, CVPR 2024.
- AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant, Stan Weixian Lei, Yuxuan Wang, Dongxing Mao, Difei Gao, Mike Zheng Shou, EMNLP 2022.
- AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant, Benita Wong, Joya Chen, You Wu, Stan Weixian Lei, Dongxing Mao, Difei Gao, Mike Zheng Shou, ECCV 2022.
Experience
- Microsoft Research, Seattle.
Remote Research Intern, Oct 2024 - Present.
Collaborative research with CSU-JPG Lab, Central South University. Supervisor: Linjie Li.
Research on unified multimodal models and visual tokenizers.
- Show Lab, National University of Singapore, Singapore.
Research Intern, Aug 2021 - Sep 2024.
Supervisor: Prof. Mike Zheng Shou.
Research on video understanding and video-language models.
Education
- 2021 - 2023: M.Sc. Electrical and Computer Engineering. National University of Singapore.
- 2017 - 2021: B.Eng. Communication Engineering. Nanjing University of Science and Technology.
Academic Service
- Conference Reviewer: ECCV, AAAI, ICLR, CVPR.
- Organizer: CVPR 2023 Workshop LOVEU.



