Greetings! I’m currently a PhD student at the University of Science and Technology of China and a research intern at WeChat Vision, Tencent Inc.

My research interests include:

vision language models
semantic segmentation
parameter-efficient fine-tuning

I anticipate graduating in 2026 and am looking for industrial research positions. If you’re interested, please feel free to reach out to me via email or WeChat (wzx-vi).

w1oves

🔥 News

2025.06: 🔥 Delighted to announce that HQCLIP: Leveraging Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models was accepted to ICCV 2025!
2024.09: 🔥 Delighted to announce that Masked Pre-trained Model Enables Universal Zero-shot Denoiser was accepted to NeurIPS 2024!
2024.02: 🔥 Rein is accepted to CVPR 2024! [Project Page]
2024.01: 🔥 Rein achieves SOTA in Cityscapes $\rightarrow$ ACDC test set generalization!
2023.10: 🔥 Rein is released and achieves SOTA in domain generalized semantic segmentation!
2023.07: 🎉 DTP is accepted to ICCV 2023 and achieves SOTA in night-time and full-time semantic segmentation!
2022.10: Our DDB receives the Spotlight Award at NeurIPS 2022!
2022.09: DDB is accepted to NeurIPS 2022 and achieves SOTA with ResNet counterparts on the single-source, multi-source, and multi-target domain-adaptive semantic segmentation tasks!
2022.03: DALN, a discriminator-free adversarial domain adaptation framework, is accepted to CVPR 2022!

📝 Publications

(* denotes equal contribution.)

ICCV 2025

First Author

HQCLIP: Leveraging Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models

Zhixiang Wei*, Guangting Wang*, Xiaoxiao Ma, et al.

[Project page]

We generated detailed, bidirectional long-text descriptions for 1.3 billion images and pretrained/fine-tuned CLIP on this dataset. Building upon this foundation, we propose a novel CLIP training framework that combines bidirectional supervision and label classification losses. This framework achieves SOTA results on zero-shot classification, retrieval, and other tasks at the same data scale.

CVPR 2024

First Author

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
Zhixiang Wei*, Lin Chen*, Yi Jin*, Xiaoxiao Ma, et al.

[Project page]

We propose the Rein framework, which efficiently fine-tunes vision foundation models for the domain generalized semantic segmentation (DGSS) task with just 1% trainable parameters, surprisingly surpassing full-parameter fine-tuning. Rein also sets a new SOTA on various DGSS benchmarks.

ICCV 2023

First Author

Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement
Zhixiang Wei*, Lin Chen*, et al.

We propose a novel night-time semantic segmentation paradigm, i.e., disentangle then parse (DTP), which explicitly disentangles night-time images into light-invariant reflectance and light-specific illumination components and then recognizes semantics based on their adaptive fusion.

NeurIPS 2022 (Spotlight)

Co-First Author

Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation
Lin Chen*, Zhixiang Wei*, Xin Jin*, et al.

We leverage the complementary characteristics of the coarse-wise and fine-wise data mixing techniques to progressively transfer the knowledge from the source to the target domain.

NeurIPS 2024