All Research Projects

A comprehensive overview of my research work and contributions

Post-Training of Large Models with Preference Optimization

Algorithm Intern, Nanbeige Lab Jul 2024 -- Apr 2025
  • Designed a reward model (RM) evaluation framework that assesses alignment quality by the downstream-benchmark performance of RM-selected responses, enabling practical, model-specific RM selection. Building on this framework, developed an efficient online DPO pipeline integrating RM-guided preference mining, GPT response mixing, and fine-grained iterative training, achieving faster convergence and substantial gains on alignment benchmarks. Blog
  • Proposed a selective alignment strategy for post-training large language models, prioritizing high-impact tokens within preference pairs based on token-level log-probability differences. Paper
  • Constructed and cleaned a high-quality 85k-scale STEM multiple-choice dataset for scientific reasoning, incorporating rule-based filtering, reflection-based evaluation, and a repetition penalty to improve LLM scientific reasoning under Zero-RL training. Dataset
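
The selective alignment idea above can be illustrated with a minimal sketch. It is not the method from the paper: it assumes the "log-probability difference" is the gap between a policy model and a reference model at each token, and it keeps a fixed fraction of tokens with the largest gap. The function names and the `keep_ratio` parameter are hypothetical.

```python
def select_high_impact_tokens(policy_logps, ref_logps, keep_ratio=0.5):
    """Return a 0/1 mask marking the tokens with the largest
    |policy - reference| log-probability gap (assumed selection rule)."""
    if len(policy_logps) != len(ref_logps):
        raise ValueError("both sequences must cover the same tokens")
    diffs = [abs(p - r) for p, r in zip(policy_logps, ref_logps)]
    # Keep at least one token so the masked loss is never empty.
    k = max(1, round(keep_ratio * len(diffs)))
    top = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)[:k]
    mask = [0] * len(diffs)
    for i in top:
        mask[i] = 1
    return mask


def masked_logp_sum(logps, mask):
    """Sum log-probabilities over the selected tokens only."""
    return sum(lp for lp, m in zip(logps, mask) if m)


# Toy per-token log-probabilities for one response in a preference pair.
policy = [-1.0, -2.0, -0.5, -3.0]
reference = [-1.1, -0.5, -0.6, -1.0]
mask = select_high_impact_tokens(policy, reference, keep_ratio=0.5)
selected_sum = masked_logp_sum(policy, mask)
```

In a preference-optimization loss, a masked sum like this would replace the full-sequence log-probability for both the chosen and the rejected response, so that only the selected tokens contribute to the gradient.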

Multi-Normal Prototypes Learning for Weakly Supervised Anomaly Detection

Research Assistant, School of Software and Microelectronics, Peking University Sep 2023 -- Jul 2024
  • Proposed a novel framework combining reconstruction learning with multi-normal prototype learning for detecting anomalies with limited labeled data.
  • Introduced a dynamic sample weighting strategy to estimate the likelihood of unlabeled samples being normal, mitigating contamination.
  • Designed a unified anomaly scoring module that integrates reconstruction error, latent features, and prototype similarity.
  • Achieved state-of-the-art performance on 15 benchmark datasets (AUC-PR/AUC-ROC) and demonstrated robust generalization to unseen anomaly types.
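
As a rough illustration of how reconstruction error and prototype similarity can be combined into one score, the sketch below uses a fixed weighted sum. This is an assumption made for brevity: the scoring module described above is a learned component that also uses latent features, and the names `anomaly_score` and `alpha` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def reconstruction_error(x, x_hat):
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def anomaly_score(x, x_hat, z, prototypes, alpha=0.5):
    """Higher score = more anomalous. Combines reconstruction error with
    the distance of latent vector z from its closest normal prototype."""
    rec = reconstruction_error(x, x_hat)
    max_sim = max(cosine(z, p) for p in prototypes)
    return alpha * rec + (1 - alpha) * (1 - max_sim)


prototypes = [[1.0, 0.0], [0.0, 1.0]]  # toy "normal" prototypes
normal = anomaly_score([1.0, 1.0], [1.0, 1.0], [1.0, 0.0], prototypes)
anomalous = anomaly_score([1.0, 1.0], [0.0, 0.0], [-1.0, -1.0], prototypes)
```

A sample that is reconstructed well and sits on a normal prototype scores near zero, while a poorly reconstructed sample far from every prototype scores higher.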

Bridging the Information Gap Between Domain-Specific Models and General LLMs for Personalized Recommendation

Research Assistant, School of Software and Microelectronics, Peking University Sep 2023 -- Apr 2024
  • Proposed LLMRec, a hybrid recommendation framework bridging domain-specific models and general-purpose LLMs.
  • Designed a prompt-based user intent encoder that transforms behavior sequences into semantically meaningful inputs for LLMs.
  • Implemented a dual-encoder structure with supervised contrastive learning to align user representations from LLM and sequential models.
  • Improved recommendation accuracy in cold-start and interest-drifting scenarios, achieving SOTA on Amazon and MIND datasets.
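
The dual-encoder alignment can be sketched with an InfoNCE-style contrastive loss. This is a simplification, not the LLMRec objective: it assumes each user's LLM embedding and sequential-model embedding form the only positive pair, with other users in the batch as negatives. The function name and `temperature` value are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def alignment_loss(llm_embs, seq_embs, temperature=0.1):
    """Average cross-entropy of matching each user's LLM embedding to the
    same user's sequential-model embedding among all users in the batch."""
    a = [normalize(v) for v in llm_embs]
    b = [normalize(v) for v in seq_embs]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


llm = [[1.0, 0.0], [0.0, 1.0]]
aligned = alignment_loss(llm, [[1.0, 0.0], [0.0, 1.0]])
misaligned = alignment_loss(llm, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when the two encoders agree on which user is which and large when their representations are swapped, which is the pressure that pulls the two embedding spaces together.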

Research on Stock Price Trend Prediction Based on Multimodal Data

Undergraduate Thesis Researcher, School of Information Science and Technology, Peking University Jan 2023 -- Jun 2023
  • Proposed a deep-learning-based multimodal attention fusion model that integrates features from multiple input modalities to predict future stock price trends.

Object Detection with YOLO Models

Algorithm Intern, Beijing Cike Qidong Technology Co., Ltd. Jul 2022 -- Nov 2022
  • Conducted research on object detection using YOLO models, improving accuracy via data augmentation, loss design, and model architecture adjustments.

Human Pose Estimation with Self-Attention

Research Intern, AI Innovation Center, Peking University Oct 2021 -- Jun 2022
  • Participated in research on human pose estimation, using self-attention mechanisms to address self-occlusion.

Embodied Intelligence for Visual-Audio Navigation

Summer Research Intern, Frontier Computing Center, Peking University Jun 2021 -- Oct 2021
  • Conducted research on sound source localization using deep reinforcement learning, combining visual and auditory information.
  • Designed explicit fusion methods to extract sound direction probability distributions, improving task success rate.