Post-Training of Large Models with Preference Optimization
- Designed a reward model evaluation framework that assesses alignment quality by how well RM-selected responses perform on downstream benchmarks, enabling practical, model-specific RM selection. Building on this, developed an efficient online DPO pipeline integrating RM-guided preference mining, GPT response mixing, and fine-grained iterative training, achieving faster convergence and substantial gains on alignment benchmarks. Blog
- Proposed a selective alignment strategy for post-training large language models that prioritizes high-impact tokens within preference pairs based on token-level log-probability differences (see the sketch after this list). Paper
- Constructed and cleaned a high-quality 85k-sample STEM multiple-choice dataset, combining rule-based filtering, reflection-based evaluation, and a repetition penalty to strengthen LLM scientific reasoning under Zero-RL training. Dataset
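A minimal sketch of the token-selection idea in a DPO-style loss, assuming per-token log-probabilities have already been gathered from the policy and reference models. The keep ratio, the |log-ratio| selection criterion, and the tensor layout are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def selective_dpo_loss(policy_logps, ref_logps, mask, beta=0.1, keep_ratio=0.3):
    """DPO loss computed only on high-impact tokens.

    policy_logps / ref_logps: dicts with 'chosen' and 'rejected' tensors of
    per-token log-probs, shape (batch, seq_len); mask: same keys/shape, 1.0
    for response tokens, 0.0 for prompt/padding. keep_ratio and the
    |log-ratio| criterion are assumptions for illustration.
    """
    def selected_sum(pol, ref, m):
        ratio = (pol - ref) * m                       # per-token log-ratio
        k = max(1, int(keep_ratio * m.sum(-1).max().item()))
        # keep the k tokens with the largest |log-ratio| in each sequence
        idx = ratio.abs().topk(k, dim=-1).indices
        sel = torch.zeros_like(m).scatter(-1, idx, 1.0) * m
        return (ratio * sel).sum(-1)

    chosen = selected_sum(policy_logps["chosen"], ref_logps["chosen"], mask["chosen"])
    rejected = selected_sum(policy_logps["rejected"], ref_logps["rejected"], mask["rejected"])
    return -F.logsigmoid(beta * (chosen - rejected)).mean()
```

Restricting the loss to the highest-impact tokens keeps the gradient focused on the positions where the chosen and rejected responses genuinely differ.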
Multi-Normal Prototypes Learning for Weakly Supervised Anomaly Detection
- Proposed a novel framework combining reconstruction learning with multi-normal prototype learning for detecting anomalies with limited labeled data.
- Introduced a dynamic sample weighting strategy to estimate the likelihood of unlabeled samples being normal, mitigating contamination from unlabeled anomalies.
- Designed a unified anomaly scoring module that integrates reconstruction error, latent features, and prototype similarity (see the sketch after this list).
- Achieved state-of-the-art AUC-PR/AUC-ROC performance on 15 benchmark datasets and robust generalization to unseen anomaly types.
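A sketch of how such a unified score can combine the reconstruction and prototype signals; the encoder/decoder architecture, prototype count, and mixing weight `lam` are placeholders rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPrototypeScorer(nn.Module):
    """Anomaly score = weighted sum of reconstruction error and distance to
    the nearest learned "normal" prototype in latent space (illustrative)."""

    def __init__(self, in_dim, latent_dim=32, n_prototypes=5, lam=0.5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, in_dim))
        # learnable prototypes meant to cover multiple modes of normal data
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, latent_dim))
        self.lam = lam

    def forward(self, x):
        z = self.encoder(x)
        recon_err = F.mse_loss(self.decoder(z), x, reduction="none").mean(-1)
        # cosine similarity to each prototype; anomalies sit far from all of them
        sim = F.cosine_similarity(z.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1)
        proto_dist = 1.0 - sim.max(dim=1).values
        return self.lam * recon_err + (1 - self.lam) * proto_dist  # higher = more anomalous
```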
Bridging the Information Gap Between Domain-Specific Models and General LLMs for Personalized Recommendation
- Proposed LLMRec, a hybrid recommendation framework bridging domain-specific models and general-purpose LLMs.
- Designed a prompt-based user intent encoder that transforms behavior sequences into semantically meaningful inputs for LLMs.
- Implemented a dual-encoder structure with supervised contrastive learning to align user representations from the LLM and the sequential model (see the sketch after this list).
- Improved recommendation accuracy in cold-start and interest-drifting scenarios, achieving state-of-the-art results on the Amazon and MIND datasets.
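A sketch of the alignment objective, assuming a symmetric InfoNCE with in-batch negatives where row i of each encoder's batch corresponds to the same user; the temperature and batch construction are assumptions.

```python
import torch
import torch.nn.functional as F

def align_contrastive_loss(llm_emb, seq_emb, temperature=0.07):
    """Symmetric InfoNCE aligning user representations from the LLM encoder
    and the sequential recommender; matching rows are positive pairs."""
    llm = F.normalize(llm_emb, dim=-1)
    seq = F.normalize(seq_emb, dim=-1)
    logits = llm @ seq.t() / temperature            # (batch, batch) similarities
    labels = torch.arange(logits.size(0), device=logits.device)
    # pull each user's two views together, push apart other users in the batch
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
```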
Stock Price Trend Prediction Based on Multimodal Data
- Proposed a deep multimodal model with attention-based fusion that integrates features across input modalities to predict future stock price trends (sketched below).
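A minimal sketch of one plausible attention-fusion layer, where the price-series representation cross-attends to a second modality such as news text; the modality choice, dimensions, and three-class trend head are assumptions, not the model's confirmed design.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Cross-attention fusion: price-series tokens query the other modality,
    and the fused, pooled vector feeds a trend classifier (illustrative)."""

    def __init__(self, d_model=128, n_heads=4, n_classes=3):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, n_classes))

    def forward(self, price_feats, text_feats):
        # price_feats: (batch, T, d); text_feats: (batch, N, d)
        fused, _ = self.cross_attn(query=price_feats, key=text_feats, value=text_feats)
        fused = fused + price_feats              # residual keeps the price signal
        return self.head(fused.mean(dim=1))      # pooled up/flat/down logits
```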
Object Detection with YOLO Models
- Conducted research on object detection with YOLO models, improving accuracy through data augmentation, loss design, and model architecture adjustments (a box-regression loss sketch follows below).
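As one concrete example of loss design, here is a self-contained CIoU box-regression loss of the kind used in several modern YOLO variants; whether this specific loss was the one adopted in the work above is not stated, so treat it as an illustrative sketch.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Complete-IoU loss for boxes in (x1, y1, x2, y2) format, shape (N, 4).
    CIoU penalizes IoU, center distance, and aspect-ratio mismatch."""
    # intersection and union
    inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = inter_w * inter_h
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # squared center distance, normalized by the enclosing box diagonal
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    enc_w = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    enc_h = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # aspect-ratio consistency term
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return (1 - iou + rho2 / c2 + alpha * v).mean()
```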
Human Pose Estimation with Self-Attention
- Participated in research on human pose estimation, using self-attention mechanisms to address self-occlusion (see the sketch below).
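A minimal sketch of how self-attention over joint tokens can let visible keypoints inform occluded ones; the joint count, layer sizes, and residual-refinement setup are assumptions for illustration.

```python
import torch
import torch.nn as nn

class JointRefiner(nn.Module):
    """Refines coarse 2D keypoints with self-attention over joint tokens,
    so visible joints can correct self-occluded ones (illustrative sizes)."""

    def __init__(self, n_joints=17, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(2, d_model)                 # (x, y) -> token
        self.joint_pos = nn.Parameter(torch.randn(n_joints, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, 2)                   # refined (x, y)

    def forward(self, coords):
        # coords: (batch, n_joints, 2) coarse keypoint estimates
        tokens = self.embed(coords) + self.joint_pos       # add joint identity
        return coords + self.out(self.encoder(tokens))     # residual refinement
```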
Embodied Intelligence for Audio-Visual Navigation
- Conducted research on sound source localization using deep reinforcement learning, combining visual and auditory information.
- Designed explicit fusion methods that extract sound-direction probability distributions, improving the task success rate (see the sketch below).
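A sketch of one way to fuse an explicit sound-direction distribution with visual features in an RL policy; the direction-bin count, feature dimensions, and discrete action space are assumptions, not the project's confirmed setup.

```python
import torch
import torch.nn as nn

class AudioVisualPolicy(nn.Module):
    """Actor head that concatenates visual features with an explicit
    sound-direction probability distribution over K direction bins."""

    def __init__(self, vis_dim=512, n_direction_bins=12, n_actions=4):
        super().__init__()
        self.audio_head = nn.Sequential(nn.Linear(n_direction_bins, 64), nn.ReLU())
        self.policy = nn.Sequential(
            nn.Linear(vis_dim + 64, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, vis_feats, direction_probs):
        # vis_feats: (batch, vis_dim) from a visual encoder;
        # direction_probs: (batch, K) softmax over coarse sound directions
        fused = torch.cat([vis_feats, self.audio_head(direction_probs)], dim=-1)
        return torch.distributions.Categorical(logits=self.policy(fused))
```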