Post-Training of Large Models with Preference Optimization
- Designed a reward model evaluation framework that assesses alignment quality by how well RM-selected responses perform on downstream benchmarks, enabling practical, model-specific RM selection. Building on this, developed an efficient online DPO pipeline integrating RM-guided preference mining, GPT response mixing, and fine-grained iterative training, achieving faster convergence and substantial gains on alignment benchmarks. Blog
- Proposed a selective alignment strategy for post-training large language models that prioritizes high-impact tokens within preference pairs based on token-level log-probability differences (see the sketch after this list). Paper
- Constructed and cleaned a high-quality 85k-sample STEM multiple-choice dataset, combining rule-based filtering, reflection-based evaluation, and a repetition penalty to strengthen LLM scientific reasoning under Zero-RL training. Dataset
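A minimal sketch of the token-selection idea in a DPO-style loss, assuming per-token log-probabilities have already been gathered from the policy and reference models. The keep ratio, the |log-ratio| selection criterion, and the tensor layout are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def selective_dpo_loss(policy_logps, ref_logps, mask, beta=0.1, keep_ratio=0.3):
    """DPO loss computed only on high-impact tokens.

    policy_logps / ref_logps: dicts with 'chosen' and 'rejected' tensors of
    per-token log-probs, shape (batch, seq_len); mask: same keys/shape, 1.0
    for response tokens, 0.0 for prompt/padding. keep_ratio and the
    |log-ratio| criterion are assumptions for illustration.
    """
    def selected_sum(pol, ref, m):
        ratio = (pol - ref) * m                       # per-token log-ratio
        k = max(1, int(keep_ratio * m.sum(-1).max().item()))
        # keep the k tokens with the largest |log-ratio| in each sequence
        idx = ratio.abs().topk(k, dim=-1).indices
        sel = torch.zeros_like(m).scatter(-1, idx, 1.0) * m
        return (ratio * sel).sum(-1)

    chosen = selected_sum(policy_logps["chosen"], ref_logps["chosen"], mask["chosen"])
    rejected = selected_sum(policy_logps["rejected"], ref_logps["rejected"], mask["rejected"])
    return -F.logsigmoid(beta * (chosen - rejected)).mean()
```

Restricting the loss to the highest-impact tokens keeps the gradient focused on the positions where the chosen and rejected responses genuinely differ.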
Multi-Normal Prototypes Learning for Weakly Supervised Anomaly Detection
- Proposed a novel framework combining reconstruction learning with multi-normal prototype learning for detecting anomalies with limited labeled data.
- Introduced a dynamic sample weighting strategy to estimate the likelihood of unlabeled samples being normal, mitigating contamination from unlabeled anomalies.
- Designed a unified anomaly scoring module that integrates reconstruction error, latent features, and prototype similarity (see the sketch after this list).
- Achieved state-of-the-art AUC-PR/AUC-ROC performance on 15 benchmark datasets and robust generalization to unseen anomaly types.
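A sketch of how such a unified score can combine the reconstruction and prototype signals; the encoder/decoder architecture, prototype count, and mixing weight `lam` are placeholders rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPrototypeScorer(nn.Module):
    """Anomaly score = weighted sum of reconstruction error and distance to
    the nearest learned "normal" prototype in latent space (illustrative)."""

    def __init__(self, in_dim, latent_dim=32, n_prototypes=5, lam=0.5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, in_dim))
        # learnable prototypes meant to cover multiple modes of normal data
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, latent_dim))
        self.lam = lam

    def forward(self, x):
        z = self.encoder(x)
        recon_err = F.mse_loss(self.decoder(z), x, reduction="none").mean(-1)
        # cosine similarity to each prototype; anomalies sit far from all of them
        sim = F.cosine_similarity(z.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1)
        proto_dist = 1.0 - sim.max(dim=1).values
        return self.lam * recon_err + (1 - self.lam) * proto_dist  # higher = more anomalous
```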
Bridging the Information Gap Between Domain-Specific Models and General LLMs for Personalized Recommendation
- Proposed LLMRec, a hybrid recommendation framework bridging domain-specific models and general-purpose LLMs.
- Designed a prompt-based user intent encoder that transforms behavior sequences into semantically meaningful inputs for LLMs.
- Implemented a dual-encoder structure with supervised contrastive learning to align user representations from the LLM and the sequential model (see the sketch after this list).
- Improved recommendation accuracy in cold-start and interest-drifting scenarios, achieving state-of-the-art results on the Amazon and MIND datasets.
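A sketch of the alignment objective, assuming a symmetric InfoNCE with in-batch negatives where row i of each encoder's batch corresponds to the same user; the temperature and batch construction are assumptions.

```python
import torch
import torch.nn.functional as F

def align_contrastive_loss(llm_emb, seq_emb, temperature=0.07):
    """Symmetric InfoNCE aligning user representations from the LLM encoder
    and the sequential recommender; matching rows are positive pairs."""
    llm = F.normalize(llm_emb, dim=-1)
    seq = F.normalize(seq_emb, dim=-1)
    logits = llm @ seq.t() / temperature            # (batch, batch) similarities
    labels = torch.arange(logits.size(0), device=logits.device)
    # pull each user's two views together, push apart other users in the batch
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
```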
Stock Price Trend Prediction Based on Multimodal Data
- Proposed a deep multimodal model with attention-based fusion that integrates features across input modalities to predict future stock price trends (sketched below).
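A minimal sketch of one plausible attention-fusion layer, where the price-series representation cross-attends to a second modality such as news text; the modality choice, dimensions, and three-class trend head are assumptions, not the model's confirmed design.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Cross-attention fusion: price-series tokens query the other modality,
    and the fused, pooled vector feeds a trend classifier (illustrative)."""

    def __init__(self, d_model=128, n_heads=4, n_classes=3):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, n_classes))

    def forward(self, price_feats, text_feats):
        # price_feats: (batch, T, d); text_feats: (batch, N, d)
        fused, _ = self.cross_attn(query=price_feats, key=text_feats, value=text_feats)
        fused = fused + price_feats              # residual keeps the price signal
        return self.head(fused.mean(dim=1))      # pooled up/flat/down logits
```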
Object Detection with YOLO Models
- Conducted research on object detection with YOLO models, improving accuracy through data augmentation, loss design, and model architecture adjustments (a box-regression loss sketch follows below).
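As one concrete example of loss design, here is a self-contained CIoU box-regression loss of the kind used in several modern YOLO variants; whether this specific loss was the one adopted in the work above is not stated, so treat it as an illustrative sketch.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Complete-IoU loss for boxes in (x1, y1, x2, y2) format, shape (N, 4).
    CIoU penalizes IoU, center distance, and aspect-ratio mismatch."""
    # intersection and union
    inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = inter_w * inter_h
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # squared center distance, normalized by the enclosing box diagonal
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    enc_w = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    enc_h = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # aspect-ratio consistency term
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return (1 - iou + rho2 / c2 + alpha * v).mean()
```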
Human Pose Estimation with Self-Attention
- Participated in research on human pose estimation, using self-attention mechanisms to address self-occlusion (see the sketch below).
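A minimal sketch of how self-attention over joint tokens can let visible keypoints inform occluded ones; the joint count, layer sizes, and residual-refinement setup are assumptions for illustration.

```python
import torch
import torch.nn as nn

class JointRefiner(nn.Module):
    """Refines coarse 2D keypoints with self-attention over joint tokens,
    so visible joints can correct self-occluded ones (illustrative sizes)."""

    def __init__(self, n_joints=17, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(2, d_model)                 # (x, y) -> token
        self.joint_pos = nn.Parameter(torch.randn(n_joints, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, 2)                   # refined (x, y)

    def forward(self, coords):
        # coords: (batch, n_joints, 2) coarse keypoint estimates
        tokens = self.embed(coords) + self.joint_pos       # add joint identity
        return coords + self.out(self.encoder(tokens))     # residual refinement
```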
Embodied Intelligence for Audio-Visual Navigation
- Conducted research on sound source localization using deep reinforcement learning, combining visual and auditory information.
- Designed explicit fusion methods that extract sound-direction probability distributions, improving the task success rate (see the sketch below).
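A sketch of one way to fuse an explicit sound-direction distribution with visual features in an RL policy; the direction-bin count, feature dimensions, and discrete action space are assumptions, not the project's confirmed setup.

```python
import torch
import torch.nn as nn

class AudioVisualPolicy(nn.Module):
    """Actor head that concatenates visual features with an explicit
    sound-direction probability distribution over K direction bins."""

    def __init__(self, vis_dim=512, n_direction_bins=12, n_actions=4):
        super().__init__()
        self.audio_head = nn.Sequential(nn.Linear(n_direction_bins, 64), nn.ReLU())
        self.policy = nn.Sequential(
            nn.Linear(vis_dim + 64, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, vis_feats, direction_probs):
        # vis_feats: (batch, vis_dim) from a visual encoder;
        # direction_probs: (batch, K) softmax over coarse sound directions
        fused = torch.cat([vis_feats, self.audio_head(direction_probs)], dim=-1)
        return torch.distributions.Categorical(logits=self.policy(fused))
```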