Professor
School of Computer Science and Engineering
Sino-German Joint Software Institute (JSI)
Beihang University
I am a Professor in the School of Computer Science and Engineering at Beihang University. I received my B.S. and Ph.D. degrees under the supervision of Prof. Depei Qian. I was also a postdoctoral researcher in the Department of Computer Science and Engineering at the University of Michigan. My research interests include high performance computing, performance analysis and optimization, deep learning systems and compilation, and parallel and distributed computing. My recent research investigates a holistic approach to cross-stack optimization for high performance, high scalability, and high portability, with special interest in large-scale elastic training systems, deep learning compilation and auto-tuning techniques, sparse tensor optimization, exascale performance analysis tools, and high performance linear algebra for emerging processors. I have authored over 80 scientific publications in leading international journals and conferences. I received the Excellence Teaching Award from Beihang University in 2016.
I served as a committee member of the CCF Doctoral Dissertation Incentive Program and as a member of the Youth Editorial Board of the CCF Transactions on High Performance Computing (CCF THPC). I was the architecture area program co-chair of the 23rd IEEE International Conference on Cluster Computing (CLUSTER), 2021. I currently serve as a reviewer for premier journals including TPDS, TC, PARCO, JPDC, FGCS, and FCS. I am also the supervisor of the Beihang Supercomputing Team, which has won the Silver Prize of ASC’17, the Bronze Prize of ISC’17, and the Highest Linpack Award, Application Innovation Award, and First Class Award of ASC competitions.
🔥 News
- 🔥 June 2025: Two papers (ESC and OVERT) are accepted to ICPP. Congratulations to Kelun Lei and Xuezhu Wang.
- 🔥 April 2025: One paper (STAD) is accepted to TPDS. Congratulations to Zhibo Xuan.
- 🔥 March 2025: Two papers (Plasticine and AOStencil) are accepted to ICS. Congratulations to Siqi Wang and Shanghao Liu.
- 🔥 February 2025: One paper (SimTrace) is accepted to TACO. Congratulations to Zhibo Xuan.
- 🔥 February 2025: One paper (GNNPerf) is accepted to IPDPS. Congratulations to Kejie Ma.
- 🔥 February 2025: One paper (LightLLM) is accepted to ASPLOS. Congratulations to Siyu Wu.
- 🔥 February 2025: One paper (DynVec) is accepted to TACO. Congratulations to Kelun Lei and Shaokang Du.
- 🔥 August 2024: One paper (RecServ) is accepted to TC. Congratulations to Xin You.
- 🔥 June 2024: Two papers (Moirae and GVARP) are accepted to SC. Congratulations to Xiaoyan Liu and Xin You.
- 🔥 June 2024: Two papers (PRoof and Jigsaw) are accepted to ICPP. Congratulations to Siyu Wu and Kaige Zhang.
- 🔥 March 2024: One paper (AtRec) is accepted to TPDS. Congratulations to Siqi Wang and Tianyu Feng.
- 🔥 March 2024: Our paper Tetris is selected as the best paper candidate (three papers in total) in PPoPP 2024. Congratulations to Xiaoyan Liu.
- 🔥 November 2023: One paper (Tetris) is accepted to PPoPP (best paper candidate). Congratulations to Xiaoyan Liu.
- 🔥 October 2023: One paper (GSTuner) is accepted to TPDS. Congratulations to Qingxiao Sun.
- 🔥 June 2023: Two papers (EasyScale and TrivialSpy) are accepted to SC. Congratulations to Mingzhen Li and Xin You.
- 🔥 June 2023: One paper (FamilySeer) is accepted to ICPP. Congratulations to Mingzhen Li.
- 🔥 May 2023: One TC paper is selected as IEEE Computer’s “Spotlight on Transactions”. Congratulations to Qingxiao Sun.
- 🔥 April 2023: One paper (BiRFIA) is accepted to ICS. Congratulations to Kelun Lei.
- 🔥 March 2023: One paper (swLego) is accepted to SCIS. Congratulations to Mingzhen Li.
📝 Selected Publications
- 🔥 OVERT: Orchestrating Vector-Scalar Execution for Efficient SpMV on Modern CPUs (ICPP) 2025.
- 🔥 ESC: Effective Submanifold Convolution using Tensor Cores (ICPP) 2025.
- 🔥 Identifying Performance Inefficiencies of Parallel Program with Spatial and Temporal Trace Analysis (TPDS) 2025.
- 🔥 Efficient Locality-aware Instruction Stream Scheduling for Stencil Computation on ARM Processors (ICS) 2025.
- 🔥 Accelerating Complex Stencil Computations with Adaptive Fusion Strategy (ICS) 2025.
- 🔥 SimTrace: Exploiting Spatial and Temporal Sampling for Large-Scale Performance Analysis (TACO) 2025.
- 🔥 GNNPerf: Towards Effective Performance Profiling and Analysis across GNN Frameworks (IPDPS) 2025.
- 🔥 Past-Future Scheduler for LLM Serving under SLA Guarantees (ASPLOS) 2025.
- 🔥 Exploiting Dynamic Regular Patterns in Irregular Programs for Efficient Vectorization (TACO) 2025.
- 🔥 Exploiting Structured Feature and Runtime Isolation for High-Performant Recommendation Serving (TC) 2024.
- 🔥 GVARP: Detecting Performance Variance on Large-Scale Heterogeneous System (SC) 2024.
- 🔥 Moirae: Generating High-Performance Composite Stencil Programs with Global Optimizations (SC) 2024.
- 🔥 PRoof: A Comprehensive Hierarchical Profiling Framework for Deep Neural Networks with Roofline Analysis (ICPP) 2024.
- 🔥 Jigsaw: Accelerating SpMM with Vector Sparsity on Sparse Tensor Core (ICPP) 2024.
- 🔥 AtRec: Accelerating Recommendation Model Training on CPUs (TPDS) 2024.
- 🔥 Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU (PPoPP, best paper candidate) 2024.
- 🔥 Adaptive Auto-tuning Framework for Global Exploration of Stencil Optimization on GPUs (TPDS) 2023.
- 🔥 EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs (SC) 2023.
- 🔥 TrivialSpy: Identifying Software Triviality via Fine-grained and Dataflow-based Value Profiling (SC) 2023.
- 🔥 Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs (ICPP) 2023.
- 🔥 BiRFIA: Selective Binary Rewriting for Function Interception on ARM (ICS) 2023.
- 🔥 Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU (IPDPS) 2023.
- 🔥 VClinic: A Portable and Efficient Framework for Fine-grained Value Profilers (ASPLOS) 2023.
- 🔥 Building a Domain-Specific Compiler for Emerging Processors with a Reusable Approach (SCIS) 2023.
- 🔥 Towards Optimized Tensor Code Generation for Deep Learning on Sunway Many-Core Processor (FCS) 2022.
- 🔥 CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs (SC) 2022.
- 🔥 Vectorizing SpMV by Exploiting Dynamic Regular Patterns (ICPP) 2022.
- 🔥 NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database (ICPP) 2022.
- 🔥 Toward accelerated stencil computation by adapting tensor core unit on GPU (ICS) 2022.
- 🔥 StencilMART: Predicting Optimization Selection for Stencil Computations across GPUs (IPDPS) 2022.
- 🔥 PowerSpector: Towards Energy Efficiency with Calling-Context-Aware Profiling (IPDPS) 2022.
- Input-Aware Sparse Tensor Storage Format Selection for Optimizing MTTKRP (TC) 2021.
- The Deep Learning Compiler: A Comprehensive Survey (TPDS) 2021.
- Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee (TOCS) 2021.
- SpTFS: Sparse Tensor Format Selection for MTTKRP via Deep Learning (SC) 2020.
- ZeroSpy: Exploring Software Inefficiency with Redundant Zeros (SC) 2020.
- SympleGraph: Distributed Graph Processing with Precise Loop-Carried Dependency Guarantee (PLDI) 2020.
- Accelerating Sparse Cholesky Factorization on Sunway Manycore Architecture (TPDS) 2020.
- Massively Scaling Seismic Processing on Sunway TaihuLight Supercomputer (TPDS) 2020.
- Temperature-Aware DRAM Cache Management - Relaxing Thermal Constraints in 3-D Systems (TCAD) 2020.
- Redundant Loads: A Software Inefficiency Indicator (ICSE) 2019.
- LWPTool: A Lightweight Profiler to Guide Data Layout Optimization (TPDS) 2018.
- SMGuard: A Flexible and Fine-Grained Resource Management Framework for GPUs (TPDS) 2018.
- PowerChief: Intelligent Power Allocation for Multi-Stage Applications to Improve Responsiveness on Power Constrained CMP (ISCA) 2017.
- Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers (ASPLOS) 2017.
- Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers (ASPLOS) 2016.
- Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers (ISCA) 2013.
🎖 Honors and Awards
- CCF HPC Talent Young Scientist Award, 2024.
- Best Paper Award Nomination, ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), 2024. (4 nominations out of 153 submissions)
- Beihang University May 4th Medal Nomination Award, 2023.
- CCF HPCChina Workshop Distinguished Speaker, 2021.
- Beihang Distinguished Young Scholar Award, 2021.
- Best Paper Award Nomination, IEEE International Conference on Cluster Computing (CLUSTER), 2021. (2 nominations out of 168 submissions)
- CCF CNCC Workshop Distinguished Speaker, 2020.
- CCF HPCChina Workshop Distinguished Speaker, 2020.
- Best Paper Award, BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench), 2020.
- Beihang University Excellence Teaching Award, 2016.
💬 Teaching
- Methodology of Computer Science Research (Undergraduate Student)
- Parallel Programming (International Student)
- Computer Architecture (International Student)