Selected Publications

You can also find my full publication list on my Google Scholar profile.

Journal Articles

PH2ST: Prompt-Guided Hypergraph Learning for Spatial Transcriptomics Prediction in Whole Slide Images

Spatial transcriptomics provides valuable molecular maps, but current assays remain costly, sparse, and difficult to scale across large tissue regions. PH2ST uses limited spatial transcriptomics signals as prompts to guide multi-scale histological representation learning with a hypergraph framework for robust gene expression prediction from H&E slides. Across public datasets and realistic prompt settings, it outperforms prior methods and supports applications such as missing-spot imputation, super-resolution, and local-to-global prediction.

Niu Y, Liu J, Zhan Y, Shi J, Zhang D, Reinius M, Machado I, Crispin-Ortuzar M, Wu J, Li C, Gao Z*. PH2ST: Prompt-Guided Hypergraph Learning for Spatial Transcriptomics Prediction in Whole Slide Images[J]. Medical Image Analysis, 2026: 104008.

MegaSeg: Towards Scalable Semantic Segmentation for Megapixel Images

Megapixel image segmentation is central to high-resolution histopathology analysis, but standard pipelines depend on patching or downsampling that lose important context. MegaSeg introduces an end-to-end segmentation framework for megapixel images that combines streaming convolutional networks, a U-shaped architecture, and a divide-and-conquer strategy. It preserves fine detail and global structure while dramatically reducing memory requirements for very large images.

Kaura SK, Wu J, Gao Z*, Li C*. MegaSeg: Towards Scalable Semantic Segmentation for Megapixel Images. Medical Image Analysis. 2026 Jan 10:103933.

SMMILe enables accurate spatial quantification in digital pathology using multiple-instance learning

Spatial quantification is essential in computational pathology, yet many multiple-instance learning (MIL) methods gain slide-level accuracy at the cost of spatial awareness. SMMILe shows that instance-level aggregation can achieve strong spatial quantification without sacrificing whole-slide prediction, and it introduces a superpatch-based measurable MIL formulation. Across multiple cancer types, tasks, and datasets, it consistently improves spatial localization and slide-level performance.

Gao, Z., Mao, A., Dong, Y. et al. SMMILe enables accurate spatial quantification in digital pathology using multiple-instance learning. Nat Cancer (2025).

ProGIS: Prototype-Guided Interactive Segmentation for Pathological Images

Interactive segmentation is attractive for computational pathology because it can reduce annotation cost while still accommodating pathologist input. ProGIS introduces a prototype-guided interactive segmentation framework that segments pathological structures accurately with minimal interaction and can recover multiple same-type connected components from one prompt. This makes slide-level interactive pathology segmentation more efficient and practical than fully automatic or heavily interactive alternatives.

Ge J, Zhang D, Zhan Y, Liu J, Gong T, Wu J, Crispin M, Li C, Gao Z*. ProGIS: Prototype-Guided Interactive Segmentation for Pathological Images. IEEE Transactions on Medical Imaging. 2025.

StaDis: Stability distance to detecting out-of-distribution data in computational pathology

Computational pathology models can fail silently when they encounter out-of-distribution data that differ from the training distribution. StaDis introduces a plug-and-play OOD detection method tailored to this setting by measuring the feature gap between an image and its perturbed counterpart. Without retraining the underlying predictor, it improves deployment safety and helps flag unreliable predictions in real clinical environments.

Zhang D, Ge J, Liu J, Wang C, Gong T, Gao Z*, Li C*. StaDis: Stability distance to detecting out-of-distribution data in computational pathology. Medical Image Analysis. 2025 Aug 27:103774.

A fully annotated pathology slide dataset for early gastric cancer and precancerous lesions

Early gastric cancer diagnosis from endoscopic submucosal dissection (ESD) specimens is clinically important but remains labor-intensive and prone to interobserver variability. This work releases a fully annotated pathology slide dataset designed for precise examination of early gastric cancer and precancerous lesions. The dataset provides a challenging benchmark for computational pathology and supports the development of AI systems for fine-grained lesion detection and analysis.

Wang, C., Ge, J., Niu, Y., Ding, C., Fan, Y., Chang, H., Yang, Z., Ran, C., Teng, X., Wang, X., Wu, L., Gao, Z.*, Li, C.* (2025). A fully annotated pathology slide dataset for early gastric cancer and precancerous lesions. Scientific Data, 12.

CoxKAN: Kolmogorov-Arnold networks for interpretable, High-Performance survival analysis

Survival analysis in medicine requires models that are both accurate and interpretable, yet deep survival models are often treated as black boxes. CoxKAN introduces a Cox proportional hazards Kolmogorov-Arnold Network that combines strong predictive performance with transparent functional structure. Evaluations on synthetic and real-world datasets show that it offers a practical balance between interpretability and high-performance survival modeling.

W Knottenbelt, W McGough, R Wray, W Zhang, J Liu, I Machado, Z Gao*, M Crispin*. CoxKAN: Kolmogorov-Arnold networks for interpretable, High-Performance survival analysis. Bioinformatics, 2025.

ALPaCA: Adapting Llama for Pathology Context Analysis to enable slide-level question answering

Large vision-language models (LVLMs) are promising for computational pathology, but existing systems are largely restricted to small predefined regions rather than gigapixel whole-slide images (WSIs). ALPaCA introduces a general-purpose slide-level LVLM trained on tens of thousands of WSIs with curated descriptions and question-answer pairs, combining a slide-level adaptor with prototype-based modeling and Llama 3.1. It achieves strong slide-level question-answering performance and can be adapted efficiently to organ-specific or disease-specific pathology tasks.

Gao Z, He K, Su W, et al. ALPaCA: Adapting Llama for Pathology Context Analysis to enable slide-level question answering[J]. medRxiv, 2025: 2025.04.22.25326190.

From patches to WSIs: A systematic review of deep Multiple Instance Learning in computational pathology

Computational pathology systems based on whole-slide images are often bottlenecked by the need for costly fine-grained annotations. This review surveys how multiple instance learning reduces that dependence by learning from coarse supervision while aggregating information from large-scale WSIs. It synthesizes recent advances, organizes the rapidly growing literature, and highlights the technical trends shaping modern MIL research in pathology.

Zhang Y¹, Gao Z¹, He K, et al. From patches to WSIs: A systematic review of deep Multiple Instance Learning in computational pathology[J]. Information Fusion, 2025: 103027.

Multiple serous cavity effusion screening based on smear images using vision transformer

Serous cavity effusion smears are widely used in cytological diagnosis, but manual examination is labor-intensive and its accuracy varies between examiners. This study builds a vision transformer-based framework for identifying malignant cells in smear images, using data from 161 patients and thousands of annotated patches. The model improves automated screening performance and offers a more precise computational tool for assisting cytological assessment in routine clinical workflows.

Wang C, Wang X, Gao Z, Ran C, Li C, Ding C. Multiple serous cavity effusion screening based on smear images using vision transformer[J]. Scientific Reports, 2024, 14(1): 7395.

MG-trans: Multi-scale graph transformer with information bottleneck for whole slide image classification

Existing MIL pipelines for whole-slide image classification often rely on many high-magnification patches, creating redundant inputs while underusing spatial structure. MG-Trans addresses this by combining patch anchoring, dynamic structure learning, and a multi-scale information bottleneck within a graph-transformer framework. The resulting model captures fine-grained morphology more efficiently and strengthens discriminative whole-slide representations.

Shi J, Tang L, Gao Z, Li Y, Wang C, Gong T, Li C, Fu H. MG-trans: Multi-scale graph transformer with information bottleneck for whole slide image classification[J]. IEEE Transactions on Medical Imaging, 2023, 42(12): 3871-3883.

A structure-aware hierarchical graph-based multiple instance learning framework for pT staging in histopathological image

Pathological primary tumor (pT) staging depends on contextual evidence across multiple magnifications, but dense annotation on gigapixel whole-slide images is impractical. This work introduces a structure-aware hierarchical graph-based MIL framework that progressively models cross-scale contextual information instead of treating patches independently. The method improves weakly supervised pT staging by capturing multiscale structural cues that are critical for prognosis-related classification.

Shi J, Tang L, Li Y, Zhang X, Gao Z, Zheng Y, Wang C, Gong T, Li C. A structure-aware hierarchical graph-based multiple instance learning framework for pT staging in histopathological image[J]. IEEE Transactions on Medical Imaging, 2023, 42(10): 3000-3011.

Childhood leukemia classification via information bottleneck enhanced hierarchical multi-instance learning

Bone marrow smear analysis for childhood leukemia is labor-intensive and traditionally depends on detailed expert cell annotations. This work formulates the problem with patient-level supervision and introduces a hierarchical multi-instance learning framework enhanced by an information bottleneck. The model captures subtype relationships across multiple hierarchies and improves leukemia classification with better data efficiency and generalization.

Gao Z, Mao A, Wu K, et al. Childhood leukemia classification via information bottleneck enhanced hierarchical multi-instance learning[J]. IEEE Transactions on Medical Imaging, 2023, 42(8): 2348-2359.

A semi-supervised multi-task learning framework for cancer classification with weak annotation in whole-slide images

Cancer region detection and subtype classification are two key tasks in digital pathology, but both are constrained by limited precise annotations on whole-slide images. This work proposes a semi-supervised multi-task framework that jointly learns detection and subtyping instead of training them as isolated steps. By coupling the two tasks under weak supervision, it reduces annotation demand and improves slide-level cancer classification.

Gao Z, Hong B, Li Y, et al. A semi-supervised multi-task learning framework for cancer classification with weak annotation in whole-slide images[J]. Medical Image Analysis, 2023, 83: 102652.

Unsupervised representation learning for tissue segmentation in histopathological images: From global to local contrast

Tissue segmentation requires pixel-level labels that are expensive to obtain in histopathology. This paper develops an unsupervised representation learning framework that moves from global to local contrastive objectives so that the learned features become useful for fine-grained tissue discrimination. By encoding multi-granularity views without annotations, it improves segmentation quality under limited-label conditions.

Gao Z, Jia C, Li Y, et al. Unsupervised representation learning for tissue segmentation in histopathological images: From global to local contrast[J]. IEEE Transactions on Medical Imaging, 2022, 41(12): 3611-3623.

Conference Papers

Thinking in Scales: Accelerating Gigapixel Pathology Image Analysis via Adaptive Continuous Reasoning

Traditional whole-slide image analysis relies on exhaustive patch-level processing that is computationally expensive at gigapixel scale. This paper introduces PathCTM, which formulates diagnosis as adaptive scale-space continuous reasoning, progressively moving from low-magnification global inspection to high-magnification local evidence gathering with dynamic scale switching, region pruning, and confidence-aware early stopping. It reduces both the number of image patches processed and the inference time by about 96% while maintaining slide-level AUC.

Ge J, Zhan Y, Zhao W, Zhang D, Wang K, Liu J, Yang C, Li C, Zhang J, Dong Y, Zhang N, Liu Q, Crispin-Ortuzar M, Fu H, Li C, Gao Z. Thinking in Scales: Accelerating Gigapixel Pathology Image Analysis via Adaptive Continuous Reasoning. Accepted to the International Conference on Machine Learning (ICML), 2026.

CARE: A Molecular-Guided Foundation Model with Adaptive Region Modeling for Whole Slide Image Analysis

Existing pathology foundation models often inherit natural-image backbones that overlook the heterogeneous and irregular organization of pathological regions of interest. CARE introduces a two-stage pretraining strategy that first learns morphological structure from large-scale whole-slide images and then aligns adaptive region representations with molecular signals from RNA and protein profiles. Using only a fraction of the pretraining data common in prior work, CARE delivers strong average performance across 33 downstream benchmarks for classification, molecular prediction, and survival analysis.

D Zhang, Z Gong, X Pang, J Liu, J Lu, H Cui, J Ge, Z Zeng, K Yi, Y Li, S Liu, T Yu, H Wang, M Crispin-Ortuzar, W Yu, C Li, Z Gao*. CARE: A Molecular-Guided Foundation Model with Adaptive Region Modeling for Whole Slide Image Analysis. Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026.

HAAF: Hierarchical Adaptation and Alignment of Foundation Models for Few-Shot Pathology Anomaly Detection

Few-shot pathology anomaly detection depends on subtle region-level abnormalities, yet generic vision-language adaptation often fails because semantic prompts are not grounded in fine-grained visual evidence. HAAF tackles this granularity mismatch with a hierarchical adaptation and alignment strategy centered on cross-level scaled alignment, where visual context first refines text prompts and the adapted prompts then guide anomaly-focused visual encoding. A dual-branch inference design further improves stability, and experiments on four benchmarks show strong gains over existing few-shot baselines.

Yang C, Zhao W, Tang Y, Lu J, Ge J, Liu Q, Gao Z*, Li C. HAAF: Hierarchical Adaptation and Alignment of Foundation Models for Few-Shot Pathology Anomaly Detection. Accepted to The Web Conference (WWW), 2026.

Learning Heterogeneous Embedding with Prototype-Aware Graph Attention for Whole Slide Image Classification

Whole-slide images contain diagnostic cues spanning local neighborhoods, distant regions, and hierarchical tissue organization, but existing graph and MIL models do not unify these relations effectively. This paper proposes a prototype-aware heterogeneous graph attention network that lets each region interact with diverse heterogeneous neighbors while guiding slide-level representation learning with multilevel prototypes. The framework strengthens whole-slide classification by jointly modeling local, non-local, and hierarchical structure within a single representation space.

Niu Y, Liu J, Zhan Y, Shi J, Chen J, Zhang D, Li C, Gao Z*. Learning Heterogeneous Embedding with Prototype-Aware Graph Attention for Whole Slide Image Classification[C]. 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2025: 2671-2678.

Shallow-Deep Synergy: Boosting Cross-Domain Generalization in Histopathological Image Segmentation

Histopathological image segmentation suffers from severe domain shifts caused by staining variation, imaging conditions, and tissue diversity across sites and organs. Shallow-Deep Synergy improves generalization in U-Net-based segmentation by explicitly combining the complementary strengths of shallow fine-detail features and deep semantic features. This design strengthens dense prediction performance under cross-domain settings where standard domain generalization methods are less effective.

X Wang, W Su, Y Dong, Y Li, X Zhang, T Gong, IP Machado, M Crispin-Ortuzar, C Li, Z Gao*. Shallow-Deep Synergy: Boosting Cross-Domain Generalization in Histopathological Image Segmentation[C]//2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2024: 3790-3794.

PAMIL: Prototype attention-based multiple instance learning for whole slide image classification

Whole-slide images often contain heterogeneous tumor patterns, but many MIL methods still assume a single dominant label and provide limited interpretability. PAMIL introduces prototype attention-based multiple instance learning to model multiple histotypes within one slide while producing more meaningful explanations of the reasoning process. This makes whole-slide classification more clinically useful in settings where tumor heterogeneity matters.

J Liu, A Mao, Y Niu, X Zhang, T Gong, C Li, Z Gao*. PAMIL: Prototype attention-based multiple instance learning for whole slide image classification[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024: 362-372.

Uncertainty-based Model Acceleration for Cancer Classification in Whole-Slide Images

Whole-slide image classification is often slowed by the need to process many high-magnification patches across a gigapixel slide. This paper proposes an uncertainty-based acceleration strategy that mimics pathologists by sending only suspicious high-uncertainty regions to expensive high-resolution analysis while handling most regions at low magnification. The framework reduces inference cost and deployment burden without sacrificing the accuracy needed for computational pathology applications.

Gao Z, Mao A, Wu J, et al. Uncertainty-based Model Acceleration for Cancer Classification in Whole-Slide Images[C]//2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2022: 1534-1538.

Instance-based vision transformer for subtyping of papillary renal cell carcinoma in histopathological image

Papillary renal cell carcinoma (pRCC) subtyping depends on subtle cellular and cell-layer patterns that conventional CNNs struggle to capture in large histopathological images. This paper proposes an instance-based vision transformer that focuses on the most informative nuclei-centered instances and models their relationships with positional and grade-aware embeddings. The design learns finer morphological representations and improves performance on fine-grained pRCC subtyping.

Gao Z, Hong B, Zhang X, et al. Instance-based vision transformer for subtyping of papillary renal cell carcinoma in histopathological image[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer International Publishing, 2021: 299-308.

Nuclei Grading of Clear Cell Renal Cell Carcinoma in Histopathological Image by Composite High-Resolution Network

Clear cell renal cell carcinoma (ccRCC) grading depends on accurate nuclei segmentation and fine-grained nuclei classification, both of which are challenging in crowded pathological images. This work introduces a composite high-resolution network that first separates clustered nuclei and then performs cross-category grading-aware classification. The framework addresses inter-class similarity in nuclear appearance and improves automated ccRCC grading for pathology analysis.

Gao Z, Shi J, Zhang X, et al. Nuclei grading of clear cell renal cell carcinoma in histopathological image by composite high-resolution network[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer International Publishing, 2021: 132-142.

Renal cell carcinoma detection and subtyping with minimal point-based annotation in whole-slide images

Automated renal cell carcinoma detection and subtyping is limited by the lack of large whole-slide datasets with precise annotations. This paper proposes a semi-supervised framework built on a minimal point-based annotation strategy, where annotators only mark a few cancerous and non-cancerous points in each slide. The resulting detector and subtype classifier achieve performance comparable to much more heavily annotated alternatives while substantially lowering labeling effort.

Gao Z, Puttapirat P, Shi J, et al. Renal cell carcinoma detection and subtyping with minimal point-based annotation in whole-slide images[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer International Publishing, 2020: 439-448.