IEEE ACCESS Paper Accepted.

  • Jan 15
  • 2 min read

Journal: IEEE ACCESS, Vol. xx, pp. xx-xx, Jan. 20, 2026. (DOI: 10.1109/ACCESS.2026.3655413)

Title: FS-SegDiff: Mitigating Intra-class Variation and Attention Leakage for Robust Diffusion-based Few-shot Semantic Segmentation

Authors: Jaehyun Kim, Wonyong Seo, and Munchurl Kim

Abstract:

Few-shot semantic segmentation (FSS) has gained significant attention as a promising solution for segmenting target objects of unseen classes using only a few annotated examples. Recently, pretrained diffusion models have emerged as an effective approach for FSS, acting as both a powerful feature extractor and a mask generator, which are essential components of FSS models. However, their generative capabilities, which excel at preserving intra-class variations such as texture, color, and scale, are often ill-suited for semantic segmentation, a task that demands feature invariance. Furthermore, fine-tuning these models for FSS exposes them to the task's inherent problem of seen-class bias, leading to attention leakage and degraded performance on unseen classes. To address these challenges, we propose a diffusion-based FSS framework, denoted as FS-SegDiff, that integrates Bidirectional Feature Fusion (BDFF) modules with Query-guided Key Adaptation (QKA), a Multi-scale Feature Matching (MSFM) strategy, and attention leakage compensation via a Support-key Amplifying Factor (SKAF). The BDFF modules with QKA enable bidirectional fusion between support and query features, progressively improving common semantic alignment rather than fine details between the two features. The MSFM strategy generates multi-scale support pairs to ensure optimal feature matching across varying object sizes between support and query images. Finally, SKAF compensates for the attention leakage observed on unseen classes by amplifying the information from support images and masks during inference. Our FS-SegDiff achieved superior segmentation performance on the COCO-20i, LVIS-92i, and FSS-1000 datasets compared to existing state-of-the-art (SOTA) methods, even with far fewer training epochs.



© 2014-2025 by VIC LAB. All Rights Reserved.
