HIEA: Hierarchical Inference for Entity Alignment with Collaboration of Instruction-Tuned Large Language Models and Small Models
Problem Statement
Entity alignment across knowledge graphs is hindered by long-tail entities with sparse information that embedding-based methods cannot handle well. Existing LLM-enhanced approaches rely on in-context learning with multi-round interactions and elaborate prompt engineering, causing substantial computational overhead. Furthermore, current methods fail to exploit the complementary strengths of small embedding models and LLMs in a synergistic way.
Key Novelty
- Instruction-tuning a generative LLM with a unified, concise prompt and a knowledge adapter to produce alignment results in a single forward pass, eliminating multi-round LLM interactions
- Certainty-aware source entity classification that uses small model confidence scores to route entities to the appropriate inference tier (small model vs. LLM), enabling efficient hierarchical inference
- Deep collaboration between embedding-based small models and the LLM, where small models contribute candidate generation, data augmentation, and classification support rather than acting as isolated components
Evaluation Highlights
- Achieves absolute Hits@1 improvements of up to 5.6% over existing embedding-based and LLM-enhanced methods on both standard and highly heterogeneous EA benchmarks
- Significantly reduces inference cost compared to LLM-enhanced baselines by replacing multi-round LLM interactions with a single invocation
Methodology
- Embedding-based small models generate ranked candidate entities for each source entity and produce confidence scores used for certainty-aware classification, splitting entities into 'certain' (handled by small model alone) and 'uncertain' (forwarded to LLM) subsets
- For uncertain entities, the small model performs data augmentation by enriching entity descriptions with neighborhood and relational context, which is fed into the LLM via a knowledge adapter alongside a unified instruction-tuning prompt
- The instruction-tuned generative LLM processes uncertain entities in a single invocation to produce final alignment decisions, with results merged with the small model's certain predictions to form the complete alignment output
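The certainty-aware routing described above can be sketched as a small control loop. This is a minimal illustration, not the paper's implementation: `small_model`, `llm_align`, and the threshold value are hypothetical interfaces standing in for the embedding model, the instruction-tuned LLM, and the classification rule.

```python
from typing import Callable, Dict, List, Tuple

def hierarchical_align(
    source_entities: List[str],
    small_model: Callable[[str], Tuple[List[str], float]],   # ranked candidates + confidence
    llm_align: Callable[[List[Tuple[str, List[str]]]], Dict[str, str]],
    threshold: float = 0.9,  # illustrative cutoff for 'certain' vs. 'uncertain'
) -> Dict[str, str]:
    """Route each source entity by the small model's confidence:
    high-confidence entities keep the top-ranked candidate; the rest
    are forwarded to the LLM, and the two partial results are merged."""
    alignment: Dict[str, str] = {}
    uncertain: List[Tuple[str, List[str]]] = []
    for ent in source_entities:
        candidates, confidence = small_model(ent)
        if confidence >= threshold:
            alignment[ent] = candidates[0]        # 'certain': small model decides alone
        else:
            uncertain.append((ent, candidates))   # 'uncertain': escalate to the LLM
    if uncertain:
        alignment.update(llm_align(uncertain))    # LLM resolves only the uncertain subset
    return alignment
```

The design choice to merge at the end means the LLM never re-scores entities the small model already resolved, which is where the cost saving comes from.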
System Components
- Embedding-based small model: generates dense entity embeddings, produces candidate entity rankings, and outputs confidence scores for each alignment prediction
- Certainty-aware classifier: uses small model confidence scores to partition source entities into high-certainty cases (resolved by the small model) and low-certainty cases (escalated to the LLM), reducing unnecessary LLM calls
- Knowledge adapter: transforms augmented entity context (from KG neighborhood and relational facts) into a format compatible with the LLM's input, bridging the structural KG representation and the language model
- Instruction-tuned generative LLM: fine-tuned with a unified and concise prompt template to perform entity alignment in a single inference pass over uncertain entity candidates provided by the small model
- Data augmentation module: enriches entity representations with relational neighborhood information from the KG to address sparse information for long-tail entities before LLM inference
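The adapter's job of bridging KG structure and the LLM input can be approximated in text form. The sketch below serializes neighborhood triples and ranked candidates into a single concise prompt; the function name, prompt wording, and index-based answer format are illustrative assumptions, not the paper's actual template (the real knowledge adapter maps structure into the model's representation space rather than plain text).

```python
from typing import List

def build_alignment_prompt(
    entity: str,
    relations: List[str],
    neighbors: List[str],
    candidates: List[str],
) -> str:
    """Serialize augmented KG context (relational neighborhood facts)
    plus the small model's candidate ranking into one instruction prompt,
    so the LLM can decide in a single forward pass."""
    # Neighborhood facts as (head, relation, tail) triples
    context = "; ".join(f"({entity}, {r}, {n})" for r, n in zip(relations, neighbors))
    # Candidates from the small model, indexed so the LLM can answer tersely
    cand_list = ", ".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    return (
        f"Task: choose the entity equivalent to '{entity}'.\n"
        f"Known facts: {context}\n"
        f"Candidates: {cand_list}\n"
        f"Answer with the index of the matching candidate."
    )
```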
Results
| Metric/Setting | Best Baseline | HIEA | Delta |
|---|---|---|---|
| Hits@1 (standard benchmarks) | Prior best embedding-based / LLM-enhanced method | Outperforms | Up to +5.6% absolute |
| Hits@1 (heterogeneous benchmarks) | Prior best method | Consistently outperforms | Positive across all benchmarks |
| Inference cost | Multi-round LLM interactions | Single LLM invocation | Significantly reduced |
Key Takeaways
- Instruction-tuning an LLM for entity alignment with a single unified prompt is far more efficient than in-context learning with multi-round interactions, and practitioners should prefer fine-tuning over prompt chaining when latency and cost matter
- Hierarchical routing—letting a small model handle high-confidence cases and escalating only uncertain ones to the LLM—is an effective general strategy for reducing LLM inference costs in structured prediction tasks without sacrificing accuracy
- Small embedding models and LLMs are most powerful when tightly integrated: small models should actively support LLMs through candidate generation, confidence estimation, and data augmentation rather than being used as simple pre-filters
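The cost argument behind the second takeaway is simple arithmetic. The sketch below compares a multi-round in-context baseline against confidence-based routing; the round count and uncertain fraction are hypothetical parameters for illustration, not figures reported by the paper.

```python
from typing import Tuple

def expected_llm_calls(
    num_entities: int,
    uncertain_fraction: float,   # share of entities the small model cannot resolve
    baseline_rounds: int = 3,    # assumed rounds per entity in a multi-round ICL baseline
) -> Tuple[int, int]:
    """Return (baseline_calls, routed_calls): the baseline queries the LLM
    several times for every entity, while hierarchical routing makes one
    single-pass call only for the uncertain subset."""
    baseline = num_entities * baseline_rounds
    routed = int(num_entities * uncertain_fraction)  # one forward pass each
    return baseline, routed
```

With, say, 1000 entities and 20% routed to the LLM, the baseline issues 3000 calls versus 200, a 15x reduction before accounting for prompt length.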
Abstract
Entity alignment (EA) facilitates knowledge fusion by matching semantically identical entities in distinct knowledge graphs (KGs). Existing embedding-based methods rely solely on intrinsic KG facts and often struggle with long-tail entities due to insufficient information. Recently, large language models (LLMs), empowered by rich background knowledge and strong reasoning abilities, have shown promise for EA. However, most current LLM-enhanced approaches follow the in-context learning paradigm, requiring multi-round interactions with carefully designed prompts to perform additional auxiliary operations, which leads to substantial computational overhead. Moreover, they fail to fully exploit the complementary strengths of embedding-based small models and LLMs. To address these limitations, we propose HIEA, a novel hierarchical inference framework for entity alignment. By instruction-tuning a generative LLM with a unified and concise prompt and a knowledge adapter, HIEA produces alignment results with a single LLM invocation. Meanwhile, embedding-based small models not only generate candidate entities but also support the LLM through data augmentation and certainty-aware source entity classification, fostering deeper collaboration between small models and LLMs. Extensive experiments on both standard and highly heterogeneous benchmarks demonstrate that HIEA consistently outperforms existing embedding-based and LLM-enhanced methods, achieving absolute Hits@1 improvements of up to 5.6%, while significantly reducing inference cost.