Comparative Performance Analysis of Text Summarization: A Case Study of Extractive (TF-IDF, TextRank) and Abstractive (LLM) Methods
DOI: https://doi.org/10.46808/iitp.v3i1.95

Keywords: Comparative Analysis, Abstractive Method, Extractive Method, Text Summarization, Natural Language Processing

Abstract
This study presents a comparative performance analysis of two major paradigms in text summarization. The extractive paradigm, which operates by selecting significant sentences directly from the source text, is implemented through two approaches: (1) the statistical TF-IDF algorithm, which quantitatively scores sentences based on accumulated word significance weights; and (2) the graph-based TextRank algorithm, which represents sentences as nodes and determines their importance through centrality analysis within a semantic network. Representing the abstractive paradigm, the Large Language Model (LLM) Gemini is employed, which comprehends contextual information holistically to generate entirely new and coherent summary sentences. A qualitative comparative analysis of the outputs from these three methods reveals a fundamental trade-off. The abstractive method (Gemini) demonstrates superior performance in terms of narrative quality, producing summaries that are highly coherent, fluent, and natural-sounding, resembling human writing. Conversely, the extractive methods (TF-IDF and TextRank) inherently excel in ensuring perfect factual consistency, as there is no risk of misinterpretation or hallucinated information. Among the extractive methods, analysis indicates that TextRank tends to produce more structured and readable summaries compared to TF-IDF, owing to its ability to consider inter-sentence relationships. This study concludes that the choice of summarization method should be aligned with the specific priorities of the use case: abstractive methods are better suited for readability-focused tasks, whereas extractive methods are preferable for applications demanding absolute factual reliability.
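The two extractive approaches described in the abstract can be illustrated with a short, self-contained sketch. This is not the authors' implementation: it assumes naive regex tokenization, scores each sentence by the mean TF-IDF weight of its terms, and uses a word-overlap similarity in the style of Mihalcea and Tarau's TextRank (with +1 smoothing in the log denominator to avoid division by zero for one-word sentences), ranked by a plain power-iteration PageRank.

```python
import math
import re
from collections import Counter

def tfidf_sentence_scores(sentences):
    """Score each sentence by the mean TF-IDF weight of its terms."""
    # Naive tokenization (an assumption, not the paper's preprocessing).
    docs = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences containing each term.
    df = Counter()
    for doc in docs:
        for term in set(doc):
            df[term] += 1
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Accumulated word significance, averaged over sentence length.
        score = (
            sum((tf[t] / len(doc)) * math.log(n / df[t]) for t in tf)
            if doc else 0.0
        )
        scores.append(score)
    return scores

def textrank_scores(sentences, damping=0.85, iters=50):
    """Rank sentences as nodes in a similarity graph via PageRank."""
    docs = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(docs)

    def sim(a, b):
        # Word overlap normalized by sentence lengths; +1 inside the
        # logs is a smoothing assumption to keep the denominator > 0.
        if not a or not b:
            return 0.0
        return len(a & b) / (math.log(len(a) + 1) + math.log(len(b) + 1))

    w = [[sim(docs[i], docs[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    ranks = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            incoming = 0.0
            for j in range(n):
                out = sum(w[j])
                if out:
                    incoming += w[j][i] / out * ranks[j]
            new.append((1 - damping) / n + damping * incoming)
        ranks = new
    return ranks
```

Selecting the top-k sentences by either score (in original order) yields the extractive summary; the abstractive Gemini output has no comparable closed-form sketch, since it is produced by prompting the LLM directly.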