E-book, English, 150 pages
Pilania / Kumar / Singh: Advanced Information Retrieval System: Theoretical and Experimental Perspective
1st edition 2026
ISBN: 979-8-89881-366-6
Publisher: De Gruyter
Format: EPUB
Copy protection: Adobe DRM
Advanced Information Retrieval System: Theoretical and Experimental Perspective blends foundational theory with practicality to provide an integrative exploration of modern information retrieval (IR) systems. This volume examines a wide range of IR methodologies, from classical indexing and ranking techniques to cutting-edge AI-driven approaches, demonstrating how these systems can be applied across diverse domains, including web search, recommendation systems, sentiment analysis, and multimedia retrieval.
The book takes a structured approach towards guiding readers from traditional IR models to advanced, hybrid frameworks. The early chapters focus on classical and modern retrieval techniques with comparative analyses of different methods. Subsequent chapters focus on applied scenarios such as tourism recommender systems, sentiment mining from YouTube comments, book and medicine recommendation engines, and image-audio-based retrieval systems. Advanced topics include semantic role classification using BERT, hybrid filtering methods, personalised web crawlers, and experimental studies on smoothing techniques. Real-world case studies and experimental evaluations illustrate how theoretical models translate into effective, domain-specific IR applications.
Key Features
Comprehensive coverage of traditional, modern, and hybrid IR techniques
Practical frameworks for recommendation systems, sentiment analysis, and web crawling
Integration of AI and machine learning methods, including BERT and TF-IDF models
Experimental evaluations and comparative analyses across multiple domains
Real-world applications spanning tourism, healthcare, fashion, and multimedia retrieval
Comparative Analysis of Different Information Retrieval Methods
Urmila Pilania, Manoj Kumar, Sanjay Singh
Abstract
Information Retrieval (IR) techniques have evolved continuously from keyword-based systems to advanced search. Today, IR techniques use Machine Learning (ML), Deep Learning (DL), and Natural Language Processing (NLP) to provide more accurate and personalized results. In the proposed research work, IR techniques are analysed for their merits and demerits, and the work examines how contemporary research has transformed query-document matching. This work integrates Term Frequency-Inverse Document Frequency (TF-IDF) with two retrieval metrics, cosine similarity and dot product similarity, with the aim of improving retrieval results. Cosine similarity captures vector orientation, whereas dot product similarity also reflects vector magnitude. The two are combined into a single similarity score weighted by a parameter α to enhance retrieval capability. Simulation results show that the combined method performed well. In future work, the authors will incorporate machine learning or deep learning methods to further enhance the performance of these IR techniques.
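The combined similarity described in the abstract can be illustrated with a minimal Python sketch. It builds TF-IDF vectors, scores each document with cosine similarity and dot product similarity, and mixes the two as `alpha * cosine + (1 - alpha) * dot`, where `alpha` plays the role of the weighting parameter. The mixing rule, the smoothed IDF, and the division of dot scores by their maximum (so both terms lie on a comparable [0, 1] scale) are assumptions made for illustration, not the chapter's exact formulation, and all function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; real systems also strip punctuation and stop words.
    return text.lower().split()

def vectorize(tokens, vocab, idf):
    """TF-IDF vector with length-normalized term frequency; out-of-vocabulary terms are dropped."""
    tf = Counter(tokens)
    total = len(tokens) or 1
    return [tf[t] / total * idf[t] for t in vocab]

def build_index(docs):
    """Return the vocabulary, smoothed IDF weights, and TF-IDF vectors of the documents."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    # Assumed smoothed IDF, so a term occurring in every document keeps a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    return vocab, idf, [vectorize(toks, vocab, idf) for toks in tokenized]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def rank(query, docs, alpha=0.5):
    """Score documents with alpha * cosine + (1 - alpha) * max-normalized dot product."""
    vocab, idf, doc_vecs = build_index(docs)
    q = vectorize(tokenize(query), vocab, idf)
    cos_scores = [cosine(q, d) for d in doc_vecs]
    dot_scores = [dot(q, d) for d in doc_vecs]
    max_dot = max(dot_scores) or 1.0  # hypothetical normalization of dot scores into [0, 1]
    combined = [alpha * c + (1 - alpha) * (p / max_dot) for c, p in zip(cos_scores, dot_scores)]
    order = sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
    return order, combined

docs = [
    "information retrieval with tf idf",
    "deep learning for images",
    "retrieval of information using cosine similarity",
]
order, scores = rank("information retrieval", docs, alpha=0.5)
print(order)  # → [0, 2, 1]
```

Setting `alpha = 1.0` reduces the score to pure cosine similarity and `alpha = 0.0` to the normalized dot product, so the parameter trades vector orientation against magnitude.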
INTRODUCTION
As digital information grows day by day, IR techniques must become more accurate so that the required information can be retrieved in time. To improve IR systems, the authors analyzed different IR techniques to identify the merits and demerits of existing methods. IR techniques have improved significantly in the transition from traditional to modern approaches: modern techniques can handle diverse data and retrieve accurate results in a timely manner [16]. With the exponential growth of digital data, search now spans domains ranging from educational content to social media, transport, e-commerce, healthcare, and many more.
The user experience is improved by maintaining scalability and ensuring the relevance of the content [17]. Fig. (1) shows some preliminary steps that must be performed before the actual search starts, such as understanding how to formulate a query using Boolean operators like OR, AND, and NOT [18]. First, the classical methods need to be understood before the modern methods are applied to information retrieval. The data needs to be stored in a structured way for efficient query retrieval. Text pre-processing, including tokenization, stop-word removal, and stemming, must also be carried out. Semantic relationships in the data also need to be captured, and the dimensionality of the data needs to be reduced so that hidden relationships can be captured efficiently. Fig. (2) shows the different components of the IR system.
Fig. (1). Prior work for IR methods [18].
Fig. (2). Components of IR [19].
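The pre-search steps described above (text pre-processing, structured storage, and Boolean query formulation with AND, OR, and NOT) can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical stop-word list and a crude suffix-stripping stemmer (not the Porter stemmer), stores the documents in an inverted index, and evaluates Boolean queries strictly left to right without operator precedence. All names are hypothetical.

```python
import re

# Tiny illustrative stop-word list (hypothetical; real systems use much larger lists).
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "in", "for", "to", "is"}

def stem(token):
    """Crude suffix stripping for illustration only; this is not the Porter stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stop words, and stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

def build_inverted_index(docs):
    """Map each stemmed term to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in preprocess(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def boolean_search(query, index, num_docs):
    """Evaluate queries such as 'search AND NOT images' strictly left to right."""
    all_docs = set(range(num_docs))
    result, op, negate = None, "OR", False
    for word in query.split():
        if word in ("AND", "OR"):
            op = word
        elif word == "NOT":
            negate = True
        else:
            terms = preprocess(word)
            postings = index.get(terms[0], set()) if terms else set()
            if negate:
                postings = all_docs - postings  # complement of the posting list
                negate = False
            if result is None:
                result = set(postings)  # copy so the index is never mutated
            elif op == "AND":
                result &= postings
            else:
                result |= postings
    return result if result is not None else set()

docs = [
    "Search engines index the web",
    "Image retrieval for searching images",
    "Web search and recommendation",
]
index = build_inverted_index(docs)
print(sorted(boolean_search("search AND NOT images", index, len(docs))))  # → [0, 2]
```

Because documents and query terms pass through the same `preprocess` function, "searching" and "search" map to the same index term, which is the point of applying stemming before indexing.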
The paper is organized into five sections. Section 1 introduces IR and explains the general preliminary steps and the components of an IR system. Section 2 reviews the literature, summarized in a literature summary table. Section 3 describes the proposed methodology, discussing the proposed techniques in detail along with their merits and demerits. Section 4 presents the results and discussion, using graphs to explain the findings in detail. Section 5 concludes the paper and outlines the future scope.
LITERATURE REVIEW
In [16], novel generative retrieval (GR) methods employ generative models to map queries to the identifiers of related documents. The work analyzes how to improve the quality of query generation, examines learnable identifiers, addresses scalability, and integrates GR with multi-task learning frameworks. In [17], a model integrating NLP and ML was proposed, based on court case summary data. The method automates citation retrieval by applying text-processing and state-of-the-art embedding techniques. It was validated on a Supreme Court of the United States dataset, achieving an accuracy of 90.9%.
Fardin Akhlaghian et al. [20] investigated personalizing search engine results with autonomous fuzzy concept networks, which use ontology concepts to augment a common fuzzy concept network according to user profiles. Experiments show that the personalized results outperform those obtained with the common fuzzy concept network. Javed A. Aslam et al. [21] presented a method for measuring retrieval system performance without relevance judgments and showed that its results agree with actual assessments in the TREC competition. The researchers use a measure of the similarity between retrieval systems and demonstrate that ranking systems by their average similarity produces results comparable to Soboroff's methodology, suggesting that such evaluations reflect popularity rather than performance.
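The idea of ranking retrieval systems by their average similarity to one another, as discussed for [21], can be sketched in Python. Here the similarity between two systems is assumed to be the Jaccard overlap of their top-k retrieved documents for a single query; the measure actually used in [21] may differ, scores would normally be averaged over many queries, and the function names are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two retrieved sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_by_average_similarity(runs, k=10):
    """runs maps a system name to its ranked list of document IDs for one query."""
    if len(runs) < 2:
        raise ValueError("need at least two systems to compare")
    top = {name: ranking[:k] for name, ranking in runs.items()}
    totals = dict.fromkeys(top, 0.0)
    for x, y in combinations(top, 2):
        s = jaccard(top[x], top[y])
        totals[x] += s
        totals[y] += s
    # Average similarity of each system to all other systems; no relevance judgments needed.
    avg = {name: total / (len(top) - 1) for name, total in totals.items()}
    return sorted(avg, key=avg.get, reverse=True), avg

runs = {
    "A": [1, 2, 3],
    "B": [1, 2, 4],
    "C": [7, 8, 9],
}
order, avg = rank_by_average_similarity(runs, k=3)
print(order, avg)  # → ['A', 'B', 'C'] {'A': 0.25, 'B': 0.25, 'C': 0.0}
```

System C, whose results differ from those of the majority, is ranked last whether or not its documents are relevant, which illustrates why such scores tend to measure popularity rather than performance.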
Patrick Lewis et al. [22] explored a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG), a language generation approach that combines pre-trained parametric and non-parametric memory. RAG models outperform parametric seq2seq models and task-specific architectures, and they generate more specific, diverse, and factual language than a state-of-the-art parametric-only seq2seq baseline. Pre-trained language models store factual knowledge in their parameters and achieve state-of-the-art results when fine-tuned on downstream NLP tasks; however, because their ability to access and manipulate knowledge is limited, they lag behind task-specific architectures on knowledge-intensive tasks. Yahui Chen [23] focused on identifying related candidates for a query, formulated as a multi-label classification problem. Two CNNs were proposed: a parallel CNN and a deep CNN. Both models extract local semantic features and select global features using a max-over-time pooling layer. Experiments demonstrated that these models outperform classic SVC-based techniques, with the deep CNN performing better owing to its greater semantic learning ability.
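The max-over-time pooling step described for the CNNs in [23] can be sketched in plain Python: each convolutional filter produces a feature map over the word positions, pooling keeps the maximum of each map, and an independent sigmoid per label makes the output multi-label. The embeddings, filter weights, and output weights below are hand-picked hypothetical values rather than trained parameters, the function names are hypothetical, and the sketch does not reproduce the parallel or deep CNN architectures of [23].

```python
import math

def conv1d_relu(embeddings, kernel, bias=0.0):
    """Slide one filter of width len(kernel) over the word vectors (no padding), then apply ReLU."""
    width = len(kernel)
    feature_map = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        s = bias + sum(x * w for vec, kvec in zip(window, kernel) for x, w in zip(vec, kvec))
        feature_map.append(max(0.0, s))
    return feature_map

def max_over_time(feature_maps):
    """Keep the strongest activation of each filter, wherever it occurs in the sequence."""
    return [max(fm) for fm in feature_maps]

def multilabel_predict(features, weights, biases, threshold=0.5):
    """Independent sigmoid per label, so several labels can be predicted at once."""
    labels = []
    for label, (ws, b) in enumerate(zip(weights, biases)):
        z = b + sum(f * w for f, w in zip(features, ws))
        if 1.0 / (1.0 + math.exp(-z)) >= threshold:
            labels.append(label)
    return labels

# Toy 2-dimensional word embeddings for a three-word query (hypothetical values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
features = max_over_time([conv1d_relu(sentence, f) for f in filters])
print(features)  # → [2.0, 0.0]
print(multilabel_predict(features, [[1.0, 0.0], [-1.0, 0.0]], [0.0, 0.0]))  # → [0]
```

Pooling over time turns a variable-length query into a fixed-length vector with one value per filter, which is what allows a single classification layer to follow.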
Wasseem N. Ibrahem Al-Obaydy et al. [24] described a document classification strategy for categorizing research publications into meaningful groups based on a common scientific area. The method classified documents using the K-means clustering algorithm and word tokens extracted from themes relevant to each group. The approach categorized papers based on their title, abstract, keywords, and category subjects. Experimental results suggested that this technique outperformed the k-nearest neighbors algorithm in terms of information retrieval accuracy. Akram Roshdi et al. [25] examined a variety of IR models and methodologies, including indexing algorithms and classical models. IR arose in the 1950s in response to the need to archive and locate important information. Over the last 40 years, IR systems have expanded tremendously, and IR is now an important research topic in computer science.
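The K-means clustering used in [24] can be sketched in Python as alternating assignment and update steps. The sketch assumes documents have already been turned into numeric vectors (for example, from their title, abstract, and keyword tokens), uses toy two-dimensional vectors and a simple first-k initialization, and does not reproduce the token-selection scheme of [24]; the function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(p) for p in points[:k]]  # deterministic initialization: first k points
    assignments = None
    for _ in range(iterations):
        new_assignments = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                # Move the centroid to the mean of its members.
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

# Hypothetical 2-D document vectors: two papers from each of two topics.
docs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centers = kmeans(docs, k=2)
print(labels)  # → [0, 0, 1, 1]
```

Unlike k-nearest neighbors, which needs labeled examples, k-means groups the documents without labels; in practice the initialization (here simply the first k points) can change the resulting clusters.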
Mei Kobayashi et al. [26] examined research on the development of the Internet and information search technology. Their study showed persistent trends of exponential growth over the previous and upcoming decades, with 85% of users relying on search engines. However, users were dissatisfied with the performance of contemporary search engines, citing slow retrieval speed, communication delays, and poor result quality as major complaints. Aleksander Theo Strand et al. [27] presented SoccerRAG, a methodology for extracting soccer-related information from multimodal datasets by combining RAG with Large Language Models. It enabled dynamic querying, automated data validation, and improved user interaction, and its interactive user interface provided a chatbot-like visual experience. Shinnosuke Tanaka et al. [28] developed KnowledgeHub, a program that extracts information from scientific literature and answers questions by converting PDF documents into text and structured representations. It provides a browser-based annotation tool for annotating the information, which is then used to train Named Entity Recognition and Relation Classification models and to build a knowledge graph. It also integrates Large Language Models for question answering and summarisation, giving users complete visibility into the knowledge discovery cycle.
Zhiwei Li et al. [29] examined existing methodologies, open problems, future research directions, and benchmarks in the field of Federated Recommendation Systems (FRS) to provide context and guidance for investigating this new topic. FRS, which combines federated learning with recommendation systems, is a promising way to protect user privacy. However, FRS has limitations, such as data heterogeneity and scarcity. Foundation Models (FM) are models that understand human intent and perform specified tasks, resulting in high-quality content in the image and text...




