Abstract
Objective. To compare scientific search behavior with and without ChatGPT assistance.
Design/Method. Comparative study with two randomly assigned groups of graduate students; five tasks based on cognitive levels.
Tools/Settings. ChatGPT for Google extension version 3.5 and web search; data collected via screen recording.
Metrics. Number of searches, query length, SERP clicks, SERP browsing time, number/time of URL visits, total retrieval time.
Key Findings. Reduced clicks, reduced number/time of URL visits, and reduced total retrieval time with ChatGPT; increased query length and SERP browsing time; significant differences mainly in T1, T2, and T5.
Conclusion. ChatGPT excels in recall, understanding, and creation tasks; limited effect on analysis and evaluation.
Extended Abstract
Introduction
With the rapid expansion of artificial intelligence technologies, tools like ChatGPT are quickly changing how people interact with information. Understanding how these tools affect scientific information retrieval and academic user behavior is increasingly important. The article "ChatGPT-Assisted Information Retrieval: A Comparative Study of User Behavior in Academic Information Retrieval" (Liu et al., 2024), published in the proceedings of the 24th ACM/IEEE Joint Conference on Digital Libraries (JCDL’24), investigates fundamental differences in user behavior when searching for scientific information with ChatGPT assistance compared to traditional methods. The article is innovative in its topic and study design; however, a larger sample would have made its findings more reliable for decision-making.1
The study used a structured approach: it designed five tasks corresponding to the cognitive levels of “recall, understanding, analysis, evaluation, and creation” and compared the performance of two user groups (with and without ChatGPT). The findings indicate that ChatGPT can enhance search efficiency in some of these tasks.
Commentary
The study randomly assigned 20 first-year graduate students from various disciplines to two groups. The experimental group completed five search tasks designed according to the cognitive levels “recall, understanding, analysis, evaluation, and creation” using the ChatGPT for Google extension (version 3.5), while the control group performed the same tasks using a standard search engine. Behavioral data were collected via screen recording and manually annotated. Metrics included the number of searches, average query length, SERP (search engine results page) clicks, SERP browsing time, number of URLs selected, time spent on URLs, and total information retrieval time (seconds). The findings indicate that in all tasks, using ChatGPT reduced SERP clicks, the number and duration of URL visits, and total retrieval time. In contrast, the average query length and SERP browsing time increased with ChatGPT. Statistically significant differences were mainly observed in tasks T1 (recall), T2 (understanding), and T5 (creation). In the conclusion section, the authors highlighted ChatGPT’s advantage in fact/concept-oriented tasks (recall and understanding) and content creation tasks.
This study provides valuable insights into the application of ChatGPT in optimizing information retrieval processes. It highlights the tool’s role in enhancing search efficiency and its integration with traditional search engines, an important step toward understanding how AI tools can be incorporated into information-seeking practices. Nonetheless, details regarding participant selection were not provided, which may introduce bias in the results.2
The research tasks were designed based on the cognitive levels framework, enhancing conceptual coherence and allowing assessment of participant performance across different cognitive domains. However, the lack of baseline assessment of participants’ information retrieval skills prior to the study may have influenced the results and reduced the accuracy of comparisons.
The collection of actual user behavioral data, including click counts, time spent browsing web pages, and the number of URLs visited, strengthened the validity and reliability of the findings. Nonetheless, environmental and individual factors such as internet connectivity, participants’ familiarity with the subject, or fatigue could have affected the outcomes and were not accounted for in the study.
Direct comparison of traditional search and ChatGPT-assisted search enabled objective and quantitative evaluation of the advantages and limitations of each approach. However, the study only included first-year master's students, and although it was mentioned that they were selected from various disciplines, the specific fields of study were not reported; this issue may limit the generalizability of the findings. The use of ChatGPT version 3.5 was clearly specified, which is justified given the study’s timeline, providing an accurate picture of the tool’s capabilities at that time. However, the fast development of newer ChatGPT versions means that the results only show the tool’s status at the time of the study, and comparing with newer versions may be needed in future research.3
Overall, the structured task design, use of real behavioral data, and the simultaneous analysis of traditional and ChatGPT-assisted search are key strengths of the study. Its methodological limitations and sample selection issues point to areas where future work can improve validity and generalizability. Even so, the study provides valuable insights into the potential of ChatGPT-assisted information retrieval and offers important guidance for future research.
Conclusion
The study by Liu et al. (2024) represents a valuable step toward examining the role and impacts of ChatGPT in scientific information retrieval. However, future researchers need to pay attention to limitations that may arise in the design, execution, and reporting of studies in order to provide more replicable and efficient results.4,5 Accordingly, it is recommended that future studies expand the sample size and include more diverse participant groups to examine the impact of ChatGPT across different levels of expertise and experience. Comparative evaluations of various versions of ChatGPT, especially newer ones, can also help identify the strengths and limitations of each version and offer practical guidance for optimal tool selection. Assessing participants’ initial search skills prior to the study and conducting qualitative evaluations of outputs by domain experts can enhance the accuracy and validity of the findings.2 At the same time, considering intervening and contextual variables such as individual experience, type of query, familiarity with the subject, and level of information literacy, along with extending the scope of research applications to diverse domains such as medicine, law, education, and business, will provide a more comprehensive picture of ChatGPT’s effectiveness. Designing integrated and interconnected tasks, formulating clear hypotheses and assumptions, and using transparent evaluation criteria accompanied by reporting inter-rater reliability can further strengthen the theoretical and methodological framework of future studies and yield more reliable and practical results.