
Conference Paper/Proceeding/Abstract

Exploring the Impact of Temperature on Large Language Models: A Case Study for Classification Task Based on Word Sense Disambiguation

Deshan Sumanathilaka, Nicholas Micallef, Julian Hough

2025 7th International Conference on Natural Language Processing (ICNLP), Pages: 178 - 182

Swansea University Authors: Deshan Sumanathilaka, Nicholas Micallef, Julian Hough

  • CT 0118 - CameraReady.pdf

    PDF | Accepted Manuscript

    Author accepted manuscript released under the terms of a Creative Commons CC BY licence through the Swansea University Research Publications Policy (rights retention).


DOI (Published version): 10.1109/icnlp65360.2025.11108362

Abstract

With the advent of Large Language Models (LLMs), Natural Language (NL) related tasks have been widely evaluated and explored. While the impact of temperature on text generation in LLMs has been studied, its influence on classification tasks remains unexamined, despite temperature being a key parameter for controlling response randomness and creativity. In this study, we investigated the effect of the model's temperature on sense classification for Word Sense Disambiguation (WSD). A carefully crafted few-shot Chain-of-Thought (CoT) prompt was used to conduct the study, and FEWS lexical knowledge was shared for the gloss identification task. GPT-3.5 and GPT-4, Llama-3-70B and Llama-3.1-70B, and Mixtral 8×22B were used as the base models, and evaluations were conducted at intervals of 0.2 across the 0 to 1 temperature range. The results demonstrate that temperature significantly affects the performance of LLMs on classification tasks, emphasizing the importance of a preliminary study to select the optimal temperature for a given task. Specifically, GPT-3.5-Turbo and Llama-3.1-70B show a clear performance shift and Mixtral 8×22B shows minor deviations, while GPT-4-Turbo and Llama-3-70B produce consistent results across different temperatures.
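
The evaluation protocol described above (a fixed classification prompt run at 0.2 temperature intervals from 0 to 1) can be illustrated with a short sketch. The snippet below is a minimal, hypothetical example assuming the OpenAI Python client; the prompt wording, toy sense inventory, and scoring are illustrative placeholders and do not reproduce the paper's few-shot CoT prompt or its FEWS-based setup.

```python
# Minimal sketch of a temperature sweep for an LLM-based WSD classification task.
# Assumes the OpenAI Python client (openai>=1.0); the prompt, senses, and scoring
# below are illustrative placeholders, not the paper's actual FEWS-based prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Toy evaluation item: a target word, candidate glosses, and the gold sense index.
ITEM = {
    "sentence": "She sat on the bank of the river.",
    "word": "bank",
    "senses": ["sloping land beside a body of water", "a financial institution"],
    "gold": 0,
}

PROMPT = (
    "Choose the sense of the target word in the sentence.\n"
    "Sentence: {sentence}\nTarget word: {word}\n"
    "Senses:\n{senses}\nAnswer with the sense number only."
)

def classify(temperature: float) -> int:
    """Ask the model to pick a sense index at the given temperature."""
    senses = "\n".join(f"{i}. {s}" for i, s in enumerate(ITEM["senses"]))
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # one of the paper's base models
        temperature=temperature,
        messages=[{"role": "user", "content": PROMPT.format(
            sentence=ITEM["sentence"], word=ITEM["word"], senses=senses)}],
    )
    reply = resp.choices[0].message.content.strip()
    return int(reply[0]) if reply and reply[0].isdigit() else -1

# Evaluate at 0.2 intervals across the 0 to 1 temperature range, as in the study.
for t in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    prediction = classify(t)
    print(f"temperature={t:.1f} prediction={prediction} correct={prediction == ITEM['gold']}")
```

Repeating such a sweep over a full test set, and over each base model, is the kind of preliminary study the paper argues for when selecting an optimal temperature before the main experiment.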

Published in: 2025 7th International Conference on Natural Language Processing (ICNLP)
ISBN: 979-8-3315-2188-2; 979-8-3315-2187-5
Published: Guangzhou, China: IEEE, 2025
URI: https://cronfa.swan.ac.uk/Record/cronfa68938
Keywords: Temperature sensors, Large language models, Natural language processing, Temperature control, Creativity
College: Faculty of Science and Engineering
Funders: Swansea University
Start Page: 178
End Page: 182