Conference Paper/Proceeding/Abstract
Exploring the Impact of Temperature on Large Language Models: A Case Study for Classification Task Based on Word Sense Disambiguation
2025 7th International Conference on Natural Language Processing (ICNLP), Pages: 178 - 182
Swansea University Authors: Deshan Sumanathilaka, Nicholas Micallef, Julian Hough
PDF | Accepted Manuscript
Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).
Download (373.38KB)
DOI (Published version): 10.1109/icnlp65360.2025.11108362
Abstract
With the advent of Large Language Models (LLMs), Natural Language (NL) related tasks have been evaluated and explored. While the impact of temperature on text generation in LLMs has been explored, its influence on classification tasks remains unexamined, despite temperature being a key parameter for controlling response randomness and creativity. In this study, we investigated the effect of the model's temperature on sense classification tasks for Word Sense Disambiguation (WSD). A carefully crafted Few-shot Chain of Thought (CoT) prompt was used to conduct the study, and FEWS lexical knowledge was provided for the gloss identification task. GPT-3.5 and 4, LLaMA-3-70B and 3.1-70B, and Mixtral 8×22B were used as the base models, with evaluations conducted at 0.2 intervals over the 0 to 1 range. The results demonstrate that temperature significantly affects the performance of LLMs in classification tasks, emphasizing the importance of conducting a preliminary study to select the optimal temperature for a task. The results show that GPT-3.5-Turbo and Llama-3.1-70B exhibit a clear performance shift, Mixtral 8×22B shows minor deviations, while GPT-4-Turbo and LLaMA-3-70B produce consistent results across temperatures.
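The evaluation design described in the abstract (scoring each model on a sense-classification prompt at temperatures 0, 0.2, ..., 1.0) can be illustrated with a short sketch. The snippet below is a minimal, hypothetical example assuming the OpenAI chat completions client and a single toy labelled item; it is not the paper's actual few-shot CoT prompt, FEWS gloss data, or scoring procedure.

```python
# Minimal sketch of a temperature sweep for a classification-style task.
# Assumes the OpenAI chat completions API and a toy labelled WSD example;
# the paper's actual few-shot CoT prompt and FEWS glosses are not reproduced here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical evaluation items: (prompt asking for a sense label, gold label)
eval_items = [
    ("Sentence: 'He sat on the bank of the river.'\n"
     "Candidate senses: (a) financial institution (b) sloping land beside water\n"
     "Answer with (a) or (b) only.", "(b)"),
]

def predict(prompt: str, temperature: float) -> str:
    """Query the model once at the given temperature and return its answer."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content.strip()

# Sweep temperature from 0.0 to 1.0 in 0.2 steps, mirroring the study design.
for t in [round(0.2 * i, 1) for i in range(6)]:
    correct = sum(predict(p, t).startswith(gold) for p, gold in eval_items)
    print(f"temperature={t}: accuracy={correct / len(eval_items):.2f}")
```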
| Published in: | 2025 7th International Conference on Natural Language Processing (ICNLP) |
|---|---|
| ISBN: | 979-8-3315-2188-2 (print); 979-8-3315-2187-5 (electronic) |
| Published: | IEEE, Guangzhou, China, 19 August 2025 |
| Keywords: | Temperature sensors, Large language models, Natural language processing, Temperature control, Creativity |
| URI: | https://cronfa.swan.ac.uk/Record/cronfa68938 |