Re-Engineered Word Embeddings for Improved Document-Level Sentiment Analysis

Yang, Scott; Deravi, Farzin

doi:10.3390/app12189287

Journal article

Applied Sciences, Volume: 12, Issue: 18, Start page: 9287

Swansea University Author: Scott Yang

PDF | Version of Record

Copyright: © 2022 by the authors. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

DOI (Published version): 10.3390/app12189287

Abstract

In this paper, a novel re-engineering mechanism for the generation of word embeddings is proposed for document-level sentiment analysis. Current approaches to sentiment analysis often integrate feature engineering with classification, without optimizing the feature vectors explicitly. Engineering fe...

Published in:	Applied Sciences
ISSN:	2076-3417
Published:	MDPI AG 2022
Online Access:	Check full text
URI:	https://cronfa.swan.ac.uk/Record/cronfa61289

first_indexed	2022-09-20T15:34:18Z
last_indexed	2023-01-13T19:21:58Z
id	cronfa61289
recordtype	SURis
fullrecord	<?xml version="1.0"?><rfc1807><datestamp>2022-10-12T14:26:59.5698930</datestamp><bib-version>v2</bib-version><id>61289</id><entry>2022-09-20</entry><title>Re-Engineered Word Embeddings for Improved Document-Level Sentiment Analysis</title><swanseaauthors><author><sid>81dc663ca0e68c60908d35b1d2ec3a9b</sid><ORCID>0000-0002-6618-7483</ORCID><firstname>Scott</firstname><surname>Yang</surname><name>Scott Yang</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2022-09-20</date><deptcode>MACS</deptcode><abstract>In this paper, a novel re-engineering mechanism for the generation of word embeddings is proposed for document-level sentiment analysis. Current approaches to sentiment analysis often integrate feature engineering with classification, without optimizing the feature vectors explicitly. Engineering feature vectors to match the data between the training set and query sample as proposed in this paper could be a promising way for boosting the classification performance in machine learning applications. The proposed mechanism is designed to re-engineer the feature components from a set of embedding vectors for greatly increased between-class separation, hence better leveraging the informative content of the documents. The proposed mechanism was evaluated using four public benchmarking datasets for both two-way and five-way semantic classifications. The resulting embeddings have demonstrated substantially improved performance for a range of sentiment analysis tasks. 
Tests using all the four datasets achieved by far the best classification results compared with the state-of-the-art.</abstract><type>Journal Article</type><journal>Applied Sciences</journal><volume>12</volume><journalNumber>18</journalNumber><paginationStart>9287</paginationStart><paginationEnd/><publisher>MDPI AG</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic>2076-3417</issnElectronic><keywords>sentiment analysis; semantic classification; feature re-engineering; NLP</keywords><publishedDay>16</publishedDay><publishedMonth>9</publishedMonth><publishedYear>2022</publishedYear><publishedDate>2022-09-16</publishedDate><doi>10.3390/app12189287</doi><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm>SU College/Department paid the OA fee</apcterm><funders>Swansea University</funders><projectreference/><lastEdited>2022-10-12T14:26:59.5698930</lastEdited><Created>2022-09-20T16:28:03.8981891</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>Scott</firstname><surname>Yang</surname><orcid>0000-0002-6618-7483</orcid><order>1</order></author><author><firstname>Farzin</firstname><surname>Deravi</surname><order>2</order></author></authors><documents><document><filename>61289__25165__0dbe035591024cd1b167a0c610d0441d.pdf</filename><originalFilename>61289.VOR.pdf</originalFilename><uploaded>2022-09-20T16:32:20.4680138</uploaded><type>Output</type><contentLength>1861110</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: © 2022 by the authors. 
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title	Re-Engineered Word Embeddings for Improved Document-Level Sentiment Analysis
author_id_str_mv	81dc663ca0e68c60908d35b1d2ec3a9b
author_id_fullname_str_mv	81dc663ca0e68c60908d35b1d2ec3a9b_***_Scott Yang
author	Scott Yang
author2	Scott Yang Farzin Deravi
format	Journal article
container_title	Applied Sciences
container_volume	12
container_issue	18
container_start_page	9287
publishDate	2022
institution	Swansea University
issn	2076-3417
doi_str_mv	10.3390/app12189287
publisher	MDPI AG
college_str	Faculty of Science and Engineering
hierarchytype
hierarchy_top_id	facultyofscienceandengineering
hierarchy_top_title	Faculty of Science and Engineering
hierarchy_parent_id	facultyofscienceandengineering
hierarchy_parent_title	Faculty of Science and Engineering
department_str	School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science
document_store_str	1
active_str	0
description	In this paper, a novel re-engineering mechanism for the generation of word embeddings is proposed for document-level sentiment analysis. Current approaches to sentiment analysis often integrate feature engineering with classification, without optimizing the feature vectors explicitly. Engineering feature vectors to match the data between the training set and query sample as proposed in this paper could be a promising way for boosting the classification performance in machine learning applications. The proposed mechanism is designed to re-engineer the feature components from a set of embedding vectors for greatly increased between-class separation, hence better leveraging the informative content of the documents. The proposed mechanism was evaluated using four public benchmarking datasets for both two-way and five-way semantic classifications. The resulting embeddings have demonstrated substantially improved performance for a range of sentiment analysis tasks. Tests using all the four datasets achieved by far the best classification results compared with the state-of-the-art.
published_date	2022-09-16T09:20:38Z
_version_	1830271412404748288
score	11.060726

Similar Items