Comparison of Kernel Methods and Large Language Models in Sentiment Analysis

This summary describes the work carried out during a roughly 300-hour internship by Carlo Rosso, a student at the University of Padua. The project had multiple objectives.

After studying the problem of sentiment analysis and reviewing the related scientific literature, I implemented, tested, and compared at least three state-of-the-art models on the Stanford Sentiment Treebank corpus. I also documented the implemented models and uploaded them to a git repository so that the results can be reproduced.

More specifically, during the internship, I investigated the role of syntactic analysis and its compositional properties in understanding semantic phenomena, with a particular focus on sentiment analysis.


Problem

Sentiment analysis

[Figure: classifier]


Dataset

Stanford Sentiment Treebank

[Figure: labeled tree]

[Figure: grammar tree]

[Figure: label distribution]
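
To make the labeled-tree format concrete, here is a minimal sketch of reading one tree. It assumes the bracketed layout in which the Stanford Sentiment Treebank is commonly distributed, with every node carrying a sentiment class from 0 (very negative) to 4 (very positive). The names `Node`, `parse_tree`, and `leaves` are illustrative and are not taken from the project repository.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: int                     # sentiment class of this phrase (assumed range 0..4)
    word: Optional[str] = None     # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)


def parse_tree(text: str) -> Node:
    """Parse a bracketed tree such as "(3 (2 It) (4 (2 's) (3 good)))"."""
    # Assumption: literal parentheses in the sentence are escaped in the data
    # (for example as -LRB- / -RRB-), so "(" and ")" only mark tree structure.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse() -> Node:
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        pos += 1
        node = Node(label=int(tokens[pos]))
        pos += 1
        if tokens[pos] == "(":
            # Internal node: one or more child subtrees follow the label.
            while tokens[pos] == "(":
                node.children.append(parse())
        else:
            # Leaf node: a single word follows the label.
            node.word = tokens[pos]
            pos += 1
        assert tokens[pos] == ")", "expected ')'"
        pos += 1
        return node

    return parse()


def leaves(node: Node) -> List[str]:
    """Return the words of the sentence in left-to-right order."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in leaves(child)]


tree = parse_tree("(3 (2 It) (4 (2 's) (3 good)))")
print(tree.label, leaves(tree))
```

The root label is the sentence-level class, while the labels on inner nodes are the phrase-level annotations that tree-based models can exploit.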


RNN

Model                       Accuracy (%)
RNN                         43.2
RNN with 25 hidden units    39.7
RNN with 50 hidden units    42.4
RNN with 75 hidden units    38.8
RNN with 100 hidden units   39.8

Kernel Method

Dataset     Accuracy (%)
Subtree     51
Merged      53
Sentiment   54
Syntax      39

Kernel Method

Model                                      Accuracy (%)
Subset Tree Sentiment non-normalized       55.2
Subset Tree-bow Sentiment non-normalized   55.2
Partial Tree Sentiment non-normalized      55.2
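
The kernels in this table are labeled as non-normalized. For context, the sketch below shows the standard cosine-style normalization that is often applied to tree kernels, K'(a, b) = K(a, b) / sqrt(K(a, a) K(b, b)), so that large trees do not dominate the similarity scores. It is a generic illustration on a precomputed Gram matrix, not the code used in the experiments; `normalize_gram` is a hypothetical helper name.

```python
import math
from typing import List


def normalize_gram(K: List[List[float]]) -> List[List[float]]:
    """Return the normalized Gram matrix K'[i][j] = K[i][j] / sqrt(K[i][i] * K[j][j]).

    Assumes K is square with a strictly positive diagonal, which holds for a
    tree kernel that counts at least one shared fragment of a tree with itself.
    """
    n = len(K)
    diag = [K[i][i] for i in range(n)]
    return [
        [K[i][j] / math.sqrt(diag[i] * diag[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy Gram matrix for two trees (values are made up for illustration).
K = [[4.0, 2.0],
     [2.0, 9.0]]
K_norm = normalize_gram(K)
print(K_norm)
```

After normalization every tree has similarity 1 with itself, and off-diagonal entries lie in [0, 1] whenever the kernel values are non-negative.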

Large Language Model

Model                         Accuracy (%)
BERT                          52.3
BERT (my implementation)      53.2
RoBERTa                       56.4
RoBERTa (my implementation)   57.3
DistilBERT                    52.0

Comparison

Model           Accuracy (%)
RNN             42.4
Kernel Method   55.2
RoBERTa         57.3

Objectives

Achieved Objectives:

Desirable Objective Not Achieved:


Future Work

Model combinations:

[Figure: model combinations]


Acquired Skills