Selected Projects in Data Science


How Consensus can be Involved Through Innovating Voting Mechanism? Using Polis Platform as an Example

Paper | Code | Slides

UMAP and PCA

Used PCA and UMAP to visualize the participants’ stance on a 2-dimensional map.

Applied K-means clustering to separate participants into groups A and B, then computed the cluster centroids to measure the distance between the two groups.
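A minimal sketch of the clustering and centroid-distance step, assuming participants have already been projected to 2-D coordinates. The `stance_2d` points, the `kmeans` helper, and the first-k-points seeding are illustrative assumptions, not the project's actual data or code.

```python
import math

def kmeans(points, k=2, max_iter=100):
    """Lloyd's algorithm on 2-D points; returns (labels, centroids)."""
    # Assumption: seed centroids with the first k points so the run is deterministic.
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical 2-D projections of six participants (made-up values).
stance_2d = [(-2.0, 0.1), (-1.8, -0.2), (-2.2, 0.0),
             (2.0, 0.2), (1.9, -0.1), (2.1, 0.0)]

labels, centroids = kmeans(stance_2d, k=2)
gap = math.dist(centroids[0], centroids[1])  # distance between groups A and B
print(labels, centroids, gap)
```

A larger centroid distance suggests the two opinion groups sit further apart on the map; the value is only comparable across runs that share the same projection.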


Network Analysis of Twitter to Identify Opinion Leaders, Emotional Cascades, and Community Structures

Paper | Code | Slides

Emotion Cascade

Used k-core decomposition to examine the community structure.

Applied NLP techniques, including Named Entity Recognition, Sentiment Analysis, and Topic Modeling, on tweets to investigate emotional cascades.
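As a rough illustration of k-core decomposition, the sketch below computes each node's core number by repeatedly peeling the lowest-degree node. The tiny interaction graph and the `core_numbers` helper are hypothetical; a real analysis would more likely rely on a graph library.

```python
def core_numbers(adj):
    """Return {node: core number} for an undirected graph given as {node: set_of_neighbors}."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        u = min(remaining, key=degree.get)
        # The core level never decreases as peeling proceeds.
        k = max(k, degree[u])
        core[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1
    return core

# Hypothetical interaction graph: a tightly connected triangle plus a chain hanging off it.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}

core = core_numbers(graph)
print(core)
```

Nodes with the highest core number form the densely connected center of the network, which is one common starting point when looking for influential accounts.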


Forecasting Cryptocurrency Prices Using Machine Learning: An Analysis of Reddit Discussions

Code | Website

LSTM Analysis

Combined exploratory data analysis (EDA), natural language processing (NLP), and machine learning (ML) at big-data scale, aiming to uncover hidden patterns and trends that shed light on the realities of investing in crypto markets.

Developed data pipelines on AWS for a 6-million-row dataset, using Spark NLP for sentiment analysis, LSTM and ARIMA for time-series prediction, and SynapseML for causal inference to model the relationship between sentiment trends and prices.


Quantifying the Complex Relationship between Lyrics, Chord Progression, and Emotion Stimulation

Paper | Code | Website | Blog

Word Cloud

Used bag-of-words and Word2Vec representations, combined with KNN and hierarchical clustering, to identify songs with similar lyrics.

Incorporated LDA and sentiment analysis for deeper lyrical understanding. Explored relationships between numerical (tempo, chord progression, key) and textual features using supervised machine learning.

Trained machine learning models on the musical and textual features to capture how the elements of a song relate to one another and to improve recommendation accuracy.
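The bag-of-words and nearest-neighbor idea from the first bullet can be sketched as follows. This is a simplified illustration: the lyrics are invented, whitespace tokenization and Euclidean distance are assumptions, and the helper names are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count word occurrences after lowercasing and splitting on whitespace."""
    return Counter(text.lower().split())

def distance(a, b):
    """Euclidean distance between two word-count vectors (missing words count as 0)."""
    vocab = set(a) | set(b)
    return math.sqrt(sum((a[w] - b[w]) ** 2 for w in vocab))

def nearest_songs(query, lyrics, k=1):
    """Return the k song titles whose lyrics are closest to the query song."""
    bows = {title: bag_of_words(text) for title, text in lyrics.items()}
    others = [title for title in lyrics if title != query]
    return sorted(others, key=lambda t: distance(bows[query], bows[t]))[:k]

# Made-up lyrics for illustration only.
lyrics = {
    "song_a": "love me love me tonight",
    "song_b": "love me tonight and forever",
    "song_c": "rain falls on the empty street",
}

neighbors = nearest_songs("song_a", lyrics, k=1)
print(neighbors)
```

Raw counts favor songs of similar length, so a fuller pipeline would probably normalize the vectors or switch to learned embeddings such as Word2Vec.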


Building a Song Similarity Calculator to Improve the Recommendation System

Whitepaper | Code | Application

Dashboard

Features an interface for a song similarity calculator that evaluates both lyrical and numerical attributes of songs.

Uses cosine similarity for the comparison and displays the results as overlapping radar charts for visual analysis.
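For reference, cosine similarity between two song feature vectors can be computed as in the sketch below. The feature values are invented, and treating a zero vector as similarity 0.0 is an assumption made here for safety.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: similarity with an all-zero vector is defined as 0
    return dot / (norm_u * norm_v)

# Hypothetical attribute vectors already scaled to [0, 1].
song_x = [0.8, 0.6, 0.3, 0.9]
song_y = [0.7, 0.5, 0.4, 0.8]

similarity = cosine_similarity(song_x, song_y)
print(round(similarity, 3))
```

Because cosine similarity ignores vector length, attributes on very different scales (for example tempo in BPM versus a 0 to 1 score) should be normalized first so that no single attribute dominates.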


Predicting Attitudes Towards UBI in the EU Using Supervised Machine Learning Techniques

Paper | Code | Slides

UBI Analysis

Built and trained Logistic Regression, Decision Tree, SVM, Random Forest, XGBoost, and GBDT models.

Identified five primary and 35 secondary indicators influencing EU citizens’ attitudes towards UBI.