Selected Projects in Data Science
How Consensus can be Involved Through Innovating Voting Mechanism? Using Polis Platform as an Example
Paper | Code | Slides

Used PCA and UMAP to project participants’ stances onto a two-dimensional map for visualization.
Applied KMeans clustering to separate participants into groups A and B, then calculated the centroid coordinates of each group to measure the distance between the groups.
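As a rough illustration of the clustering step above, the sketch below splits 2-D points into two groups and measures the distance between the group centroids. It is not the project's code: the points are made-up coordinates of the kind a PCA or UMAP projection might produce, the function name `kmeans_two_groups` is hypothetical, and a small hand-written Lloyd's iteration stands in for a library KMeans so that the example has no dependencies.

```python
import math

def kmeans_two_groups(points, iterations=20):
    """Lloyd's algorithm with k=2 on 2-D points; returns (labels, centroids)."""
    # Deterministic start (an assumption made for reproducibility):
    # use the points with the smallest and largest x-coordinate.
    centroids = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(2), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(2):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids

# Hypothetical 2-D stance coordinates for six participants.
points = [(-2.0, 0.1), (-1.8, -0.3), (-2.2, 0.2),
          (1.9, 0.0), (2.1, 0.4), (2.0, -0.4)]

labels, centroids = kmeans_two_groups(points)
groups = ["A" if lab == 0 else "B" for lab in labels]

# Euclidean distance between the two centroids: larger values suggest
# that the two groups sit further apart on the map.
group_distance = math.dist(centroids[0], centroids[1])
print(groups, round(group_distance, 3))
```

In practice a library implementation with several random restarts would be the more usual choice; the fixed initialization here only keeps the toy example deterministic.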
Network Analysis of Twitter to Identify Opinion Leaders, Emotional Cascades, and Community Structures
Paper | Code | Slides

Used k-core decomposition to examine the community structure.
Applied NLP techniques, including named entity recognition, sentiment analysis, and topic modeling, to tweets to investigate emotional cascades.
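For readers unfamiliar with k-core decomposition, here is a minimal sketch of the idea on a toy graph. It is an illustration under stated assumptions, not the project's implementation: the six-node graph and the function name `core_numbers` are hypothetical, and the peeling loop is written in plain Python rather than taken from a graph library.

```python
def core_numbers(adjacency):
    """Return the core number of every node in an undirected graph.

    A node has core number k if it belongs to the k-core (the largest
    subgraph in which every node has degree >= k) but not the (k+1)-core.
    """
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel the remaining node with the smallest current degree
        # (ties broken by name so the run is deterministic).
        node = min(remaining, key=lambda n: (degree[n], n))
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        # Removing the node lowers the degree of its remaining neighbors.
        for neighbor in adjacency[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core

# Hypothetical interaction graph: a, b, c, d all interact with each other,
# e interacts with a and b, and f interacts only with e.
adjacency = {
    "a": ["b", "c", "d", "e"],
    "b": ["a", "c", "d", "e"],
    "c": ["a", "b", "d"],
    "d": ["a", "b", "c"],
    "e": ["a", "b", "f"],
    "f": ["e"],
}

core = core_numbers(adjacency)
# Nodes in the innermost core form the most densely connected group.
innermost = sorted(n for n, k in core.items() if k == max(core.values()))
print(core, innermost)
```

Nodes with a high core number sit in the densely connected center of the network, while low values mark the periphery.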
Forecasting Cryptocurrency Prices Using Machine Learning: An Analysis of Reddit Discussions
Code | Website

Applied exploratory data analysis (EDA), natural language processing (NLP), and machine learning (ML) to large-scale Reddit discussion data to uncover patterns and trends in cryptocurrency investing.
Developed data pipelines on AWS for a dataset of 6 million rows, using Spark NLP for sentiment analysis, LSTM and ARIMA for time-series prediction, and SynapseML for causal inference to model the relationship between sentiment trends and prices.
Quantifying the Complex Relationship between Lyrics, Chord Progression, and Emotion Stimulation
Paper | Code | Website | Blog

Used Bag of Words and Word2Vec representations, combined with KNN and hierarchical clustering, to identify songs with similar lyrics.
Incorporated LDA and sentiment analysis to analyze the lyrics in more depth. Explored relationships between numerical features (tempo, chord progression, key) and textual features using supervised machine learning.
Trained models on the connections between musical elements and textual features to improve recommendation accuracy.
Building a Song Similarity Calculator to Improve the Recommendation System
Whitepaper | Code | Application

Features an interface for a song similarity calculator that evaluates both lyrical and numerical attributes of songs.
Uses cosine similarity for comparison and displays the results as an overlapping radar chart for visual analysis.
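The cosine similarity comparison can be sketched as follows. This is a minimal example under stated assumptions, not the application's code: the feature vectors are invented, and the function name `cosine_similarity` and the variables `song_a` and `song_b` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Convention chosen for this sketch: an all-zero vector has no
    # direction, so its similarity to anything is reported as 0.0.
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical feature vectors, each already scaled to a comparable range.
song_a = [0.8, 0.6, 0.0]
song_b = [0.6, 0.8, 0.0]

similarity = cosine_similarity(song_a, song_b)
print(round(similarity, 2))
```

Because cosine similarity depends only on the direction of the vectors, numerical attributes on very different scales would normally be normalized before comparison.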
Predicting Attitudes Towards UBI in the EU Using Supervised Machine Learning Techniques
Paper | Code | Slides

Built and trained Logistic Regression, Decision Tree, SVM, Random Forest, XGBoost, and GBDT models.
Identified five primary and 35 secondary indicators influencing EU citizens’ attitudes towards UBI.