BriefMe - News Summarization
A web application that summarizes news articles
With today's explosion of information, it is hard to stay on top of important topics without feeling overwhelmed by the process of news consumption.
​
As a result, BriefMe was created to reduce the time and energy spent browsing the news: it collects articles on common topics from a multitude of credible sources and applies a Transformer model to condense them into short, concise summaries.
We developed Python scripts (in PyCharm) to scrape articles from news outlets, applied BART to summarize them, and saved each article's information to MongoDB Atlas for later access from the web application. The project has been deployed on Heroku for public access.
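The core of that pipeline can be sketched in a few lines. The snippet below is a simplified illustration rather than the production code: the article URL, MongoDB connection string, and database/collection names are placeholders, and facebook/bart-large-cnn is an assumed choice of pre-trained BART checkpoint.

```python
# Simplified sketch of the scrape -> summarize -> store pipeline.
# URL, CSS structure, and MongoDB names are hypothetical placeholders;
# facebook/bart-large-cnn is an assumed pre-trained BART checkpoint.
import requests
from bs4 import BeautifulSoup
from transformers import pipeline
from pymongo import MongoClient

def scrape_article(url: str) -> str:
    """Fetch a page and pull the article text out of its <p> tags."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return " ".join(p.get_text(strip=True) for p in soup.find_all("p"))

# Pre-trained BART fine-tuned on CNN/DailyMail for summarization.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize(text: str) -> str:
    # truncation=True keeps long articles within BART's input limit.
    result = summarizer(text, max_length=130, min_length=30,
                        do_sample=False, truncation=True)
    return result[0]["summary_text"]

def store(url: str, text: str, summary: str) -> None:
    """Save the article and its summary to MongoDB Atlas for the web app."""
    client = MongoClient("mongodb+srv://<user>:<password>@cluster.example.mongodb.net")
    client["briefme"]["articles"].insert_one(
        {"url": url, "text": text, "summary": summary}
    )

if __name__ == "__main__":
    url = "https://example.com/news/some-article"  # placeholder URL
    text = scrape_article(url)
    store(url, text, summarize(text))
```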
​
Challenges and Potential Considerations:
- While trying to re-train the pre-trained BART model, we ran into limited processing power, even though we successfully used the ohmeow-blurr library to construct a BartForConditionalGeneration object and train on a fraction of the cnn_dailymail dataset (see the sketch below). The ideal solution would be to retrain BART on a customized dataset of articles and their corresponding summaries, on a machine with more computational power.
- Another consideration while researching deployment was keeping an EC2 instance on AWS running indefinitely so that the website would always display the latest articles. However, the free-tier machine could not handle the processing demand. We therefore think the application would best achieve the desired output by relying on Big Data tools such as Hadoop and Spark.
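For the first point, the sketch below shows roughly what that fine-tuning step looks like. It uses the plain Hugging Face Seq2SeqTrainer API rather than the ohmeow-blurr wrapper we actually worked with, and the checkpoint, data slice, and hyperparameters are illustrative assumptions, not the settings from the project.

```python
# Rough sketch of fine-tuning BART on a slice of cnn_dailymail.
# Uses the standard Hugging Face Trainer API instead of ohmeow-blurr;
# checkpoint, split size, and hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (
    BartForConditionalGeneration,
    BartTokenizerFast,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "facebook/bart-base"   # assumed smaller checkpoint for limited hardware
tokenizer = BartTokenizerFast.from_pretrained(checkpoint)
model = BartForConditionalGeneration.from_pretrained(checkpoint)

# Load only a small fraction of the dataset, as described above.
raw = load_dataset("cnn_dailymail", "3.0.0", split="train[:1%]")

def preprocess(batch):
    inputs = tokenizer(batch["article"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["highlights"], max_length=128, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="bart-cnn-finetuned",
    per_device_train_batch_size=2,   # small batch to cope with limited memory
    num_train_epochs=1,
    learning_rate=3e-5,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```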
​
Technology highlights:
- Summarization using the BART Transformer
- Scraping websites with BeautifulSoup and requests-html
​
For more information, please visit the website.
​


Customer Churn Prediction
A machine learning model for a classification problem
Customer retention is essential for many businesses, especially in an era in which customers are offered many options at competitive prices. It is therefore important to identify customer groups based on their characteristics and behaviors, so that marketing and CRM efforts can be directed in a targeted, purposeful manner to maximize efficiency.
In this analysis, a small dataset of churned customers from a telecom company, with a variety of collected features, is analyzed in order to build a predictive model that supports early detection of customers who are likely to stop using the service.
​
The process begins with an initial EDA using seaborn to understand the characteristics of the dataset, followed by resampling with RandomOverSampler to address class imbalance. A number of feature selection methods are then applied and compared in order to select the best set of features, which is used to train the predictive models. To maximize accuracy, hyperparameter tuning with cross-validation is applied to various machine learning models, ensemble methods, and a Multilayer Perceptron neural network to determine the best-performing model.
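A condensed sketch of that workflow is shown below. The file name, the "Churn" label column, and the hyperparameter grid are assumptions made for illustration; the real analysis compares several feature-selection methods and model families rather than the single combination shown here.

```python
# Condensed sketch of the churn-modelling workflow described above.
# Column names and the hyperparameter grid are hypothetical; assumes
# categorical features have already been encoded as numbers.
import pandas as pd
from imblearn.over_sampling import RandomOverSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split

df = pd.read_csv("telecom_churn.csv")            # placeholder file name
X, y = df.drop(columns=["Churn"]), df["Churn"]   # "Churn" is an assumed label column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Rebalance the training set, since churned customers are the minority class.
X_train, y_train = RandomOverSampler(random_state=42).fit_resample(X_train, y_train)

# One of several feature-selection methods compared in the analysis.
selector = SelectKBest(f_classif, k=10).fit(X_train, y_train)
X_train_sel, X_test_sel = selector.transform(X_train), selector.transform(X_test)

# Hyperparameter tuning with cross-validation for one candidate model.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train_sel, y_train)
print(grid.best_params_, grid.score(X_test_sel, y_test))
```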
​
Technology Highlights:
- EDA: seaborn
- Feature Selection
- Ensemble Methods
- Artificial Neural Network
​

How tweets affect #cryptocurrencies prices
A machine learning project that solves a regression problem with sentiment analysis
Cryptocurrencies are one of the most popular buzzwords of recent years. The dramatic price fluctuations of some coins have drawn many enthusiasts to scrutinize and eagerly discuss them across social media platforms.
In this project, we examined the relationship between tweets from Twitter and the prices of four coins of interest, and trained a machine learning model to predict their daily prices from the sentiment scores obtained.
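A toy sketch of that idea is shown below. The input files, column names, and the choice of VADER for sentiment scoring are assumptions made for illustration rather than the exact notebook code.

```python
# Toy sketch of the sentiment-to-price regression. File layouts, column
# names, and the use of VADER are illustrative assumptions.
# Requires a one-time nltk.download("vader_lexicon").
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

tweets = pd.read_csv("tweets.csv", parse_dates=["date"])   # columns: date, text, coin
prices = pd.read_csv("prices.csv", parse_dates=["date"])   # columns: date, coin, price

# Score each tweet, then average the compound sentiment per coin per day.
sia = SentimentIntensityAnalyzer()
tweets["sentiment"] = tweets["text"].apply(lambda t: sia.polarity_scores(t)["compound"])
daily = tweets.groupby(["coin", "date"], as_index=False)["sentiment"].mean()

# Join daily sentiment with daily prices and fit a simple regression model.
data = daily.merge(prices, on=["coin", "date"])
X, y = data[["sentiment"]], data["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out days:", model.score(X_test, y_test))
```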
​
Challenges
- CoinGecko only provides free access to coins' daily prices. Because all of these coins are relatively young (the first coin was released in 2010), we cannot rely on a single coin's daily data to build a sufficiently large dataset.
​
Future Considerations
- To alleviate the issue above, we collected the prices of four coins in order to build a more robust model, understanding that the discrepancies between their price ranges introduce noise into the model.
- An ideal approach would be to obtain hourly prices for individual coins, which could be used to construct a more accurate predictive model.
​
Technology Highlights:
- Sentiment Analysis
- Machine Learning models
​
To find out more, please visit the Google Colab notebook or the GitHub repo.

Movie Recommendation System
A robust recommendation solution using text and ratings
Learning users' behaviors from their interactions with a product makes it possible to promote relevant items strategically. This project builds a range of recommendation methods that take advantage of the available features, using content-based filtering, collaborative filtering, and hybrid approaches to maximize accuracy.
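As a flavor of the content-based option, the sketch below builds a TF-IDF matrix over movie overviews and ranks titles by cosine similarity. The file and column names are hypothetical, and the full project also covers collaborative filtering and a hybrid of the two.

```python
# Minimal content-based recommender: TF-IDF over movie overviews,
# ranked by cosine similarity. File and column names are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

movies = pd.read_csv("movies.csv")            # assumed columns: title, overview

# Turn each plot overview into a TF-IDF vector.
tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(movies["overview"].fillna(""))

# Pairwise cosine similarity between every pair of movies.
sim = cosine_similarity(matrix)

def recommend(title: str, n: int = 5) -> pd.Series:
    """Return the n movies whose overviews are most similar to `title`."""
    idx = movies.index[movies["title"] == title][0]
    ranked = sim[idx].argsort()[::-1]          # most similar first
    ranked = [i for i in ranked if i != idx][:n]
    return movies.loc[ranked, "title"]

print(recommend("Toy Story"))
```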
​
Challenges and Potential Considerations:
- Even though we are able to generate lists of similar items, we want to explore more methods that provide even better recommendations and differentiate the system in the market.
​
Technology Highlights:
- Text Processing with TF-IDF
- Machine Learning models
- Statistical Methods: Jaccard similarity, Cosine similarity
​
To find out more, please visit the Google Colab notebook.

Database Development
A detailed development of a database system using SQL
A comprehensive project that illustrates the step-by-step process of building a relational database for a hiking company.
The steps taken include:
- analyzing user requirements
- identifying entities and relationships, with assumptions about cardinality and participation
- creating EER models
- creating the relational schema
- normalization
At the end, we include CRUD SQL queries for illustration purposes, plus an additional 10 business queries to generate reports.
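To give a flavor of those CRUD and report queries, here is a tiny self-contained sketch using Python's sqlite3 with hypothetical table and column names; the actual schema and the full query set live in the GitHub repo.

```python
# Tiny illustration of the kind of schema and CRUD/report queries described
# above, using sqlite3 so it runs anywhere. Table and column names are
# hypothetical; the real schema and queries are in the repo.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two of the entities identified during requirements analysis (assumed names).
cur.executescript("""
CREATE TABLE Customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE Booking (
    booking_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customer(customer_id),
    hike_name   TEXT NOT NULL,
    price       REAL NOT NULL
);
""")

# Create / Read / Update / Delete, plus one report-style aggregate query.
cur.execute("INSERT INTO Customer (customer_id, name) VALUES (1, 'Alice')")
cur.execute("INSERT INTO Booking (booking_id, customer_id, hike_name, price) "
            "VALUES (1, 1, 'Summit Trail', 120.0)")
cur.execute("UPDATE Booking SET price = 99.0 WHERE booking_id = 1")
cur.execute("""
    SELECT c.name, COUNT(*) AS trips, SUM(b.price) AS revenue
    FROM Customer c JOIN Booking b ON b.customer_id = c.customer_id
    GROUP BY c.name
""")
print(cur.fetchall())
cur.execute("DELETE FROM Booking WHERE booking_id = 1")
conn.close()
```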
Technology Highlights:
- Advanced SQL queries
- ER, EER modeling
- Data Normalization
​
To find out more, please visit the GitHub repo.