Student Projects

If you are interested in doing a Master's thesis or semester project in data science, you can apply for one of the projects below. Your primary supervisor will be Francisco Pinto, from the Center for Digital Education, and your professor will be either Patrick Jermann or Pierre Dillenbourg.

List of Projects


Predict and Recommend Research Collaborations using Graph Analysis

Level: Master

Subject areas: Machine learning, graph theory

Description: One of the obstacles to scientific collaboration in a research institution like EPFL is that researchers are often unaware of what people in other labs are working on. With better awareness of each other’s work, people working on related problems could collaborate and co-author papers more often. The goal of this project is to analyse the EPFL publications graph and estimate potential collaborations by strategically connecting nodes in the graph. Using link prediction techniques, the algorithm should be capable of recommending new collaborations (or connections) to individual researchers.
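To give a flavour of link prediction, here is a minimal sketch that scores pairs of researchers who have not yet co-authored by the Jaccard coefficient of their co-author sets. The toy graph, the researcher names, and the helper functions jaccard and recommend are assumptions made for illustration; the project itself may rely on other scoring methods and on the real publications graph.

```python
# Hypothetical toy co-authorship graph: researcher -> set of co-authors.
coauthors = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"alice", "bob"},
    "dave": {"bob", "erin"},
    "erin": {"dave"},
}


def jaccard(graph, u, v):
    """Jaccard coefficient of the neighbour sets of u and v."""
    union = graph[u] | graph[v]
    if not union:
        return 0.0
    return len(graph[u] & graph[v]) / len(union)


def recommend(graph, researcher, top_k=3):
    """Rank researchers not yet connected to `researcher` by Jaccard score."""
    candidates = [
        other
        for other in graph
        if other != researcher and other not in graph[researcher]
    ]
    scored = [(other, jaccard(graph, researcher, other)) for other in candidates]
    # Highest score first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


recommendations = recommend(coauthors, "alice")
print(recommendations)
```

In this toy graph, dave ranks above erin for alice because dave shares one co-author (bob) with her, while erin shares none. The networkx module listed under the useful tools provides comparable scores, for example nx.jaccard_coefficient.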

Pre-requisites: Proficiency in Python, knowledge of basic concepts in machine learning, knowledge of or interest in graph theory.

Useful tools: Python notebooks, pandas, sklearn, and networkx modules.

Contact: francisco.pinto@epfl.ch (Please include your grade transcripts in the application)


Create a Concepts Similarity Graph using Wikipedia and Graph Analysis

Level: Master

Subject areas: Machine learning, graph theory

Description: One of the objectives of the Campus Analytics project is to construct a database of scientific concepts that are taught at EPFL, using Wikipedia as a source of data. The Wikimedia Foundation provides a massive data dump that contains the content and hyperlinks of every Wikipedia article, which allows us to create a connectivity graph between scientific subjects. The problem with this graph (and Wikipedia in general) is that it has too many “weasel connections” between articles, making it far more connected than it should be in reality. For example, Robin Hood and the Fubini theorem are separated by only 3 degrees. The goal of this project is to construct a more realistic similarity graph based on global graph properties, by experimenting with different graph processing and machine learning algorithms.
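The effect of a single “weasel connection” can be illustrated with a small sketch. The hyperlinks below are invented for the example (they are not taken from the Wikipedia dump), and the pruning rule, which drops links whose endpoints share no neighbour, is only one simple heuristic assumed here; it is not the method the project prescribes.

```python
from collections import deque

# Hypothetical hyperlinks between articles, treated as undirected.
links = [
    ("Robin Hood", "England"),
    ("Robin Hood", "Folklore"),
    ("England", "Folklore"),
    ("England", "Mathematics"),  # invented "weasel" link between topics
    ("Mathematics", "Calculus"),
    ("Mathematics", "Fubini theorem"),
    ("Calculus", "Fubini theorem"),
]


def build_graph(edges):
    """Build an undirected adjacency map: article -> set of linked articles."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def distance(graph, source, target):
    """Number of hops on a shortest path (BFS), or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


def prune_weak_edges(edges, graph):
    """Keep only links whose two endpoints share at least one neighbour."""
    return [(u, v) for u, v in edges if graph[u] & graph[v]]


graph = build_graph(links)
before = distance(graph, "Robin Hood", "Fubini theorem")

pruned = build_graph(prune_weak_edges(links, graph))
after = distance(pruned, "Robin Hood", "Fubini theorem")

print(before, after)
```

Before pruning, the two articles are 3 hops apart through the England and Mathematics pages; after pruning, the only bridge between the two clusters is gone and the articles are no longer connected. A real similarity graph would need a softer criterion than outright removal, which is part of what the project explores.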

Pre-requisites: Proficiency in Python, knowledge of basic concepts in machine learning, knowledge of or interest in graph theory.

Useful tools: Python notebooks, pandas, sklearn, and networkx modules.

Contact: francisco.pinto@epfl.ch (Please include your grade transcripts in the application)


Predict Trends in Startup Investment using Twitter and Machine Learning

Level: Master

Subject areas: Machine learning, data mining, natural language processing, and graph theory

Description: Creating a startup company is the dream of many students at EPFL, particularly those who complete a Ph.D. having developed an innovative technology. As a research institution that cares about tech-transfer, EPFL has an interest in investing in new domains of research that will potentially have a commercial impact in the near future. For example, areas such as cryptocurrencies, self-driving cars, and single-cell DNA sequencing are currently hot investment areas, but may no longer be so in 5 to 10 years. The goal of this project is to predict investment trends by scraping and analysing tweets from prominent VC investors, like Marc Andreessen, Peter Thiel, and Paul Graham. This is a good project for students who wish to learn the multiple facets of the data science stack, including data mining, natural language processing, and graph analysis.

Pre-requisites: Proficiency in Python, knowledge of basic concepts in machine learning, knowledge of or interest in data mining, natural language processing, and graph theory.

Useful tools: Python notebooks, pandas, sklearn, fasttext, and networkx modules.

Contact: francisco.pinto@epfl.ch (Please include your grade transcripts in the application)


Extract Scientific Concepts from EPFL Courses using Natural Language Processing

Level: Master

Subject areas: Machine learning, natural language processing

Description: The goal of this project is to create a database of scientific concepts that are taught at EPFL, and determine which courses connect to which concepts based on their descriptions. The database of concepts is assumed to exist beforehand in the form of Wikipedia articles. The challenge is how to relate keywords extracted from course descriptions to their most relevant Wikipedia counterparts. This will be done through the use of natural language processing (NLP) techniques, such as topic modelling and word embeddings.
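As a rough sketch of the matching step, the example below ranks candidate concepts for a course description by cosine similarity between bag-of-words vectors. This is a deliberately simple stand-in for the topic models and word embeddings mentioned above; the concept summaries, the course text, the stop-word list, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical concept titles with short summaries (stand-ins for articles).
concepts = {
    "Graph theory": "study of graphs made of nodes and edges",
    "Natural language processing": "computational analysis of text and language",
    "Linear algebra": "vectors matrices and linear maps",
}

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"and", "of", "to", "the"}


def bag_of_words(text):
    """Lower-case the text and count its words, ignoring stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_concepts(description, concepts):
    """Return (concept, score) pairs, most similar concept first."""
    query = bag_of_words(description)
    scores = {
        title: cosine(query, bag_of_words(title + " " + summary))
        for title, summary in concepts.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


course = "Introduction to matrices, vectors and linear systems"
ranking = rank_concepts(course, concepts)
print(ranking[0][0])
```

Exact word overlap misses synonyms and inflected forms, which is why embedding-based similarity is the more promising direction for the actual project.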

Pre-requisites: Proficiency in Python, knowledge of basic concepts in machine learning, knowledge of or interest in natural language processing.

Useful tools: Python notebooks, pandas, fasttext, and other word embedding modules.

Contact: francisco.pinto@epfl.ch (Please include your grade transcripts in the application)


Extract Scientific Concepts from EPFL Publications using Natural Language Processing

Level: Master

Subject areas: Machine learning, natural language processing

Description: The goal of this project is to create a database of scientific concepts that are addressed in articles published by EPFL researchers, and determine which publications connect to which concepts based on their abstract, keywords, and content. The database of concepts is assumed to exist beforehand in the form of Wikipedia articles. The challenge is how to relate keywords extracted from publications to their most relevant Wikipedia counterparts. This will be done through the use of natural language processing (NLP) techniques, such as topic modelling and word embeddings.

Pre-requisites: Proficiency in Python, knowledge of basic concepts in machine learning, knowledge of or interest in natural language processing.

Useful tools: Python notebooks, pandas, fasttext, and other word embedding modules.

Contact: francisco.pinto@epfl.ch (Please include your grade transcripts in the application)


Extract Scientific Concepts from EPFL MOOCs using Natural Language Processing

Level: Master

Subject areas: Machine learning, natural language processing

Description: The goal of this project is to create a database of scientific concepts that are taught in EPFL MOOCs, and determine which MOOCs connect to which concepts based on their video transcriptions (subtitles). The database of concepts is assumed to exist beforehand in the form of Wikipedia articles. The challenge is how to relate keywords extracted from MOOC subtitles to their most relevant Wikipedia counterparts. This will be done through the use of natural language processing (NLP) techniques, such as topic modelling and word embeddings.

Pre-requisites: Proficiency in Python, knowledge of basic concepts in machine learning, knowledge of or interest in natural language processing.

Useful tools: Python notebooks, pandas, fasttext, and other word embedding modules.

Contact: francisco.pinto@epfl.ch (Please include your grade transcripts in the application)


Predict Co-Authorship in EPFL Publications using Natural Language Processing

Level: Master

Subject areas: Machine learning, natural language processing

Description: The Infoscience repository contains a large collection of papers published by EPFL researchers. Using this dataset, we can determine which researchers collaborated with each other through the co-authoring of papers. The goal of this project is to predict those collaborations based solely on text analysis of the papers’ content, and to evaluate (and maximise) the algorithm’s reliability with respect to the ground truth.
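One way to quantify reliability against the ground truth is to compare predicted author pairs with the actual co-authoring pairs using precision and recall. The sketch below assumes both are available as lists of name pairs; the names and the function precision_recall are hypothetical, and other metrics may suit the project better.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted pairs against ground-truth pairs.

    Pairs are unordered, so ("a", "b") and ("b", "a") count as the same pair.
    """
    predicted = {frozenset(pair) for pair in predicted}
    actual = {frozenset(pair) for pair in actual}
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical ground truth, as it might be derived from co-authored papers.
actual_pairs = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]

# Hypothetical output of a text-based predictor.
predicted_pairs = [
    ("bob", "alice"),
    ("alice", "carol"),
    ("erin", "dave"),
    ("carol", "dave"),
]

precision, recall = precision_recall(predicted_pairs, actual_pairs)
print(precision, recall)
```

Here two of the four predicted pairs are correct (precision 0.5) and two of the three real collaborations are recovered (recall of two thirds).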

Pre-requisites: Proficiency in Python, knowledge of basic concepts in machine learning, knowledge of or interest in data mining and natural language processing.

Useful tools: Python notebooks, pandas, fasttext, and other word embedding modules.

Contact: francisco.pinto@epfl.ch (Please include your grade transcripts in the application)


Create Course Recommender System for EPFL Students using Machine Learning

Level: Bachelor or Master

Subject areas: Machine learning, neural networks

Description: The goal of this project is to use machine learning techniques to recommend courses to EPFL students based on their past choices. You will use a dataset of student course choices covering roughly the last 10 years, and compare different feature selection methods and ML algorithms according to their predictive power.
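A simple baseline for such a recommender is to suggest the courses most often taken together with the ones a student has already chosen. The sketch below assumes enrolment data in the form of one set of courses per student; the student identifiers, course names, and function names are invented for illustration, and this co-occurrence count is only a starting point against which ML models could be compared.

```python
from collections import Counter
from itertools import combinations

# Hypothetical enrolment data: student -> set of courses taken.
enrolments = {
    "s1": {"ML", "Statistics", "Optimisation"},
    "s2": {"ML", "Statistics"},
    "s3": {"ML", "Optimisation", "Databases"},
    "s4": {"Databases", "Networks"},
}


def co_occurrence(enrolments):
    """Count, for each ordered course pair, how many students took both."""
    counts = Counter()
    for courses in enrolments.values():
        for a, b in combinations(sorted(courses), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(taken, counts, top_k=2):
    """Score courses not yet taken by their co-occurrence with taken ones."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a in taken and b not in taken:
            scores[b] += n
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


counts = co_occurrence(enrolments)
suggestions = recommend({"ML"}, counts)
print(suggestions)
```

For a student who has only taken ML, this toy data suggests Optimisation and Statistics, each taken alongside ML by two students.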

Pre-requisites: Proficiency in Python, knowledge of basic concepts in machine learning.

Useful tools: Python notebooks, pandas, and sklearn modules.

Contact: francisco.pinto@epfl.ch (Please include your grade transcripts in the application)


Create Interactive Visualisation of Research Collaborations Graph

Level: Bachelor or Master

Subject areas: Data visualisation

Description: In this project, you will design an interactive JavaScript visualisation of a co-authorship network, representing collaborations between EPFL researchers. The visualisation takes the form of a graph that the end-user can interact with (select, highlight, move around, and filter), similar to this. This project will be done in collaboration with the Interfaculty Institute of Bioengineering (IBI).

Pre-requisites: Experience with JavaScript, knowledge of or interest in data visualisation and design.

Useful tools: JavaScript, D3, Raphael, and sigma.js.

Contact: francisco.pinto@epfl.ch (Please include your grade transcripts in the application)