Student Projects

If you are interested in doing a Master's thesis or semester project in data science, you can apply for one of the projects below. Your primary supervisors will be Francisco Pinto and Patrick Jermann, from the Center for Digital Education (CEDE).

List of Projects


Implement Gradient Descent Algorithm for Concept Detection

Level: Master

Subject areas: Convex optimisation, natural language processing, graph theory.

Description: The goal of this project is to implement a convex optimisation algorithm that, using course descriptions as a source of data, determines which scientific concepts are taught in EPFL courses. The database of concepts is assumed to exist beforehand in the form of Wikipedia pages. The challenge is how to relate keywords extracted from course descriptions to their most relevant Wikipedia counterparts. This will be done through a mix of natural language processing (NLP) and graph theory.
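
To give a feel for the kind of optimisation involved, here is a minimal sketch of projected gradient descent on a convex least-squares objective. It is not the project's actual method: the function `relevance_weights`, the three-dimensional vectors, and the concept names are all hypothetical stand-ins for real embeddings (such as GloVe vectors) and real Wikipedia pages.

```python
def relevance_weights(keyword_vec, concept_vecs, step=0.1, iters=500):
    """Minimise f(w) = ||sum_i w_i * c_i - k||^2 subject to w >= 0.

    Uses projected gradient descent. The step size is assumed to be
    small enough for convergence on the given concept vectors.
    """
    n = len(concept_vecs)
    dim = len(keyword_vec)
    w = [0.0] * n
    for _ in range(iters):
        # Residual r = sum_i w_i * c_i - k
        r = [
            sum(w[i] * concept_vecs[i][d] for i in range(n)) - keyword_vec[d]
            for d in range(dim)
        ]
        # Gradient: df/dw_i = 2 * <c_i, r>
        grad = [
            2.0 * sum(concept_vecs[i][d] * r[d] for d in range(dim))
            for i in range(n)
        ]
        # Gradient step, then projection onto the constraint w >= 0
        w = [max(0.0, w[i] - step * grad[i]) for i in range(n)]
    return w


# Toy, made-up vectors (hypothetical; real embeddings have hundreds of dimensions)
keyword = [0.9, 0.1, 0.0]
concepts = {
    "Gradient descent": [1.0, 0.0, 0.0],
    "Graph theory": [0.0, 1.0, 0.0],
    "Photosynthesis": [0.0, 0.0, 1.0],
}

weights = relevance_weights(keyword, list(concepts.values()))
ranking = sorted(zip(concepts, weights), key=lambda pair: pair[1], reverse=True)
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
```

The concept with the largest weight is the candidate that best explains the keyword under this toy objective. A real formulation would also need to bring in the graph structure between Wikipedia pages, which this sketch ignores.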

Pre-requisites: Solid understanding of the math behind convex optimisation and gradient descent; proficiency in Python; knowledge of basic concepts in machine learning; knowledge of or interest in natural language processing and graph theory.

Useful tools: Python notebooks, the pandas and networkx modules, and GloVe embeddings.

Contact: francisco.pinto@epfl.ch
Please attach the grade transcripts from both your Bachelor and Master studies.


Estimate Concept Dependencies in EPFL Courses using Machine Learning

Level: Master

Subject areas: Statistics, machine learning.

Description: The goal of this project is to use statistical tests and machine learning techniques to estimate the impact of learning certain subjects at least one semester before learning others, based on the academic performance of EPFL students. You will use an anonymised dataset of subject scores based on student course choices over the past ~10 years, and experiment with different machine learning algorithms.
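
As an illustration of the kind of statistical test that could serve as a starting point, the sketch below runs a two-sided permutation test on the difference in mean scores between two groups of students. It is only one possible approach, not the prescribed one. The function name `permutation_test` and the scores are made up for the example and do not come from the actual dataset.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns the observed difference and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign students to the two groups at random
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimate is never exactly zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up scores in a hypothetical course B
took_course_a_before = [5.0, 5.5, 4.5, 5.25, 4.75]
did_not_take_course_a = [4.0, 4.5, 3.5, 4.25, 4.75]

observed, p_value = permutation_test(took_course_a_before, did_not_take_course_a)
print(f"difference in means: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value would only indicate an association. Students choose their own courses, so separating a genuine dependency between subjects from selection effects is part of the challenge.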

Pre-requisites: Proficiency in Python; solid knowledge of linear models, statistical tests, and machine learning techniques.

Useful tools: Python notebooks, pandas, statsmodels, and sklearn modules.

Contact: francisco.pinto@epfl.ch
Please attach the grade transcripts from both your Bachelor and Master studies.


Estimate Distances between Wikipedia Pages using NLP and Graph Processing

Level: Master

Subject areas: Machine learning, natural language processing, graph theory, parallel processing.

Description: The goal of this project is to calculate a variety of metrics on the Wikipedia graph that estimate the similarity and dependency between concepts based on different criteria, using techniques such as natural language processing (NLP), analysis of clickstream patterns, and basic graph theory. To handle the large amounts of data, the algorithms should be implemented in a distributed way, running on Hadoop or Spark infrastructure.
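
One simple example of a graph-based similarity metric is the Jaccard similarity between the sets of pages that two articles link to. The sketch below computes it on a tiny hand-written link table. The function `jaccard_similarity` and the link data are hypothetical, and the code runs on a single machine rather than on Hadoop or Spark.

```python
def jaccard_similarity(links, page_a, page_b):
    """Jaccard similarity of the outgoing-link sets of two pages.

    `links` maps a page title to the set of titles it links to.
    Returns 0.0 when neither page has any outgoing links.
    """
    a = links.get(page_a, set())
    b = links.get(page_b, set())
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Tiny made-up link table (not real Wikipedia data)
links = {
    "Gradient descent": {"Convex optimization", "Derivative", "Machine learning"},
    "Stochastic gradient descent": {"Convex optimization", "Machine learning", "Statistics"},
    "Photosynthesis": {"Chlorophyll", "Plant"},
}

close = jaccard_similarity(links, "Gradient descent", "Stochastic gradient descent")
far = jaccard_similarity(links, "Gradient descent", "Photosynthesis")
print(close, far)
```

A similarity of this kind could be turned into a distance as one minus its value. Metrics based on text embeddings or clickstream data would need different inputs but could be compared on the same pairs of pages.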

Pre-requisites: Proficiency in Python; experience with machine learning algorithms, specifically NLP; experience with parallel processing frameworks such as Hadoop and Spark.

Useful tools: Python notebooks, GloVe embeddings, networkx, Apache Spark GraphX.

Contact: francisco.pinto@epfl.ch
Please attach the grade transcripts from both your Bachelor and Master studies.


Implement a 3D Graph Navigation App in JavaScript

Level: Bachelor or Master

Subject areas: Programming, 3D graphics.

Description: The goal of this project is to implement a web app that fetches data from a graph database in the backend and displays a 3D rendering of the graph on the frontend. The user will be able to navigate the graph in three dimensions, using either a tablet computer or a VR headset (see, for example, the WikiGalaxy project).

Pre-requisites: Excellent programming skills, preferably with experience in JavaScript; basic understanding of graphs.

Useful tools: Three.js, Node.js, GraphQL, ArangoDB.

Contact: francisco.pinto@epfl.ch
Please attach the grade transcripts from both your Bachelor and Master studies.