MSc Project Ideas
Please get in touch if you have any questions about specifics of the projects.
A Slurm Web Interface and API Wrapper
Web application development, Programming
A good knowledge of Linux is required for this project.
In the Computer Vision Lab we have a GPU cluster consisting of more than 50 GPUs. It is used by members of the lab, as well as collaborators and undergraduates working with lab members.
We currently manage user associations with a shell script and an org-mode table.
The requirements include:
- A way of querying and updating information in the Slurm database. This could work by making calls to the sacctmgr program, but ideally it would be a wrapper around the Slurm C API, in whichever language you choose to use.
- Using the wrapper, build a web application which is able to do the following:
  - Manage user associations: which users can submit jobs, which billing account they belong to and which QoS (see below) they have access to.
  - Manage Quality of Service (QoS) levels. In Slurm, a QoS controls access to a given partition of the cluster, as well as the amount of resources allowed.
  - Node control, such as bringing nodes back up after recovery, or "draining" or "downing" a node.
  - A form for making account requests. Submissions will be sent to a list of email addresses and include a link to accept or reject the request. Acceptance will automatically generate the Slurm association.
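As a rough starting point for the simpler of the two approaches in the first requirement, the sketch below shells out to sacctmgr and parses its pipe-delimited output, and builds the scontrol command for changing a node's state. It is illustrative only: the function names, the choice of columns and the set of supported states are assumptions made for this example, not existing lab tooling. Python is used for brevity; the same structure would carry over to Node.js, PHP or Java.

```python
import subprocess

# Columns requested from sacctmgr; this particular selection is an assumption.
FIELDS = ["User", "Account", "QOS"]


def parse_associations(text, fields=FIELDS):
    """Parse `sacctmgr --parsable2 --noheader` output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        row = dict(zip(fields, line.split("|")))
        # A user can hold several QoS, which sacctmgr prints comma-separated.
        row["QOS"] = [q for q in row.get("QOS", "").split(",") if q]
        rows.append(row)
    return rows


def list_associations():
    """Query the Slurm database for associations (needs sacctmgr on PATH)."""
    cmd = ["sacctmgr", "--parsable2", "--noheader", "show", "associations",
           "format=" + ",".join(FIELDS)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_associations(result.stdout)


def node_state_command(node, state, reason=None):
    """Build the scontrol argument list to drain, down or resume a node."""
    state = state.upper()
    if state not in {"DRAIN", "DOWN", "RESUME"}:
        raise ValueError(f"unsupported state: {state}")
    cmd = ["scontrol", "update", f"NodeName={node}", f"State={state}"]
    if state in {"DRAIN", "DOWN"}:
        # Slurm expects a reason when a node is drained or downed.
        if not reason:
            raise ValueError("a reason is required for DRAIN and DOWN")
        cmd.append(f"Reason={reason}")
    return cmd
```

Keeping the parsing and command construction separate from the subprocess call means most of the wrapper can be tested without access to a live cluster, and the same interface could later be backed by the C API instead.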
There is room for extensions if you'd like to add them. There may be room to open source this project.
Ideally development would be done in Node.js, PHP or Java.
Distributed Data Augmentation for Deep Learning using MPI
Distributed Computing, Deep Learning
An understanding of deep learning will be beneficial in this project.
One of the difficulties with deep learning is augmenting data during the training of deep convolutional neural networks. This means applying random transformations to the input data to make it look different somehow. Occasionally, this requires making matching changes to the output too. Data augmentation helps prevent a network from memorising the training set, making it better able to generalise to new, unseen images during inference.
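To make the point about changing the output concrete, here is a minimal sketch of a random horizontal flip applied to an image and its segmentation mask together. Images are represented as plain nested lists to keep the example self-contained; the function names are made up for this illustration, and a real pipeline would operate on tensors.

```python
import random


def hflip(grid):
    """Mirror a 2-D grid (a list of rows) left to right."""
    return [row[::-1] for row in grid]


def random_hflip_pair(image, mask, p=0.5, rng=random):
    """Flip image and mask together with probability p.

    One random decision is shared by both, so the target stays
    aligned with the transformed input.
    """
    if rng.random() < p:
        return hflip(image), hflip(mask)
    return image, mask
```

If the input and target were flipped independently, roughly half of the training pairs would end up misaligned, which is why paired transformations need a shared random decision.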
Data augmentation frequently becomes the bottleneck during training, often requiring many CPU cores. I would like to investigate ways of improving the performance of this workload by distributing the data augmentation problem across compute nodes using the Message Passing Interface (MPI).
MPI is a widely used standard for exchanging messages and data between compute nodes, and its library implementations are simple to use.
This work will be implemented as a Python library for the PyTorch framework. MPI is accessible through Python using mpi4py.
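One possible shape for the core of such a library is sketched below: each MPI process augments an interleaved share of the samples and the results are gathered on rank 0. This is only a sketch under stated assumptions. The helper names and the placeholder augmentation are hypothetical, and a real design would likely stream batches to the training process rather than gather a whole dataset at once.

```python
def shard_indices(num_samples, rank, size):
    """Indices of the samples that the process with this rank should augment."""
    return list(range(rank, num_samples, size))


def augment(sample):
    """Placeholder augmentation (hypothetical): reverse the sample."""
    return sample[::-1]


def augment_distributed(dataset):
    """Augment `dataset` across all MPI processes; rank 0 gets the result."""
    # Imported here so the helpers above can be used without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Each process works on its own interleaved share of the data.
    local = [(i, augment(dataset[i]))
             for i in shard_indices(len(dataset), rank, size)]

    # gather returns a list of per-rank lists on rank 0 and None elsewhere.
    gathered = comm.gather(local, root=0)
    if rank != 0:
        return None

    # Restore the original sample order before returning.
    merged = sorted((pair for part in gathered for pair in part),
                    key=lambda pair: pair[0])
    return [item for _, item in merged]
```

A script calling augment_distributed would be launched with an MPI launcher, for example mpiexec -n 4 python script.py, where script.py is a stand-in name.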
There is potential to analyse the performance of large batch sizes with and without MPI data augmentation.
Personal Handwriting OCR
Deep Learning, Computer Vision
OCR of clearly printed text is generally considered to be a solved problem. Handwriting poses many more challenges due to variations in style between individuals. However, building a system which is able to convert your own handwriting to text is within the realms of possibility.
While you are encouraged to experiment and develop your own approach, the project will likely need to be broken into several parts:
- Data generation and annotation - much of this can actually be automated, but be prepared to write the alphabet a few times.
- Character detection, so that you can perform classification on each detected region.
- Character recognition.
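As one very simple baseline for the detection step, the sketch below splits a single line of writing into character regions wherever a blank column separates the ink. This is an assumption chosen for illustration, not a prescribed method: it only works for well-separated characters, and joined-up writing would need a learned detector. The function names and the nested-list image format are made up for the example.

```python
def find_character_columns(image, threshold=0):
    """Return (start, end) column ranges that contain ink, end exclusive.

    `image` is a list of rows, each a list of ink intensities (0 = blank).
    """
    if not image:
        return []
    width = len(image[0])
    # A column counts as inked if its total intensity exceeds the threshold.
    ink = [sum(row[x] for row in image) > threshold for x in range(width)]

    spans, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                 # a new character begins
        elif not has_ink and start is not None:
            spans.append((start, x))  # a blank column ends the character
            start = None
    if start is not None:
        spans.append((start, width))  # character touching the right edge
    return spans


def crop(image, span):
    """Cut out the columns in `span` so the region can be passed to a classifier."""
    start, end = span
    return [row[start:end] for row in image]
```

Each cropped region could then be resized and fed to whatever recognition model you train on your own annotated handwriting.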