Possible Projects
There are different options for you to define or select your course project:
Bring your own data and project idea to the course. Simply talk to your course lead about your idea and what the project should achieve by the end of the semester.
Choose a project from the list of current projects provided in the table at the end of this page.
Ask local companies or chairs at higher education institutions in your area whether they are interested in a machine learning prototype for one of their production or research tasks and would be willing to share the corresponding data. If you find a partner interested in such a project, we will be happy to support you in defining the project together with the partner and, for example, in setting up a non-disclosure agreement for the provided data.
Look for an interesting dataset on the Internet and define a project yourself based on it. However, we strongly recommend choosing one of the options above. With datasets from the Internet (e.g. from Kaggle competitions), your main challenge is typically limited to optimizing a model on an already prepared dataset. In practice, however, the challenge is more often to construct the right training and validation datasets and the right features.
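To illustrate what constructing the right training and validation datasets can mean for daily data, here is a minimal Python sketch of one common approach (not a method prescribed for this course): split the records chronologically instead of randomly, so the model is only validated on days that come after everything it was trained on. The function name `chronological_split`, the sample records and the cutoff date are hypothetical.

```python
from datetime import date, timedelta


def chronological_split(records, cutoff):
    """Split (day, value) records into training and validation sets by date.

    Records dated before `cutoff` go to training; records on or after
    `cutoff` go to validation, so no later days leak into training.
    """
    ordered = sorted(records, key=lambda record: record[0])
    train = [record for record in ordered if record[0] < cutoff]
    valid = [record for record in ordered if record[0] >= cutoff]
    return train, valid


if __name__ == "__main__":
    # Hypothetical daily sales for 10 consecutive days, deliberately unordered.
    start = date(2019, 1, 1)
    records = [(start + timedelta(days=i), 100 + i) for i in range(10)]
    records.reverse()

    train, valid = chronological_split(records, cutoff=date(2019, 1, 8))
    print(len(train), len(valid))  # 7 3
    print(train[-1][0], valid[0][0])  # 2019-01-07 2019-01-08
```

A random split would mix later days into the training set, which can make validation scores look better than what the model achieves when it actually forecasts the future.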
General Comments
For a text classification task, a few hundred labeled cases are usually sufficient.
Daily sales or usage data is also always interesting. You can try to predict sales solely from the characteristics of a given day and the sales before that day (day of the week, beginning or end of the month, holidays, sales on the same day one week earlier, sales on the previous day, and many more). The minimum for such time series analyses is around 1,000 cases (i.e. about 3 years of daily data).
When working with images, another option for a project is to take a set of perhaps just 100 unlabeled images of similar objects and generate new images from them using a Generative Adversarial Network (GAN).
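The daily sales features mentioned above can be sketched in plain Python as follows. This is a minimal illustration under assumptions: the function `day_features`, the holiday set `HOLIDAYS`, and the three-day windows used for the beginning and end of the month are hypothetical choices, and sales are assumed to be given as a dictionary mapping each date to that day's value. Only sales from earlier days are used, so every feature would be available at prediction time.

```python
from datetime import date, timedelta

# Hypothetical holiday calendar; a real project would use an official list
# for the relevant region.
HOLIDAYS = {date(2019, 1, 1), date(2019, 12, 25), date(2019, 12, 26)}


def day_features(day, sales_by_day, holidays=HOLIDAYS):
    """Build a feature dict for `day` using only sales from earlier days.

    Lag features are None when the earlier day is missing from the data.
    """
    return {
        "weekday": day.weekday(),  # 0 = Monday, ..., 6 = Sunday
        "month_start": day.day <= 3,  # first three days of the month
        # True for the last three days of the month.
        "month_end": (day + timedelta(days=3)).month != day.month,
        "is_holiday": day in holidays,
        "sales_lag_1": sales_by_day.get(day - timedelta(days=1)),  # previous day
        "sales_lag_7": sales_by_day.get(day - timedelta(days=7)),  # one week earlier
    }


if __name__ == "__main__":
    # Hypothetical sales for the first two weeks of 2019.
    sales = {date(2019, 1, 1) + timedelta(days=i): 200 + i for i in range(14)}
    print(day_features(date(2019, 1, 8), sales))
    # {'weekday': 1, 'month_start': False, 'month_end': False,
    #  'is_holiday': False, 'sales_lag_1': 206, 'sales_lag_7': 200}
```

Applying `day_features` to every day of the history yields one row per day; the first days of the history have missing lag values that need to be dropped or imputed before training.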
Data Resources
Good resources for datasets include Papers with Code, Hugging Face, the well-known UCI Machine Learning Repository, and Kaggle.
Project Examples
Here are some examples. If you are interested in past projects, check them out on the project page.
| Title | Description | Dataset |
| --- | --- | --- |
| Surf Forecast | On a good surfing day at a particular surf spot, the number of page views on the site with the forecasts for that spot usually increases. The number of page views shall be used as a proxy for the quality of a surfing day in order to improve the forecast of good days. This way, weather influences that differ strongly from spot to spot, such as thermals, wind direction and cloudiness, and that are usually best known to the users, can be taken into account when forecasting good windsurfing days. | Hourly weather station data for 7 popular surf spots (Kiel Lighthouse, Skt. Peter-Ording, Warnemünde, Port Said Airport, Molasses Reef, Renesse West, Hvide Sande) for the years 2016, 2017, 2018 and 2019. Access to this dataset requires signing the NDA linked below the table. |
| Prediction of Bike Rentals | To always have enough bikes at all stations, it is essential to know as early as possible when rental stations will run out of bikes (and possibly when stations will be overloaded with bikes) so that bikes can be transported accordingly. The goal is therefore to predict, for each day, at what time stations will be empty. | Dataset of ~100,000 bike rentals from Sprottenflotte, a bike rental service in Kiel, with the following variables: time of rental, time of return, station number of rental, station number of return, station name of rental, station name of return. Access to this dataset requires signing the NDA linked below the table. |
| COVID-19 Prediction | The goal is to estimate the number of daily new cases and daily deaths at the level of the German states based on the daily statistics published by the RKI or, alternatively, by the international COVID-19 Data Hub. | |
| Predict Bakery Turnover During Corona Pandemic | In normal times, meteolytix builds a complex machine learning model with over 400 features to predict its customers’ turnover. But there isn’t a “normal” during the pandemic: sales can change from week to week. Therefore, a smaller model could help to predict sales, but have you also thought about a naïve logic? (Data is available until June 2020.) | Daily turnover for a bakery branch in Kiel, split into six commodity groups (bread, rolls, croissants, pastry, cake, seasonal bread), for the years 2016, 2017, 2018, 2019 and 2020. |
| Generation of Scientific Text | Language models such as GPT-2 can generate text that is often no longer distinguishable from text written by humans. The idea of this project is to fine-tune a given language model on the EconStore full-text papers in order to generate scientific text for a given title and possibly some initial words. The final goal is to produce text that could be mistaken for an actual scientific text. | 7,700 full-text papers from EconStore, a subject-based repository for literature in economics research. |
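For the bakery project, one possible reading of "naïve logic" is a seasonal naive forecast: predict each day's turnover as the turnover on the same weekday one week earlier. The sketch below, with hypothetical function names and made-up turnover figures, computes the mean absolute error (MAE) of that forecast so it can serve as a baseline that any trained model should beat.

```python
from datetime import date, timedelta


def seasonal_naive_forecast(turnover_by_day, day, season=7):
    """Forecast `day` as the turnover observed `season` days earlier (None if unknown)."""
    return turnover_by_day.get(day - timedelta(days=season))


def mean_absolute_error(turnover_by_day, days, season=7):
    """MAE of the seasonal naive forecast over `days`, skipping days without a forecast."""
    errors = []
    for day in days:
        forecast = seasonal_naive_forecast(turnover_by_day, day, season)
        if forecast is not None and day in turnover_by_day:
            errors.append(abs(turnover_by_day[day] - forecast))
    if not errors:
        raise ValueError("no day in `days` has both an observation and a forecast")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical daily turnover: a fixed weekly pattern plus slow growth.
    weekly_pattern = [100, 90, 90, 95, 110, 150, 130]
    start = date(2019, 1, 7)  # a Monday
    turnover = {
        start + timedelta(days=i): weekly_pattern[i % 7] + 2 * (i // 7)
        for i in range(21)
    }
    last_week = [start + timedelta(days=i) for i in range(14, 21)]
    print(mean_absolute_error(turnover, last_week, season=7))  # 2.0
    print(round(mean_absolute_error(turnover, last_week, season=1), 2))  # 16.86
```

With the weekly pattern in the made-up data, the one-week-back forecast is far more accurate than simply repeating the previous day's value (`season=1`).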
Non-Disclosure Agreement
For some of the projects listed above it is necessary to sign the following NDA to get access to the corresponding data: