Choosing a Project

Focus on Problems First, Then Data

Rather than starting with a dataset and trying to find something interesting to do with it, identify a meaningful problem that could benefit from machine learning. This approach tends to lead to more impactful and realistic projects.

Look for Human Pain Points

Consider processes or tasks that are:

  • Time-consuming for humans

  • Tedious or repetitive

  • Prone to human error

These often make excellent candidates for machine learning solutions.

Project Sources

Bring Your Own Project

If you already have a problem in mind or access to interesting data:

  • Present your idea in the course or course chat

  • If you have any doubts about the project, formulate specific questions and ask in the course chat

Partner with Organizations

Local companies and academic departments often have real-world problems waiting for ML solutions:

  • Reach out to businesses or research groups in your area

  • Inquire about data-intensive problems they're facing

  • We can help you define project parameters with your partner

  • If data confidentiality is a concern, you can use the NDA template provided here

Public Datasets

Public datasets are convenient, but they often come with limitations:

  • Projects tend to focus more on model optimization than real-world implementation

  • You miss valuable experience in data preparation and feature engineering

  • The problem may be artificially clean compared to real-world scenarios

However, if you choose this route, consider adding complexity by:

  • Combining multiple datasets

  • Creating your own validation methodology

  • Adding constraints that reflect real-world conditions
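
As a rough illustration of the first two ideas, the sketch below joins two small record sets on a shared key and then splits the result by date rather than at random, so the validation set only contains data from after the training period. Everything here is an assumption made for the example: the toy `sales` and `stores` records, the field names, the cutoff date, and the helper names `merge_on_key` and `time_based_split` are hypothetical and do not come from the course material. In a real project you would likely use a dataframe library instead of plain lists.

```python
from datetime import date


def merge_on_key(left, right, key):
    """Inner-join two lists of dict records on a shared key (hypothetical helper)."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


def time_based_split(records, date_field, cutoff):
    """Train on records dated before the cutoff, validate on the rest."""
    train = [r for r in records if r[date_field] < cutoff]
    valid = [r for r in records if r[date_field] >= cutoff]
    return train, valid


# Toy data, invented purely for illustration.
sales = [
    {"store_id": 1, "day": date(2024, 1, 5), "units": 12},
    {"store_id": 2, "day": date(2024, 2, 10), "units": 7},
    {"store_id": 1, "day": date(2024, 3, 15), "units": 9},
    {"store_id": 3, "day": date(2024, 3, 20), "units": 4},
]
stores = [
    {"store_id": 1, "region": "north"},
    {"store_id": 2, "region": "south"},
]

# Combine the two datasets; store 3 has no metadata and is dropped by the inner join.
combined = merge_on_key(sales, stores, "store_id")

# Validate only on records from March 2024 onward (assumed cutoff).
train, valid = time_based_split(combined, "day", date(2024, 3, 1))
print(len(train), len(valid))
```

Splitting by time is only one possible validation scheme. It tends to fit problems where the model will be used on future data, which a random split can make look easier than it is.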

Data Resources

If you're looking for public datasets, here are some valuable repositories:

Examples of Past Projects

For inspiration, you might also review the projects nominated for the 2025 VDE Machine Learning Prize, presented in the document below.

Non-Disclosure Agreement (NDA)

If you need an NDA for data you are getting from an organization or partner, you can use the following:

Non-Disclosure Agreement (NDA)
