Choosing a Project

Focus on Problems First, Then Data

Rather than starting with a dataset and trying to find something interesting to do with it, identify a meaningful problem that could benefit from machine learning. This approach tends to lead to more impactful and realistic projects.

Look for Human Pain Points

Consider processes or tasks that are:

  • Time-consuming for humans

  • Tedious or repetitive

  • Prone to human error

These often make excellent candidates for machine learning solutions.

Project Sources

Bring Your Own Project

If you already have a problem in mind or access to interesting data:

  • Present your idea in the course or course chat

  • If you have doubts about the project, formulate specific questions and ask them in the course chat

Partner with Organizations

Local companies and academic departments often have real-world problems waiting for ML solutions:

  • Reach out to businesses or research groups in your area

  • Inquire about data-intensive problems they're facing

  • We can help you define project parameters with your partner

Public Datasets

Public datasets are convenient, but they often come with limitations:

  • Projects tend to focus more on model optimization than real-world implementation

  • You miss valuable experience in data preparation and feature engineering

  • The problem may be artificially clean compared to real-world scenarios

However, if you choose this route, consider adding complexity by:

  • Combining multiple datasets

  • Creating your own validation methodology

  • Adding constraints that reflect real-world conditions
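
As a rough illustration of the first two ideas, the sketch below joins two small tables on a shared key and then splits the result by date instead of at random, which is one simple way to mimic the real-world condition that a model only ever sees past data. Everything here is hypothetical: the `sales` and `stores` records, their field names, and the cutoff date are made up for the example and are not taken from any of the course datasets.

```python
from datetime import date

# Hypothetical records standing in for two public datasets that share a key.
sales = [
    {"store_id": 1, "day": date(2024, 1, 5), "units": 12},
    {"store_id": 2, "day": date(2024, 1, 20), "units": 7},
    {"store_id": 1, "day": date(2024, 2, 3), "units": 15},
    {"store_id": 2, "day": date(2024, 3, 1), "units": 9},
]
stores = {1: {"region": "north"}, 2: {"region": "south"}}


def combine(sales_rows, store_lookup):
    """Attach store attributes to each sales row (a simple left join)."""
    combined = []
    for row in sales_rows:
        extra = store_lookup.get(row["store_id"], {})
        combined.append({**row, **extra})
    return combined


def temporal_split(rows, cutoff):
    """Train on rows before the cutoff, validate on rows from the cutoff onward."""
    train = [r for r in rows if r["day"] < cutoff]
    valid = [r for r in rows if r["day"] >= cutoff]
    return train, valid


rows = combine(sales, stores)
train, valid = temporal_split(rows, cutoff=date(2024, 2, 1))
print(len(train), len(valid))  # 2 2
```

A date-based split like this usually gives a more honest performance estimate than a random split whenever the deployed model has to predict future records, because no information from the validation period leaks into training.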

Data Resources

If you're looking for public datasets, here are some valuable repositories:

  • ChallengeData: Datasets from real-world challenges

  • Hugging Face Datasets: Collections ready for NLP and other ML tasks

  • UCI Machine Learning Repository: Classic, well-documented datasets

  • Kaggle: Competitive datasets with community solutions

  • Roboflow Universe: Computer vision datasets

  • Papers with Code: Datasets linked to research papers

Examples of Past Projects

For inspiration, you might also want to review the projects nominated for the 2025 VDE Machine Learning Prize, presented in the following document:

  • VDE Machine Learning Prize 2025.pdf (PDF, 782KB)

Non-Disclosure Agreement (NDA)

If data confidentiality is a concern, or you need an NDA for data you are getting from an organization or partner, you can use the following template:

  • NDA_Projects_engl_v0.2.pdf (PDF, 103KB)