Choosing a Project
Rather than starting with a dataset and trying to find something interesting to do with it, identify a meaningful problem that could benefit from machine learning. This approach tends to lead to more impactful and realistic projects.
Consider processes or tasks that are:
Time-consuming for humans
Tedious or repetitive
Prone to human error
These often make excellent candidates for machine learning solutions.
If you already have a problem in mind or access to interesting data:
Present your idea in the course or course chat
If you have any doubts about the project, formulate specific questions and ask in the course chat
Local companies and academic departments often have real-world problems waiting for ML solutions:
Reach out to businesses or research groups in your area
Inquire about data-intensive problems they're facing
We can help you define project parameters with your partner
If data confidentiality is a concern, you can use the NDA template provided here
Public datasets are convenient, but they often come with limitations:
Projects tend to focus more on model optimization than real-world implementation
You miss valuable experience in data preparation and feature engineering
The problem may be artificially clean compared to real-world scenarios
However, if you choose this route, consider adding complexity by:
Combining multiple datasets
Creating your own validation methodology
Adding constraints that reflect real-world conditions
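As one illustration of a custom validation methodology that reflects real-world conditions, the sketch below splits records by time instead of at random, so the model is always validated on data newer than anything it was trained on. It is only a minimal example: the function name `time_based_split`, the `date` and `value` fields, and the toy records are hypothetical and not part of the course material.

```python
from datetime import date


def time_based_split(records, cutoff, key="date"):
    """Split records into train (before cutoff) and validation (on or after cutoff)."""
    train = [r for r in records if r[key] < cutoff]
    valid = [r for r in records if r[key] >= cutoff]
    return train, valid


# Hypothetical toy records; a real project would load these from its own data source.
records = [
    {"date": date(2024, 1, 5), "value": 10},
    {"date": date(2024, 2, 9), "value": 12},
    {"date": date(2024, 3, 14), "value": 9},
    {"date": date(2024, 4, 20), "value": 15},
]

train, valid = time_based_split(records, cutoff=date(2024, 3, 1))
print(len(train), len(valid))
```

A random split of the same records could place later observations in the training set, which tends to make results look better than they would be in deployment.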
If you're looking for public datasets, here are some valuable repositories:
ChallengeData: Datasets from real-world challenges
Hugging Face Datasets: Collections ready for NLP and other ML tasks
UCI Machine Learning Repository: Classic, well-documented datasets
Kaggle: Competitive datasets with community solutions
: Computer vision datasets
: Datasets linked to research papers
For project inspiration, you can also review the projects nominated for the 2025 VDE Machine Learning Prize, presented in the document below.
If you need an NDA for data you are getting from an organization or partner, you can use the following: