# Choosing a Project

### Focus on Problems First, Then Data

Rather than starting with a dataset and trying to find something interesting to do with it, identify a meaningful problem that could benefit from machine learning. This approach tends to lead to more impactful and realistic projects.

### Look for Human Pain Points

Consider processes or tasks that are:

* Time-consuming for humans
* Tedious or repetitive
* Prone to human error

These often make excellent candidates for machine learning solutions.

### Project Sources

#### Bring Your Own Project

If you already have a problem in mind or access to interesting data:

* Present your idea in the course or course chat
* If you have doubts about the project, formulate specific questions and ask them in the course chat

#### Partner with Organizations

Local companies and academic departments often have real-world problems waiting for ML solutions:

* Reach out to businesses or research groups in your area
* Inquire about data-intensive problems they're facing
* We can help you define project parameters with your partner
* If data confidentiality is a concern, you can use the [NDA template](#non-disclosure-agreement-nda) provided at the end of this page

#### Public Datasets

While convenient, using public datasets sometimes comes with limitations:

* Projects tend to focus more on model optimization than real-world implementation
* You miss valuable experience in data preparation and feature engineering
* The problem may be artificially clean compared to real-world scenarios

However, if you choose this route, consider adding complexity by:

* Combining multiple datasets
* Creating your own validation methodology
* Adding constraints that reflect real-world conditions
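
As one way to picture the first two ideas, here is a minimal Python sketch that joins two tables on a shared key and then splits the result chronologically, so that validation data always lies after the training data. All records, field names (`day`, `units`, `temp_c`), and helper functions are hypothetical and invented for illustration; a real project would likely use a dataframe library and its own data.

```python
from datetime import date

# Hypothetical toy records standing in for two separate public datasets.
sales = [
    {"day": date(2024, 1, 1), "units": 12},
    {"day": date(2024, 1, 2), "units": 15},
    {"day": date(2024, 1, 3), "units": 9},
    {"day": date(2024, 1, 4), "units": 20},
    {"day": date(2024, 1, 5), "units": 17},
]
weather = [
    {"day": date(2024, 1, 1), "temp_c": 3.0},
    {"day": date(2024, 1, 2), "temp_c": 4.5},
    {"day": date(2024, 1, 3), "temp_c": 1.0},
    # No weather record for 2024-01-04: real sources rarely align perfectly.
    {"day": date(2024, 1, 5), "temp_c": 6.0},
]


def join_on_day(left, right):
    """Inner-join two lists of dicts on the 'day' key."""
    right_by_day = {row["day"]: row for row in right}
    return [
        {**row, **right_by_day[row["day"]]}
        for row in left
        if row["day"] in right_by_day
    ]


def chronological_split(rows, cutoff):
    """Train on rows before the cutoff date, validate on rows on or after it."""
    train = [r for r in rows if r["day"] < cutoff]
    valid = [r for r in rows if r["day"] >= cutoff]
    return train, valid


combined = join_on_day(sales, weather)
train, valid = chronological_split(combined, cutoff=date(2024, 1, 4))
print(len(combined), len(train), len(valid))
```

A chronological split is only one possible validation scheme. It is shown here because a random split would let the model see data from the future, which a deployed system never has.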

### Data Resources

If you're looking for public datasets, here are some valuable repositories:

* [**ChallengeData**](https://challengedata.ens.fr/challenges/challenges_search): Datasets from real-world challenges
* [**Hugging Face Datasets**](https://huggingface.co/datasets): Collections ready for NLP and other ML tasks
* [**UCI Machine Learning Repository**](https://archive.ics.uci.edu/): Classic, well-documented datasets
* [**Kaggle**](https://www.kaggle.com/datasets): Competitive datasets with community solutions
* [**Roboflow Universe**](https://universe.roboflow.com/): Computer vision datasets
* [**Papers with Code**](https://paperswithcode.com/datasets): Datasets linked to research papers

### Examples of Past Projects

For inspiration, you might also want to review the projects nominated for the 2025 VDE Machine Learning Prize, presented in the document below.

{% file src="/files/0IBsXhR4xPIvITTJrxEn" %}

### Non-Disclosure Agreement (NDA)

If you need an NDA for data you receive from an organization or partner, you can use the following template:

{% file src="/files/-MOXBtzubKR3rCJmwf6R" %}
Non-Disclosure Agreement (NDA)
{% endfile %}

