Week 5 - The Datasets Library

This week you will...

  • gain insights into the benefits of the Hugging Face Datasets library
  • discuss the literature review on project tasks
  • get some ideas on how to visualize sequence data

Learning Resources

221123_The Datasets Library.pdf (PDF, 3 MB)

Until next class you should...

  • complete chapter 6 (The Tokenizers Library) of the Hugging Face course
  • look into the characteristics of your dataset and:
    • write down the specifics of how your data was collected
    • create filter variables to group your input data according to special characteristics
    • consider the following questions:
      • What are potential biases in your training data?
      • Are there outliers in the dataset?
      • Are the classes balanced? (If you deal with a classification task.)
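
The checks above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the toy `records`, the field names (`text`, `label`, `source`), the helper names, and the 1.5 × IQR outlier rule are all hypothetical choices, not part of the course material. Replace them with whatever fits your own data.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical toy records standing in for your own dataset.
records = [
    {"text": "great film", "label": "pos", "source": "web"},
    {"text": "loved it", "label": "pos", "source": "web"},
    {"text": "really enjoyable story", "label": "pos", "source": "forum"},
    {"text": "not good", "label": "neg", "source": "web"},
    {"text": "bad", "label": "neg", "source": "forum"},
    {"text": "very " * 29 + "long", "label": "pos", "source": "forum"},
]


def class_counts(rows, label_key="label"):
    """Count how many examples each class has."""
    return Counter(row[label_key] for row in rows)


def imbalance_ratio(counts):
    """Size of the largest class divided by the size of the smallest."""
    return max(counts.values()) / min(counts.values())


def token_lengths(rows, text_key="text"):
    """Whitespace token count per example (a crude length measure)."""
    return [len(row[text_key].split()) for row in rows]


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is a common rule of thumb."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def group_by(rows, key):
    """Filter variable: split the examples into groups by one field."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups


counts = class_counts(records)
lengths = token_lengths(records)
by_source = group_by(records, "source")

print("class counts:", dict(counts))
print("imbalance ratio:", imbalance_ratio(counts))
print("length outliers:", iqr_outliers(lengths))
print("group sizes:", {name: len(rows) for name, rows in by_source.items()})
```

If your data is already loaded with the Hugging Face Datasets library, `Dataset.filter` and `Dataset.map` can play the roles of `group_by` and `token_lengths`. Comparing the class counts within each group is one simple way to start looking for collection biases.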