Handmade Datasets: Strategies for working creatively with small data and Artificial Intelligence
Demystify machine learning and exercise agency over emerging tech by building handmade datasets to train custom image, text, and audio models.
This beginner-friendly intensive course demystifies the process of training Artificial Intelligence systems with machine learning by working with small, human-scale, personally assembled datasets. Students will learn practical techniques for training image, text, and audio models on limited data: data augmentation, transfer learning, and LoRA fine-tuning for images; Retrieval-Augmented Generation (RAG) for text; and RAVE for audio synthesis. The course progresses through three units: first visual material with GANs and LoRA, then text with Ollama and RAG, and finally audio with RAVE. Students will experiment with all three approaches before choosing one to develop further for their final project.
Beyond technical skills, this course examines where training data comes from and how dataset composition influences model outputs. Through hands-on projects and critical discussion, students will develop both the technical capability to work with small-scale ML and a more nuanced understanding of data collection and the labor behind AI systems. By the end of the intensive, participants will have created their own ‘handmade dataset’ and trained a custom model.
This course welcomes anyone interested in how AI systems work and in exercising more agency over these emerging technologies.
This course has limited capacity. Enroll today to secure your spot.

Course Outline:
Students will complete weekly assignments to gradually build text, image, and audio models, expanding on one to present as a final project. Participants can expect to spend 3–4 hours weekly on work outside of class.
Week 1: Introduction & Landscape
Week 2: Dataset Collection Strategies
Week 3: Dataset Preparation & Augmentation
Week 4: Introduction to Image Models & GANs
Week 5: Fine-Tuning Images with LoRA
Week 6: Text Models with Ollama
Week 7: RAG (Retrieval-Augmented Generation)
Week 8: RAVE (Realtime Audio Variational autoEncoder)
Week 9: Guest Lecture & open review
Week 10: Final Project Development & Presentations
Prerequisites:
No prior machine learning or advanced coding experience required. Students should be comfortable with basic computer literacy (file management, running applications) and have a willingness to experiment with new tools. We will focus on understanding concepts, collecting meaningful data, and remixing existing code rather than writing algorithms from scratch. An interest in questions around data ethics, personal archives, or critical approaches to technology is encouraged.
Experience Level: Beginner
Educational Goals:
By the end of this intensive, students will:
- Connect with others exploring small-scale, intentional approaches to ML and build a supportive network for continued experimentation
- Understand and apply practical strategies for working with small datasets across three units: images (GANs, StyleGAN, LoRA), text (RAG with Ollama), and audio (RAVE)
- Gain hands-on experience training models
- Practice methods for collecting, organizing, and augmenting personal datasets; understand when to use data augmentation vs. transfer learning vs. training from scratch
- Develop insight into where training data comes from (ImageNet, LAION-5B, FFHQ), how dataset composition influences model behavior, and the hidden labor behind large-scale AI
- Create a personally meaningful handmade dataset and train a model
- Build capacity to consider issues of consent, ownership, bias, and stewardship when working with data and ML
- Troubleshoot common technical challenges (mode collapse, computational limits, dataset balance) and make informed decisions about model selection based on available resources
Course Logistics:
Dates: February 23 – April 27, 2026
Enrollment Deadline: February 13, 2026
Class meets once a week on Mondays.
Times: 4 – 7 PM PT | 7 – 10 PM ET
Cost:
- $1,500 for Live Access
- $750 for Audit Access (weekly recording access, released after each session)
- Payment plans available: 3 monthly installments. Email [email protected] for more information.
Scholarship: We also offer Diversity Scholarships.
Apply by February 13, 2026. Scholarship notifications will be sent within one week of the deadline.
About Technologies:
StyleGAN (Style-based Generative Adversarial Network): A neural network architecture developed by NVIDIA for generating high-quality synthetic images. StyleGAN learns patterns from a dataset of images and can create new images that emulate characteristics of the training data. It’s particularly suited for smaller datasets compared to newer alternatives like diffusion models.
LoRA (Low-Rank Adaptation): An efficient fine-tuning technique that allows you to customize pre-trained models with small datasets while using minimal computational resources. Instead of retraining an entire model, LoRA modifies only a small subset of parameters, making it ideal for personal ML projects.
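The core idea can be sketched in a few lines of plain Python: the pretrained weight matrix W stays frozen, and only two small matrices A and B are trained, whose product forms a low-rank correction added back to W. This is an illustrative toy (real fine-tuning uses libraries such as PEFT); the matrix sizes and values here are made up for demonstration.

```python
# Minimal LoRA sketch: W stays frozen; only the small adapters A and B
# are "trained". The adapted weight is W + (alpha / r) * (B @ A).

def matmul(a, b):
    """Multiply two matrices stored as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_update(W, A, B, alpha, r):
    """Return the frozen weight W plus a scaled low-rank correction."""
    delta = matmul(B, A)          # (d_out x r) @ (r x d_in)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

# Toy example: a 2x2 "pretrained" weight and rank-1 adapters.
W = [[1.0, 0.0],
     [0.0, 1.0]]
A = [[1.0, 1.0]]        # r x d_in  (trainable)
B = [[0.5], [0.5]]      # d_out x r (trainable)

W_adapted = lora_update(W, A, B, alpha=1.0, r=1)
# Only A and B (4 numbers here) were updated; W itself never changed.
```

At realistic scale the savings are dramatic: for a 1024-by-1024 weight matrix, rank-8 adapters train roughly 16,000 parameters instead of over a million.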
RAVE (Realtime Audio Variational autoEncoder): A neural audio synthesis tool that can learn the characteristics of audio recordings and generate new sounds in real-time. RAVE is designed for creative sound design and can be trained on small collections of audio to create unique timbres and textures.
RAG (Retrieval-Augmented Generation): A technique that enhances language model outputs by dynamically retrieving relevant information from a custom dataset. Rather than fully training or fine-tuning a model, RAG allows you to incorporate your own data into model responses in real-time.
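The retrieve-then-generate flow can be illustrated with a toy example: score each document in your dataset against the query, then prepend the best match to the prompt before the model answers. Real systems use vector embeddings (e.g. computed via Ollama) rather than the word-overlap scoring used here, and the sample documents are invented, but the overall shape is the same.

```python
# Toy RAG pipeline: retrieve the most relevant document by word
# overlap, then build an augmented prompt for a language model.

def retrieve(query, documents):
    """Return the document sharing the most words with the query."""
    q_words = set(query.lower().split())
    return max(documents,
               key=lambda doc: len(q_words & set(doc.lower().split())))

def build_prompt(query, documents):
    """Augment the prompt with retrieved context before generation."""
    context = retrieve(query, documents)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical personal notes standing in for a handmade dataset.
notes = [
    "RAVE is trained on small collections of audio recordings.",
    "LoRA fine-tunes image models with few trainable parameters.",
]
prompt = build_prompt("How is RAVE trained?", notes)
# The audio note is now in the prompt; a local model (e.g. run via
# Ollama) would generate its answer from this augmented context.
```

Because the dataset is consulted at query time rather than baked into the model, you can add, edit, or remove documents without any retraining.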
Course Access:
All sessions are held online via Zoom. Unlimited access to the full class recordings is available to all enrolled students. Whether you missed a session or want to revisit a concept, Gray Area will provide all enrolled students with a direct link.
Please email [email protected] with any questions.
