News

doing tasks such as music and image generation or hobbyist science. Distillation works by using the larger "teacher" model to generate outputs that a smaller "student" model then learns to mimic.
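The teacher-student setup described above can be sketched in a few lines. This is a minimal, hypothetical illustration (not any particular lab's method): a fixed "teacher" classifier produces soft probability outputs on unlabeled inputs, and a "student" is trained by gradient descent to match those outputs.

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Convert logits to probabilities; higher temperature softens them."""
    z = z / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
T = 2.0  # distillation temperature (illustrative choice)

# Hypothetical "teacher": a fixed linear classifier (4 features, 3 classes).
W_teacher = rng.normal(size=(4, 3))

# Step 1: the teacher generates soft targets on unlabeled data.
X = rng.normal(size=(512, 4))
soft_targets = softmax(X @ W_teacher, temperature=T)

# Step 2: the student learns to mimic those outputs.
W_student = np.zeros((4, 3))
lr = 0.5
for _ in range(500):
    p = softmax(X @ W_student, temperature=T)
    # Gradient of the cross-entropy between student and teacher outputs.
    grad = X.T @ (p - soft_targets) / (len(X) * T)
    W_student -= lr * grad

# Average KL divergence KL(teacher || student) after training.
p = softmax(X @ W_student, temperature=T)
kl = np.mean(np.sum(soft_targets * (np.log(soft_targets) - np.log(p)), axis=1))
```

After training, the student's outputs closely track the teacher's on this data, even though it never saw a hard label. In practice the student is a smaller network and the targets are generated text or logits rather than class probabilities, but the mimicry objective is the same.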