Microsoft's Small Language Model (SLM) Phi-3 was trained using a novel dataset called TinyStories.
Phi-3 was trained on synthetic data generated by GPT-3.5 and GPT-4.
Often, LLM-generated training data can be too repetitive and similar, lacking diversity in verbs, nouns, and adjectives.
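One simple way to see this repetitiveness is to measure lexical diversity. The sketch below is a hypothetical illustration (not from the TinyStories paper): a crude type-token ratio, where values near 1 mean varied vocabulary and values near 0 mean the generated data keeps reusing the same words.

```python
def vocabulary_diversity(texts):
    """Crude diversity signal: unique words / total words across a corpus.
    Low values suggest the generated data is repetitive."""
    words = [w for t in texts for w in t.lower().split()]
    if not words:
        return 0.0
    return len(set(words)) / len(words)

# Toy corpora for illustration only.
repetitive = ["the cat sat", "the cat sat", "the cat sat"]
varied = ["the cat sat", "a dog ran", "one bird flew"]
```

A real pipeline would use something more robust (lemmatisation, part-of-speech counts), but even this ratio separates a repetitive corpus from a varied one.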
The corpus needed to combine all the qualitative elements found in natural language, such as grammar, vocabulary, facts, and reasoning.
However, it was designed to be smaller, less diverse, and more restricted in its content.
What I find fascinating is the principle of creating a framework, or data topology, within which the LLM generates the synthetic training data.
The study shows that training generative models on TinyStories can usually be completed in less than a day on a single GPU, and that these models still exhibit many behaviours similar to those observed in LLMs.
Instead of training on just raw web data, the creators of Phi-3 looked for high-quality data.
Microsoft researchers decided to create a discrete dataset, basing the training data on 3,000 words comprising roughly equal numbers of nouns, verbs, and adjectives.
They then asked a large language model to create a children's story using one noun, one verb, and one adjective from the list. That prompt was repeated millions of times over several days, generating millions of tiny children's stories.