Language Model From Scratch
A language model I built entirely from scratch, including a BPE tokenizer, a transformer architecture, fused kernels written in Triton, distributed training, optimizer sharding, experimentally derived scaling laws, data curation from Common Crawl, and alignment using RLHF with DPO.
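To give a flavor of one of the components above: BPE training repeatedly finds the most frequent adjacent token pair in the corpus and merges it into a new token. The sketch below is a minimal, illustrative version of that loop (not the project's actual tokenizer, and ignoring byte-level handling, pre-tokenization, and special tokens); the helper names are my own.

```python
from collections import Counter

def most_frequent_pair(token_seqs):
    """Count adjacent token pairs across all sequences; return the most frequent."""
    pairs = Counter()
    for seq in token_seqs:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(seq, pair, new_token):
    """Replace every occurrence of `pair` in `seq` with `new_token`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def train_bpe(words, num_merges):
    """Learn `num_merges` merges from a list of words, starting from characters."""
    seqs = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(seqs)
        merges.append(pair)
        seqs = [merge_pair(s, pair, pair[0] + pair[1]) for s in seqs]
    return merges, seqs
```

For example, on the toy corpus `["low", "lower", "lowest"]`, two merge steps first join `l` and `o`, then join the result with `w`, so `"low"` ends up as the single token `low`.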
June 17, 2024
The project was divided into five parts.
Here is the documentation for each part:
- [Architecture](/projects/lm-from-scratch/architecture.pdf)
- [Systems](/projects/lm-from-scratch/systems.pdf)
- [Scaling](/projects/lm-from-scratch/scaling.pdf)
- [Data](/projects/lm-from-scratch/data.pdf)
- [Alignment](/projects/lm-from-scratch/alignment.pdf)
And here are the code repositories for each part:
- [Architecture](https://github.com/matttreed/language-model)
- [Systems](https://github.com/matttreed/language-model-systems)
- [Scaling](https://github.com/matttreed/language-model-scaling)
- [Data](https://github.com/matttreed/language-model-data)
- [Alignment](https://github.com/matttreed/language-model-alignment)