GPT-2 clone
The embedded chatbot is a clone of the 124M-parameter GPT-2 model, coded up and trained from scratch to surpass the performance of the original model on the HellaSwag dataset. The model was pretrained on a single A100 for 10 billion tokens of the FineWeb-Edu dataset, then fine-tuned on the ultrachat_200k dataset to align it with the user-assistant conversational style. The A100 is a reasonably priced and readily available accelerator, which kept the total compute budget for the entire training at a modest €13.94. There is a demo of the model in text generation mode on the "method" tab.
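For orientation, this is a minimal sketch of what a 124M-parameter GPT-2 configuration typically looks like; the class and field names are illustrative assumptions, not the exact ones used in the training code on the "method" tab.

```python
from dataclasses import dataclass

# Hypothetical config sketch for a GPT-2-small-sized model (~124M parameters).
# The real training code may name or structure these hyperparameters differently.
@dataclass
class GPTConfig:
    block_size: int = 1024   # maximum context length
    vocab_size: int = 50257  # GPT-2 BPE vocabulary size
    n_layer: int = 12        # number of transformer blocks
    n_head: int = 12         # attention heads per block
    n_embd: int = 768        # embedding width; together these give roughly 124M parameters
```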
The aim of this demo is to showcase both model building and training ("method" tab) and model deployment, platform, and CI/CD ("backend" tab). All the code is hosted on GitLab for its CI/CD capabilities; just send me an email for access.
Limitations
The model seems to do particularly badly with short, open-ended prompts; "Hello", for example, can completely derail it. This may be due to limitations in the dataset or to the model's small size. Try specific questions or statements, and use the retry button under an answer to see more variants. The model also has a propensity for generating very long numbered lists and rambling responses, so each answer is soft-capped at 128 tokens (generation stops at a newline). If an answer seems cut off, it may very well be.
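To illustrate why retries give different variants and how the soft cap works, here is a minimal sketch of a sampling loop, assuming a PyTorch model that returns next-token logits and the GPT-2 BPE newline token id; the names and exact stop logic are assumptions, not the deployed code.

```python
import torch

def generate(model, tokens, max_new_tokens=128, temperature=0.8, newline_id=198):
    """Sample up to max_new_tokens, stopping early at a newline token.

    newline_id=198 is the GPT-2 BPE id for "\n"; the deployed service may
    use a different stop criterion or cap.
    """
    model.eval()
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(tokens)[:, -1, :]               # logits for the last position
        probs = torch.softmax(logits / temperature, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)  # stochastic sampling: each retry differs
        tokens = torch.cat([tokens, next_token], dim=1)
        if next_token.item() == newline_id:                # soft stop at the first newline
            break
    return tokens
```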
The model suffers from the same limitations as other language models, including dataset bias. It has not been trained for any sort of alignment beyond the initial pretraining and fine-tuning, and may produce nonsensical or offensive text, exhibit biases, or generate incorrect information.