LLM training in simple, raw C/CUDA