Everyone talks about LLMs, but do you understand how their tokenizers are built?
Tokenizers sit behind every LLM. If you care about deep learning, you should understand them.
I just created a step-by-step guide covering:
Normalization & pre-tokenization
The BPE and WordPiece algorithms
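To give a flavor of the BPE part: training is just a loop of "count adjacent symbol pairs, merge the most frequent one". Here is a minimal pure-Python sketch on a toy corpus (the corpus, the number of merges, and all names are illustrative, not any library's actual implementation):

```python
import re
from collections import Counter

def get_pair_counts(words):
    # Count adjacent symbol pairs across the corpus, weighted by word frequency.
    pairs = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, words):
    # Replace each standalone occurrence of the pair with the merged symbol.
    a, b = pair
    pattern = re.compile(r"(?<!\S)" + re.escape(f"{a} {b}") + r"(?!\S)")
    return {pattern.sub(f"{a}{b}", word): freq for word, freq in words.items()}

# Toy pre-tokenized corpus: words split into characters, with frequencies.
corpus = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}

merges = []
for _ in range(3):  # learn 3 merges for the demo
    pairs = get_pair_counts(corpus)
    best = max(pairs, key=pairs.get)
    merges.append(best)
    corpus = merge_pair(best, corpus)

print(merges)   # the learned merge rules, in order
print(corpus)   # the corpus rewritten with merged symbols
```

The learned merge list is exactly what a BPE tokenizer ships: at inference time, you apply the same merges, in the same order, to new text.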
I will continue to maintain it and cover other NLP fundamentals.