If we don't fix the issues caused by tokenizers, then techniques that literally remove superfluous computation (i.e. by filtering the LLM's probability distribution) are useful as a stop-gap.

Switching to bytes is the ultimate fix, but in the interim, if you want reliable rhyming from an LLM, you need filter-assisted decoding: https://paperswithcode.com/paper/most-language-models-can-be... and Replicate's post about this work: https://replicate.com/blog/turn-your-llm-into-a-poet
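To make "filter-assisted decoding" concrete, here is a rough sketch of the idea on a toy next-token distribution. It is not the code from the linked paper or blog post: the vocabulary, the logit values, and the helpers `rhymes_with`, `filter_logits`, and `softmax` are all made up for illustration, and the "rhyme" test is just a crude shared-suffix check on spelling. The point is only that tokens failing the constraint get their logit set to -inf before sampling, so the remaining probability mass is renormalized over tokens that pass.

```python
import math
from typing import Callable

def rhymes_with(token: str, target: str, suffix_len: int = 3) -> bool:
    """Crude, spelling-based rhyme check (an assumption, not real phonetics)."""
    word = token.strip().lower()
    target = target.lower()
    return word != target and word.endswith(target[-suffix_len:])

def filter_logits(logits: dict[str, float],
                  keep: Callable[[str], bool]) -> dict[str, float]:
    """Mask every token rejected by `keep` by setting its logit to -inf."""
    return {tok: (score if keep(tok) else -math.inf)
            for tok, score in logits.items()}

def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Softmax that gives exactly zero probability to masked tokens."""
    finite = [s for s in logits.values() if s != -math.inf]
    if not finite:
        raise ValueError("the filter rejected every token")
    top = max(finite)
    exps = {tok: (math.exp(s - top) if s != -math.inf else 0.0)
            for tok, s in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical next-token logits; a real model would supply these per step.
toy_logits = {" day": 3.0, " moon": 2.5, " light": 2.0,
              " night": 1.0, " bright": 0.5}

# Unfiltered, the model would pick " day". With the filter, only tokens
# that pass the rhyme check against "sight" can be chosen.
masked = filter_logits(toy_logits, lambda tok: rhymes_with(tok, "sight"))
probs = softmax(masked)
best = max(probs, key=probs.get)
print(best, probs)
```

In a real decoder the same mask would be applied at every generation step, and a constraint on a whole word has to cope with words split across several tokens, which is exactly where the tokenizer gets in the way.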