
Truncating in the middle of a UTF-8 codepoint is not fine: most software is not tested with partially broken UTF-8, so international users will likely run into many bugs.

This matters especially because concatenation is a very common operation, so those sliced codepoints will end up everywhere, including in the middle of text.
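To make the concatenation point concrete, here is a small Python sketch. The sample string and the `is_valid_utf8` helper are made up for illustration; they are not from the article.

```python
# Hypothetical example: cut a UTF-8 byte string in the middle of a codepoint.
text = "naïve".encode("utf-8")   # b'na\xc3\xafve'; 'ï' takes two bytes
cut = text[:3]                    # b'na\xc3', ends on a lone lead byte
joined = cut + b" and more"       # the stray byte now sits in the middle of text


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Here `text` is valid, but both `cut` and `joined` fail strict decoding, and in `joined` the damage is no longer at the end where a reader might expect it.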



Morally, I view “what do I do with my truncated string” as a separate issue from “how do I truncate the string”, which is what the article describes. Like, yes, you absolutely should not concatenate after doing this operation. But maybe you shouldn’t be showing the user a truncated string either, even if it’s all ASCII. The question of “did you make an unparseable UTF-8 string” is answered with “no”, and the more complicated but also more interesting question of “did you actually want this” remains unanswered.
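For the narrow “how do I truncate without producing unparseable UTF-8” question, one common approach is to back up from the byte limit until the cut lands on a codepoint boundary. The sketch below assumes the input is already valid UTF-8; `truncate_utf8` is a hypothetical name, and this is not necessarily what the article does.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Longest prefix of valid UTF-8 `data` that fits in max_bytes
    without ending inside a codepoint. Assumes `data` is valid UTF-8."""
    if max_bytes <= 0:
        return b""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # Continuation bytes have the form 0b10xxxxxx. If the first dropped byte
    # is one, the cut is inside a codepoint, so back up to its lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]
```

For example, `truncate_utf8("héllo".encode("utf-8"), 2)` drops the whole two-byte `é` rather than keeping half of it. This only answers the parseability question: it can still split a grapheme cluster (a base character plus combining marks), and it says nothing about whether a truncated string is what you wanted.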


This is fair; the article uses truncating a string to fit in a status bar as its example.


Also consider that Unicode is not only international characters, but also superscripts and other stuff ♥ᵃ

a: wasn't there a list somewhere of which characters Hacker News allows?



