Emojis, shortcodes and zero width joiners
This site uses the goldmark markdown processor, together with its goldmark-emoji extension, to render the markdown files into HTML. The emoji extension uses GitHub's shortcodes to turn shortcodes in the markdown files into emojis. Unfortunately, the shortcodes provided by GitHub are out of date and don't include all the emojis that Unicode has to offer. Naturally, I wanted to have them all, so I set out to add the full list of Unicode emojis and their shortcodes to my site.
Goldmark Markdown Processor
Thankfully, the goldmark markdown processor is highly extensible and allows for custom extensions to be added. Even better, the goldmark-emoji extension allows for custom emoji shortcodes to be added on initialisation.
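As a rough illustration of what a custom shortcode table does, here is a stdlib-only sketch. It is not goldmark-emoji's API: the map `customShortcodes` and the function `expandShortcodes` are hypothetical names, and the real extension does this work inside the markdown parser rather than with a regular expression over raw text.

```go
package main

import (
	"fmt"
	"regexp"
)

// customShortcodes is a hypothetical shortcode table (not goldmark-emoji's
// own data structure) mapping shortcode names to emoji strings.
var customShortcodes = map[string]string{
	"adult":        "\U0001F9D1",
	"laptop":       "\U0001F4BB",
	"technologist": "\U0001F9D1\u200D\U0001F4BB",
}

// shortcodePattern matches text of the form :name:.
var shortcodePattern = regexp.MustCompile(`:([a-z0-9_+-]+):`)

// expandShortcodes replaces every known :name: with its emoji and leaves
// unknown shortcodes untouched.
func expandShortcodes(text string, table map[string]string) string {
	return shortcodePattern.ReplaceAllStringFunc(text, func(match string) string {
		name := match[1 : len(match)-1] // strip the surrounding colons
		if emoji, ok := table[name]; ok {
			return emoji
		}
		return match
	})
}

func main() {
	fmt.Println(expandShortcodes("I am a :technologist: :unknown:", customShortcodes))
}
```

Leaving unknown shortcodes as they are mirrors what you would want from a renderer: a typo stays visible instead of silently disappearing.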
Unicode Emojis
Unicode has had a wide range of emojis since Unicode v6.0. Emojis are not as simple as other Unicode characters, because a single emoji can be composed of multiple Unicode code points. Some emojis, first proposed in Unicode Technical Standard #51 (R2) and introduced in Emoji 2.0 (Unicode v8.0), are made up of multiple emojis joined together with the Zero Width Joiner (ZWJ) character, U+200D. This allows new emojis to be created by combining existing emojis in different ways, and provides an easy way to add variations to existing emojis.
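To make the idea of a joined emoji concrete, here is a small sketch that assembles "woman running" from its parts: U+1F3C3 PERSON RUNNING, the ZWJ, U+2640 FEMALE SIGN, and U+FE0F VARIATION SELECTOR-16, which asks for emoji presentation. The function name `womanRunning` is just for illustration.

```go
package main

import "fmt"

// womanRunning builds the "woman running" ZWJ sequence:
// PERSON RUNNING + ZERO WIDTH JOINER + FEMALE SIGN + VARIATION SELECTOR-16.
func womanRunning() string {
	const zwj = '\u200D'
	return string([]rune{0x1F3C3, zwj, 0x2640, 0xFE0F})
}

func main() {
	s := womanRunning()
	fmt.Println(s) // renders as one glyph where the font supports the sequence

	// Print each code point in U+XXXX form.
	for _, r := range s {
		fmt.Printf("%U ", r)
	}
	fmt.Println()
}
```

Four code points produce one visible emoji, and a platform without support for the sequence falls back to showing the parts side by side.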
Emojis in Go
In Go, a string is a read-only sequence of bytes that, by convention, holds UTF-8 encoded text. A rune represents a single Unicode code point, and ranging over a string decodes it one rune at a time. This means that for glyphs composed of multiple Unicode code points (like joined emojis), some care needs to be taken when parsing and manipulating them.
The Go blog has a great post on strings that goes into more detail on how strings are handled and why Go talks about runes rather than characters.
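A quick way to see the difference between bytes, runes and glyphs is to measure a single joined emoji. This sketch uses 🧑‍💻 written with escapes; `runeOffsets` is a helper name introduced here for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// technologist is 🧑‍💻 spelled out with escapes: ADULT, ZWJ, LAPTOP.
const technologist = "\U0001F9D1\u200D\U0001F4BB"

// runeOffsets returns the byte offset at which each rune of s starts.
// Ranging over a string decodes its UTF-8 one rune at a time.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	fmt.Println(len(technologist))                    // 11 bytes of UTF-8
	fmt.Println(utf8.RuneCountInString(technologist)) // 3 runes (code points)
	fmt.Println(runeOffsets(technologist))            // [0 4 7]
	for _, r := range technologist {
		fmt.Printf("%U\n", r)
	}
}
```

One visible glyph, three runes, eleven bytes: neither len nor a rune count tells you how many emojis a string contains once ZWJ sequences are involved.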
Example
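Here is a minimal sketch of building 🧑‍💻 from its two component emojis. The constants and the `joinZWJ` helper are illustrative names, not part of any library.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	adult  = "\U0001F9D1" // 🧑 :adult:
	laptop = "\U0001F4BB" // 💻 :laptop:
	zwj    = "\u200D"     // ZERO WIDTH JOINER
)

// joinZWJ glues emojis together with zero width joiners, which is how
// ZWJ sequences such as 🧑‍💻 are formed.
func joinZWJ(parts ...string) string {
	return strings.Join(parts, zwj)
}

func main() {
	technologist := joinZWJ(adult, laptop)
	fmt.Println(technologist) // renders as 🧑‍💻 where the font supports it

	// %+q escapes non-ASCII, exposing the joiner that is invisible when rendered.
	fmt.Printf("%+q\n", technologist)
}
```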
The 0x200D rune is the Zero Width Joiner that combines the 🧑 :adult: and 💻 :laptop: emojis to create the 🧑‍💻 :technologist: emoji.
Parsing Hex Strings to Runes
The Unicode standard emoji list represents each emoji as a group of hexadecimal code points, which can be parsed into runes using Go's strconv package. This works because the rune type is an alias for int32 and is equivalent to it in all ways, so an integer parsed from a hex string only needs a type conversion to become a rune.
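A minimal sketch of the parsing step, assuming the code points arrive as a space-separated field such as "1F9D1 200D 1F4BB" (the layout used in the first column of Unicode's emoji-test.txt). The function name `parseCodePoints` is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// parseCodePoints converts space-separated hex code points, such as
// "1F9D1 200D 1F4BB", into a slice of runes.
func parseCodePoints(hex string) ([]rune, error) {
	fields := strings.Fields(hex)
	runes := make([]rune, 0, len(fields))
	for _, f := range fields {
		// Base 16, limited to 32 bits because a rune is an int32.
		n, err := strconv.ParseInt(f, 16, 32)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", f, err)
		}
		r := rune(n)
		// Reject negatives, surrogates and values above U+10FFFF.
		if !utf8.ValidRune(r) {
			return nil, fmt.Errorf("%q is not a valid code point", f)
		}
		runes = append(runes, r)
	}
	return runes, nil
}

func main() {
	runes, err := parseCodePoints("1F9D1 200D 1F4BB")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%U\n", runes) // [U+1F9D1 U+200D U+1F4BB]
}
```

The ValidRune check matters because a 32-bit parse alone would happily accept values like 7FFFFFFF that are not Unicode code points.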
Outputting Runes as Strings
To output runes as a string, you can use a string type conversion, string(runes), to convert a slice of runes to a string.
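A short sketch of the conversion in both directions; `toString` is just a named wrapper around the built-in conversion, added for illustration.

```go
package main

import "fmt"

// toString converts a slice of runes to a string; Go encodes each rune
// as UTF-8 during the conversion.
func toString(runes []rune) string {
	return string(runes)
}

func main() {
	technologist := toString([]rune{0x1F9D1, 0x200D, 0x1F4BB})
	fmt.Println(technologist)

	// A single rune converts the same way.
	fmt.Println(string(rune(0x1F4BB))) // 💻

	// The conversion round-trips: []rune(s) decodes the string again.
	fmt.Println(len([]rune(technologist))) // 3
}
```

One thing to be aware of: any invalid code point in the slice is converted to U+FFFD, the replacement character, rather than causing an error.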
Results
For my own convenience, I've added a page that lists all the emojis, their GitHub shortcodes, and the custom shortcodes I've added for them (based on their Unicode names). You can find it here if you're curious.
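For a sense of how a shortcode might be derived from a Unicode name, here is one possible rule, sketched under my own assumptions: lower-case the letters and digits and turn every run of anything else into a single underscore. The function `shortcodeFromName` and this exact rule are hypothetical and not necessarily what the custom shortcodes on this site use.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// shortcodeFromName derives a hypothetical shortcode from an emoji's
// Unicode name. Letters and digits are lower-cased and kept; each run of
// other characters becomes one underscore, with none at either end.
func shortcodeFromName(name string) string {
	var b strings.Builder
	pendingSep := false
	for _, r := range name {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			if pendingSep && b.Len() > 0 {
				b.WriteByte('_')
			}
			pendingSep = false
			b.WriteRune(unicode.ToLower(r))
		} else {
			pendingSep = true
		}
	}
	return b.String()
}

func main() {
	fmt.Println(shortcodeFromName("woman running: medium skin tone"))
}
```

Collapsing runs of separators keeps names like "woman running: medium skin tone" from producing doubled underscores around the colon.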