A collection of emojis arranged in a spiral pattern.

Emojis, shortcodes and zero width joiners

This site uses the goldmark Markdown processor, together with its emoji extension, to render Markdown files into HTML. The extension uses GitHub’s shortcodes to render emojis in the Markdown files. Unfortunately, the shortcodes provided by GitHub are out of date and don’t include all the emojis that Unicode has to offer. Naturally, I wanted to have them all, so I set out to add the full list of Unicode emojis and their shortcodes to my site.

Goldmark Markdown Processor

Thankfully, the goldmark markdown processor is highly extensible and allows for custom extensions to be added. Even better, the goldmark-emoji extension allows for custom emoji shortcodes to be added on initialisation.

// import "github.com/yuin/goldmark-emoji"

emoji.New(emoji.WithEmojis(< custom emoji map >))

Unicode Emojis

Unicode has included a wide range of emojis since Unicode v6.0. Emojis are not as simple as other Unicode characters, as a single emoji can be composed of multiple Unicode code points. Some emojis, first proposed in Technical Standard (#51-R2) and introduced in Emoji 2.0 (Unicode v8.0), are made up of multiple emojis joined together using the Zero Width Joiner (ZWJ) character, U+200D. This allows new emojis to be created by combining existing ones in different ways, and provides an easy way to add variations to existing emojis.
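To make the joining idea concrete, here is a minimal Go sketch that splits a ZWJ sequence back into its component emojis. The helper name splitZWJ is hypothetical, and the sketch assumes the input is a plain ZWJ sequence; it does nothing special for other modifiers such as variation selectors or skin tones.

```go
package main

import (
	"fmt"
	"strings"
)

// zwj is the Zero Width Joiner code point, U+200D.
const zwj = "\u200D"

// splitZWJ (hypothetical helper) returns the component emojis of a ZWJ sequence.
func splitZWJ(s string) []string {
	return strings.Split(s, zwj)
}

func main() {
	// family (man, woman, girl): U+1F468 U+200D U+1F469 U+200D U+1F467
	family := "\U0001F468\u200D\U0001F469\u200D\U0001F467"

	// print each component next to its code point
	for _, part := range splitZWJ(family) {
		fmt.Printf("%s %U\n", part, []rune(part))
	}
}
```

A renderer that doesn't know a particular sequence typically falls back to showing the components side by side, which is roughly what this split produces.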

Emojis in Go

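In Go, a string is a read-only sequence of bytes that conventionally holds UTF-8 text, and a rune represents an individual Unicode code point. Converting a string to a []rune, or ranging over it, decodes those bytes into code points. This means that for glyphs that are composed of multiple Unicode code points (like joined emojis), some care needs to be taken when parsing and manipulating them.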

The Go blog has a great post on strings that goes into more detail on how strings are handled and why runes are preferred over characters.

Example

// :technologist:
runes := []rune{0x1F9D1, 0x200D, 0x1F4BB}

The 0x200D rune is the Zero Width Joiner that combines the 🧑 :adult: and 💻 :laptop: emojis to create the 🧑‍💻 :technologist: emoji.
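As a rough illustration of why care is needed, the sketch below compares the byte length and the code point count of the same emoji. The helper name counts is made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts (hypothetical helper) returns the UTF-8 byte length and the number
// of code points in s.
func counts(s string) (byteLen, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	technologist := string([]rune{0x1F9D1, 0x200D, 0x1F4BB})

	b, r := counts(technologist)
	fmt.Println(b, r) // 11 3

	// ranging over a string yields one rune per code point,
	// with i being the byte offset where that code point starts
	for i, c := range technologist {
		fmt.Printf("byte offset %d: %U\n", i, c)
	}
}
```

Neither number matches the single glyph a reader sees on screen, so slicing the string by byte index, or even by rune index, can cut an emoji in half.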

Parsing Hex Strings to Runes

The Unicode standard emoji list represents each emoji as a group of space-separated hex strings, which can be parsed to runes using the strconv package in Go. This works because rune is just an alias for int32, so a parsed 32-bit integer can be converted directly to a rune.

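import (
	"fmt"
	"strconv"
	"strings"
)

// parseHexsToRunes parses a string of hex values separated by whitespace
// (e.g. "1F9D1 200D 1F4BB") into a slice of runes.
func parseHexsToRunes(hexsRaw string) ([]rune, error) {
	hexs := strings.Fields(hexsRaw)
	runes := make([]rune, 0, len(hexs))

	for _, hex := range hexs {
		newRune, err := parseHexToRune(hex)
		if err != nil {
			return nil, err
		}
		runes = append(runes, newRune)
	}

	return runes, nil
}

// parseHexToRune parses a single hex value (without a "0x" or "U+" prefix)
// into a rune.
func parseHexToRune(hexRaw string) (rune, error) {
	// a bitSize of 32 matches the width of rune (an alias for int32)
	hexInt, err := strconv.ParseUint(hexRaw, 16, 32)
	if err != nil {
		// wrap the underlying error so the caller can see which value failed
		return 0, fmt.Errorf("failed to parse hex %q: %w", hexRaw, err)
	}

	return rune(hexInt), nil
}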
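For context, here is a sketch of how a whole line from the emoji list might be turned into runes plus a shortcode. The line format is an assumption, modelled on the "code points ; status # glyph version name" layout of Unicode's emoji-test.txt. The entry type, the parseLine function and the shortcode rule (lowercase, words joined by underscores) are all hypothetical and not necessarily what this site does.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// entry is a hypothetical record for one emoji.
type entry struct {
	Runes     []rune
	Shortcode string
}

// parseLine parses one line in the assumed format
// "<hex code points> ; <status> # <glyph> <version> <name>".
func parseLine(line string) (entry, error) {
	codes, rest, ok := strings.Cut(line, ";")
	if !ok {
		return entry{}, fmt.Errorf("missing ';' in %q", line)
	}
	_, comment, ok := strings.Cut(rest, "#")
	if !ok {
		return entry{}, fmt.Errorf("missing '#' in %q", line)
	}

	// the part before ';' is a group of space-separated hex code points
	var runes []rune
	for _, h := range strings.Fields(codes) {
		v, err := strconv.ParseUint(h, 16, 32)
		if err != nil {
			return entry{}, fmt.Errorf("failed to parse hex %q: %w", h, err)
		}
		runes = append(runes, rune(v))
	}

	// the comment holds the glyph, a version tag, then the name
	fields := strings.Fields(comment)
	if len(fields) < 3 {
		return entry{}, fmt.Errorf("missing name in %q", line)
	}
	name := strings.ToLower(strings.Join(fields[2:], "_"))

	return entry{Runes: runes, Shortcode: name}, nil
}

func main() {
	e, err := parseLine("1F9D1 200D 1F4BB ; fully-qualified # 🧑‍💻 E12.1 technologist")
	if err != nil {
		panic(err)
	}
	fmt.Printf(":%s: %s\n", e.Shortcode, string(e.Runes))
}
```

Names that contain punctuation, such as a colon before a skin tone, would need extra sanitising before they could be used as shortcodes; this sketch leaves that out.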

Outputting Runes as Strings

To output runes as a string, you can use the string type conversion to convert a slice of runes to a string.

runes := []rune{0x1F469, 0x200D, 0x1F4BB}
fmt.Println(string(runes))

// result: 👩‍💻
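Going the other way can be handy when debugging. The sketch below formats the runes of a string back into the space-separated hex form used by the emoji list. The helper name runesToHex is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// runesToHex (hypothetical helper) formats runes as space-separated,
// upper-case hex code points.
func runesToHex(runes []rune) string {
	parts := make([]string, 0, len(runes))
	for _, r := range runes {
		parts = append(parts, fmt.Sprintf("%X", r))
	}
	return strings.Join(parts, " ")
}

func main() {
	s := string([]rune{0x1F469, 0x200D, 0x1F4BB})

	// []rune(s) decodes the UTF-8 string back into code points
	fmt.Println(runesToHex([]rune(s))) // 1F469 200D 1F4BB
}
```

Converting to a string and back to a []rune returns the same code points, ZWJ included, so nothing is lost in the round trip.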

Results

For my own convenience, I’ve added a page that lists all the emojis, their GitHub shortcodes, and the custom shortcodes I’ve added for them (based on their Unicode names). You can find it here if you’re curious.