mcfonts.utils.unicode
Functions for working with Unicode characters, codepoints, and surrogate pairs.
Module Contents
Functions
character_to_surrogates(character) | Convert one character into 2 integers of its UTF-16 surrogate codepoints.
is_character_invisible(character) | Return if character would be invisible (might not have glyph info).
is_character_private_use(character) | Return if character is in a Private Use Area (PUA).
is_codepoint_surrogate(codepoint) | Return if a codepoint is part of a high or low surrogate pair.
pretty_print_character(character) | Return relevant info about a character into a string, following U+<codepoint>: <name> <character>.
str_to_tags(string) | Return a version of string with all alphanumeric characters changed into Tags.
surrogates_to_character(surrogates) | Given a tuple of surrogate chars, return the single codepoint they combine to.
Attributes
INVISIBLE_CHARACTERS | A set of characters that do not have a visual representation under most fonts.
- mcfonts.utils.unicode.INVISIBLE_CHARACTERS
A set of characters that do not have a visual representation under most fonts.
- mcfonts.utils.unicode.character_to_surrogates(character)
Convert one character into 2 integers of its UTF-16 surrogate codepoints.
A surrogate pair is two 16-bit code units that together represent one character. Since a single UTF-16 code unit only covers 0 to 0xFFFF, characters past 0xFFFF need to be split into two codepoints below 0xFFFF.
This is useful even in plaintext Unicode notation, because the \u escape reads only four hex digits: \u1D105 is not a single character, it's two (ᴐ followed by 5, not 𝄅).
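The split described above follows the standard UTF-16 arithmetic. The following is a minimal sketch of that arithmetic, not the library's actual implementation; the name `to_surrogate_pair` and the choice to raise `ValueError` for characters at or below 0xFFFF are assumptions made for illustration.

```python
def to_surrogate_pair(character: str) -> tuple[int, int]:
    """Hypothetical helper: split one character above U+FFFF into (high, low)."""
    codepoint = ord(character)
    if codepoint <= 0xFFFF:
        # Assumption: characters that fit in one code unit are rejected here.
        raise ValueError("character does not need a surrogate pair")
    offset = codepoint - 0x10000        # 20-bit value to distribute
    high = 0xD800 + (offset >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogate_pair("\U0001D105")
print(f"\\u{high:04X}\\u{low:04X}")  # \uD834\uDD05
```

Written as two four-digit escapes, `\uD834\uDD05` represents 𝄅 in notations that only accept `\u` with four hex digits.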
- mcfonts.utils.unicode.is_character_invisible(character)
Return if character would be invisible (might not have glyph info).
A character is "invisible" if it:
Is in these categories: Cf, Cc, Zl, Zs, Zp.
Is equal to these codepoints: 2800, 034F, 115F, 1160, 17B4, 17B5, 3164, FFA0, 1D159, 1D174, 1D176, 1D177, 1D178, 1D17A.
Is private use.
You can visit https://invisible-characters.com/ if you would like to see the list.
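The three criteria above can be sketched with the standard `unicodedata` module. This is an illustration under stated assumptions, not the library's code: the name `looks_invisible` is hypothetical, the criteria are assumed to combine with "or", and the private use test uses the ranges listed for `is_character_private_use`.

```python
import unicodedata

# Categories and codepoints copied from the list above.
INVISIBLE_CATEGORIES = {"Cf", "Cc", "Zl", "Zs", "Zp"}
INVISIBLE_CODEPOINTS = {
    0x2800, 0x034F, 0x115F, 0x1160, 0x17B4, 0x17B5, 0x3164, 0xFFA0,
    0x1D159, 0x1D174, 0x1D176, 0x1D177, 0x1D178, 0x1D17A,
}


def looks_invisible(character: str) -> bool:
    """Hypothetical helper: True if any of the three criteria matches."""
    codepoint = ord(character)
    in_private_use = (
        0xE000 <= codepoint <= 0xF8FF
        or 0xF0000 <= codepoint <= 0xFFFFD
        or 0x100000 <= codepoint <= 0x10FFFD
    )
    return (
        unicodedata.category(character) in INVISIBLE_CATEGORIES
        or codepoint in INVISIBLE_CODEPOINTS
        or in_private_use
    )


print(looks_invisible(" "), looks_invisible("A"))  # True False
```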
Warning
"Invisibility" is not a valid Unicode standard property. For standardization purposes, do not use this outside of this library.
- mcfonts.utils.unicode.is_character_private_use(character)
Return if character is in a Private Use Area (PUA).
A PUA is one of these codepoint ranges:
U+E000 to U+F8FF
U+F0000 to U+FFFFD
U+100000 to U+10FFFD
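A range test over the three blocks listed above might look like the following. This is only a sketch; the names `PRIVATE_USE_RANGES` and `in_private_use_area` are hypothetical and not part of the library.

```python
# The three Private Use Area ranges listed above, as inclusive bounds.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def in_private_use_area(character: str) -> bool:
    """Hypothetical helper: True if the character falls in any PUA range."""
    codepoint = ord(character)
    return any(start <= codepoint <= end for start, end in PRIVATE_USE_RANGES)


print(in_private_use_area("\ue000"))  # True
print(in_private_use_area("\uf900"))  # False, one past the first range
```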
- mcfonts.utils.unicode.is_codepoint_surrogate(codepoint)
Return if a codepoint is part of a high or low surrogate pair.
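For reference, Unicode reserves U+D800 to U+DBFF for high surrogates and U+DC00 to U+DFFF for low surrogates, so the check reduces to one range comparison. The sketch below assumes the codepoint is passed as an integer; the name `is_surrogate` is hypothetical.

```python
def is_surrogate(codepoint: int) -> bool:
    """Hypothetical helper: True for any high or low surrogate codepoint."""
    # High surrogates: 0xD800-0xDBFF, low surrogates: 0xDC00-0xDFFF.
    return 0xD800 <= codepoint <= 0xDFFF


print(is_surrogate(0xD834), is_surrogate(0xE000))  # True False
```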
- mcfonts.utils.unicode.pretty_print_character(character)
Return relevant info about a character into a string, following U+<codepoint>: <name> <character>.
>>> pretty_print_character('\u2601')
'U+2601: CLOUD ☁'
>>> pretty_print_character('\ue000')
'U+E000: <PRIVATE USE> \ue000'
>>> pretty_print_character('\U0001f400')
'U+1F400: RAT 🐀'
>>> pretty_print_character('\b')
'U+0008: BACKSPACE ␈'
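A simplified version of this formatting can be built on `unicodedata.name`. The sketch below is not the library's implementation: the name `describe_character` and the `<NO NAME>` fallback are assumptions, and it does not reproduce the `<PRIVATE USE>` label, control character names, or control picture substitution shown in the examples above.

```python
import unicodedata


def describe_character(character: str) -> str:
    """Hypothetical helper: format as 'U+<codepoint>: <name> <character>'."""
    codepoint = ord(character)
    # unicodedata.name() returns the default for characters without a name
    # (controls, private use); "<NO NAME>" is an illustrative placeholder.
    name = unicodedata.name(character, "<NO NAME>")
    return f"U+{codepoint:04X}: {name} {character}"


print(describe_character("\u2601"))      # U+2601: CLOUD ☁
print(describe_character("\U0001f400"))  # U+1F400: RAT 🐀
```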
- mcfonts.utils.unicode.str_to_tags(string)
Return a version of string with all alphanumeric characters changed into Tags.
Given string, which should contain only ASCII characters, return the same string with every character replaced by its corresponding Tag character.
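In Unicode, the Tags block mirrors ASCII at an offset of 0xE0000, so the tag for `a` (U+0061) is U+E0061. The sketch below shows that mapping under some assumptions: the name `ascii_to_tags` is hypothetical, it converts every printable ASCII character (U+0020 to U+007E), and it leaves anything else unchanged. The library's exact character selection may differ.

```python
TAG_OFFSET = 0xE0000  # distance from an ASCII character to its Tag counterpart


def ascii_to_tags(string: str) -> str:
    """Hypothetical helper: shift printable ASCII characters into the Tags block."""
    return "".join(
        # Assumption: only U+0020..U+007E are shifted; other characters pass through.
        chr(ord(ch) + TAG_OFFSET) if 0x20 <= ord(ch) <= 0x7E else ch
        for ch in string
    )


tagged = ascii_to_tags("a1")
print([f"U+{ord(ch):04X}" for ch in tagged])  # ['U+E0061', 'U+E0031']
```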
- mcfonts.utils.unicode.surrogates_to_character(surrogates)
Given a tuple of surrogate chars, return the single codepoint they combine to.
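Combining a pair reverses the UTF-16 split. The following sketch assumes the surrogates arrive as a `(high, low)` tuple of integers and returns the resulting character; the name `from_surrogate_pair` and the validation step are illustrative and not taken from the library.

```python
def from_surrogate_pair(surrogates: tuple[int, int]) -> str:
    """Hypothetical helper: combine (high, low) surrogates into one character."""
    high, low = surrogates
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        # Assumption: malformed pairs are rejected rather than passed through.
        raise ValueError("expected a high surrogate followed by a low surrogate")
    # Each surrogate carries 10 bits of the offset above 0x10000.
    codepoint = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    return chr(codepoint)


character = from_surrogate_pair((0xD834, 0xDD05))
print(f"U+{ord(character):04X}")  # U+1D105
```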