mcfonts.unicode
Functions for working with Unicode characters, codepoints, and surrogate pairs.
Module Contents
- blocks(characters: collections.abc.Iterable[str], /) -> dict[str, int]
Return a report of the Unicode blocks covered by characters.
Automatically sorted by block.
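As a rough illustration of such a report, the sketch below counts characters per block using a tiny hand-written table of block ranges. The table `SAMPLE_BLOCKS`, the helper name `count_blocks`, and the reading of the `int` values as per-block character counts are all assumptions made for this example; the library's own data and semantics may differ.

```python
from collections.abc import Iterable

# Hypothetical, deliberately tiny block table: (first, last, name).
# The real Unicode block list contains many more ranges.
SAMPLE_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
]


def count_blocks(characters: Iterable[str]) -> dict[str, int]:
    """Count how many of the given characters fall into each sample block."""
    counts: dict[str, int] = {}
    for character in characters:
        codepoint = ord(character)
        for first, last, name in SAMPLE_BLOCKS:
            if first <= codepoint <= last:
                counts[name] = counts.get(name, 0) + 1
                break
    # Sort the report by the block's position in the table (codepoint order).
    order = {name: index for index, (_, _, name) in enumerate(SAMPLE_BLOCKS)}
    return dict(sorted(counts.items(), key=lambda item: order[item[0]]))


print(count_blocks("A\u00f1\u0431 a"))
# {'Basic Latin': 3, 'Latin-1 Supplement': 1, 'Cyrillic': 1}
```

Characters outside the sample table are silently skipped here; a full implementation would need the complete block list.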
- character_to_surrogates(character: str, /) -> tuple[int, int]
Convert one character into the two integers of its UTF-16 surrogate pair.
A surrogate pair is two 16-bit code units that together represent one character. Since a single UTF-16 code unit can only hold values from 0 to 0xFFFF, characters past 0xFFFF need to be split into two surrogate codepoints, each below 0xFFFF.
This is useful even in plaintext Unicode notation, because \u1D105 is not a single character, it's two (ᴐ5, not 𝄅).
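The split described above follows the standard UTF-16 arithmetic: subtract 0x10000, then place the top 10 bits in the high surrogate and the bottom 10 bits in the low surrogate. The helper below is a minimal sketch of that arithmetic; the name `to_surrogates` and the error handling are assumptions, not the library's implementation.

```python
def to_surrogates(character: str) -> tuple[int, int]:
    """Split one supplementary-plane character into its UTF-16 surrogate pair."""
    codepoint = ord(character)
    if codepoint < 0x10000:
        # Characters in the Basic Multilingual Plane fit in one code unit.
        raise ValueError("character does not need a surrogate pair")
    offset = codepoint - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


print([hex(unit) for unit in to_surrogates("\U0001D105")])
# ['0xd834', '0xdd05']
```

Note the eight-digit `\U0001D105` escape: in Python, as in the notation discussed above, `\u` consumes exactly four hex digits.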
- is_character_invisible(character: str, /) -> bool
Return whether character would be invisible.
A character is "invisible" if it:
Is in these categories: Cf, Cc, Zl, Zs, Zp.
Is equal to these codepoints: 2800, 034F, 115F, 1160, 17B4, 17B5, 3164, FFA0, 1D159, 1D174, 1D176, 1D177, 1D178, 1D17A.
Is private use.
You can visit https://invisible-characters.com/ if you would like to see the list.
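The three criteria above can be approximated with the standard library's `unicodedata` module. This is only a sketch: the names `looks_invisible`, `INVISIBLE_CATEGORIES`, and `EXTRA_INVISIBLE_CODEPOINTS` are made up for the example, and the private-use test is swapped for a check on the general category `Co` rather than whatever the library does internally.

```python
import unicodedata

# General categories listed in the criteria above.
INVISIBLE_CATEGORIES = frozenset({"Cf", "Cc", "Zl", "Zs", "Zp"})

# Individual codepoints listed in the criteria above.
EXTRA_INVISIBLE_CODEPOINTS = frozenset({
    0x2800, 0x034F, 0x115F, 0x1160, 0x17B4, 0x17B5, 0x3164,
    0xFFA0, 0x1D159, 0x1D174, 0x1D176, 0x1D177, 0x1D178, 0x1D17A,
})


def looks_invisible(character: str) -> bool:
    """Approximate the invisibility test using the three documented criteria."""
    category = unicodedata.category(character)
    if category in INVISIBLE_CATEGORIES:
        return True
    if ord(character) in EXTRA_INVISIBLE_CODEPOINTS:
        return True
    # Assumption: private use is detected through general category "Co".
    return category == "Co"


print(looks_invisible(" "), looks_invisible("\u2800"), looks_invisible("A"))
# True True False
```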
Warning
"Invisibility" is not a property defined by the Unicode Standard. For standardization purposes, do not use this outside of this library.
- is_character_private_use(character: str, /) -> bool
Return whether character is in a Private Use Area (PUA).
A PUA is one of these codepoint ranges:
U+E000 to U+F8FF
U+F0000 to U+FFFFD
U+100000 to U+10FFFD
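A membership test over the three ranges above could look like the following sketch. The names `PRIVATE_USE_RANGES` and `in_private_use_area` are hypothetical and chosen only for this example.

```python
# The three Private Use Area ranges listed above, as inclusive bounds.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def in_private_use_area(character: str) -> bool:
    """Return True if the character's codepoint lies in any PUA range."""
    codepoint = ord(character)
    return any(first <= codepoint <= last for first, last in PRIVATE_USE_RANGES)


print(in_private_use_area("\ue000"), in_private_use_area("\U000FFFFE"))
# True False
```

The last two codepoints of planes 15 and 16 (for example U+FFFFE) are noncharacters, which is why the upper bounds end in FFFD.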
- is_character_surrogate(character: str, /) -> bool
Return whether a character is a high or low surrogate, that is, one half of a surrogate pair.
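For reference, the Unicode Standard reserves U+D800 to U+DBFF for high surrogates and U+DC00 to U+DFFF for low surrogates. The sketch below checks against those standard ranges; the helper names are hypothetical, and the exact bounds the library uses are not stated here, so treat this as an illustration only.

```python
def is_high_surrogate(character: str) -> bool:
    """True for the first half of a pair (U+D800 to U+DBFF)."""
    return 0xD800 <= ord(character) <= 0xDBFF


def is_low_surrogate(character: str) -> bool:
    """True for the second half of a pair (U+DC00 to U+DFFF)."""
    return 0xDC00 <= ord(character) <= 0xDFFF


def is_surrogate(character: str) -> bool:
    """True if the character is either half of a surrogate pair."""
    return is_high_surrogate(character) or is_low_surrogate(character)


print(is_surrogate("\ud83d"), is_surrogate("\udc00"), is_surrogate("A"))
# True True False
```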
- pprint_char(character: str, /) -> str
Return relevant info about a character as a string, following U+<codepoint>: <name> <character>.
>>> pprint_char('\u2601')
'U+2601: CLOUD ☁'
>>> pprint_char('\ue000')
'U+E000: <PRIVATE USE> \ue000'
>>> pprint_char('\U0001f400')
'U+1F400: RAT 🐀'
>>> pprint_char('\b')
'U+0008: BACKSPACE ␈'
- str_to_tags(string: str, /) -> str
Return a version of string with all alphanumeric characters changed into Tags.
Given string, which should contain only ASCII characters, return the same string with every character replaced by its corresponding Tag character.
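Tag characters live in the Unicode Tags block, where U+E0020 to U+E007E mirror printable ASCII at a fixed offset of 0xE0000. A minimal sketch of the mapping follows; the name `to_tags` is hypothetical, and leaving characters outside printable ASCII unchanged is an assumption made for the example, not documented library behavior.

```python
TAG_OFFSET = 0xE0000  # Tags block: U+E0020..U+E007E mirror ASCII 0x20..0x7E


def to_tags(string: str) -> str:
    """Replace each printable ASCII character with its Tag counterpart."""
    return "".join(
        chr(TAG_OFFSET + ord(character))
        if 0x20 <= ord(character) <= 0x7E
        else character  # assumption: leave anything else untouched
        for character in string
    )


print([f"U+{ord(character):X}" for character in to_tags("a1")])
# ['U+E0061', 'U+E0031']
```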
- surrogates_to_character(surrogates: tuple[int, int], /) -> str
Given a tuple of surrogate codepoints, return the single character they represent.
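Recombining a pair reverses the standard UTF-16 split: the high surrogate supplies the top 10 bits, the low surrogate the bottom 10 bits, and 0x10000 is added back. The helper below is a sketch of that arithmetic under the standard surrogate ranges; the name `from_surrogates` and the validation are assumptions, not the library's code.

```python
def from_surrogates(surrogates: tuple[int, int]) -> str:
    """Combine a (high, low) UTF-16 surrogate pair into one character."""
    high, low = surrogates
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("expected a (high, low) surrogate pair")
    codepoint = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    return chr(codepoint)


print(f"U+{ord(from_surrogates((0xD83D, 0xDC00))):X}")
# U+1F400
```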
- INVISIBLE_CHARACTERS : Final[set[str]]
A set of characters that do not have a visual representation under most fonts.
- SURROGATE_END : Final[int] = 56320
- SURROGATE_START : Final[int] = 55296