mcfonts.unicode¶

Functions for working with Unicode characters, codepoints, and surrogate pairs.

Module Contents¶

blocks(characters: collections.abc.Iterable[str], /) → dict[str, int]¶

Return a report of the Unicode blocks covered by characters.

Automatically sorted by block.

Parameters:¶

characters : collections.abc.Iterable[str]¶: The characters to report on.

Returns:¶

Dictionary of {block name: count}.

Return type:¶

dict[str, int]
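The shape of the report can be illustrated with a minimal sketch. This is not the module's implementation: `SAMPLE_BLOCKS` and `count_blocks` are hypothetical names, the block table is deliberately abbreviated to three well-known blocks, and ordering by block start is only one reading of "sorted by block".

```python
from collections import Counter
from collections.abc import Iterable

# Abbreviated, illustrative table: (first codepoint, last codepoint, block name).
# A real report would need the full Unicode block list.
SAMPLE_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
]


def count_blocks(characters: Iterable[str]) -> dict[str, int]:
    """Count how many of the given characters fall in each sample block."""
    counts = Counter()
    for character in characters:
        codepoint = ord(character)
        for first, last, name in SAMPLE_BLOCKS:
            if first <= codepoint <= last:
                counts[name] += 1
                break  # Blocks never overlap, so stop at the first match.
    # Emit blocks in table order (by starting codepoint), skipping empty ones.
    return {name: counts[name] for _, _, name in SAMPLE_BLOCKS if name in counts}
```

Characters outside the sample table are silently skipped in this sketch; how the library treats unassigned codepoints is not stated here.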

character_to_surrogates(character: str, /) → tuple[int, int]¶

Convert one character into the two integer codepoints of its UTF-16 surrogate pair.

A surrogate pair is two 16-bit code units that together represent one character. Since a UTF-16 code unit only stores values from 0 to 0xFFFF, characters past 0xFFFF need to be split into two codepoints below 0xFFFF.

This is useful even in plaintext Unicode notation, because a \u escape takes exactly four hexadecimal digits: \u1D105 is not a single character, it's two (ᴐ followed by 5, not 𝄅).

Parameters:¶

character : str¶: The character.

Returns:¶

The surrogate pair, as the codepoints of its two surrogates.

Return type:¶

tuple[int, int]
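The split described above follows the standard UTF-16 arithmetic. The sketch below shows that arithmetic under a hypothetical name, `to_surrogates`; it is not taken from the module's source, and raising `ValueError` for characters that need no surrogates is an assumption of this sketch.

```python
def to_surrogates(character: str) -> tuple[int, int]:
    """Split a character above U+FFFF into (high, low) surrogate codepoints."""
    codepoint = ord(character)
    if codepoint <= 0xFFFF:
        # Assumption of this sketch: characters that fit in one unit are rejected.
        raise ValueError("character fits in a single UTF-16 code unit")
    offset = codepoint - 0x10000        # 20 significant bits remain
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low
```

For U+1D105 this gives 0xD834 and 0xDD05, which matches the bytes Python produces for `'\U0001D105'.encode('utf-16-be')`.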

is_character_invisible(character: str, /) → bool¶

Return whether character would be invisible.

A character is "invisible" if any of the following is true:

It is in one of these general categories: Cf, Cc, Zl, Zs, Zp.
It is one of these codepoints (hexadecimal): 2800, 034F, 115F, 1160, 17B4, 17B5, 3164, FFA0, 1D159, 1D174, 1D176, 1D177, 1D178, 1D17A.
It is private use.

You can visit https://invisible-characters.com/ if you would like to see the list.

Warning

"Invisibility" is not a valid Unicode standard property. For standardization purposes, do not use this outside of this library.

Parameters:¶

character : str¶: The character.

Returns:¶

Whether character would appear invisible.

Return type:¶

bool
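The three criteria listed for invisibility can be sketched with the standard `unicodedata` module. The names `INVISIBLE_CATEGORIES`, `EXTRA_INVISIBLE`, and `looks_invisible` are hypothetical, and treating general category `Co` as "private use" is an assumption; the library may implement the check differently.

```python
import unicodedata

# Categories and codepoints copied from the description above.
INVISIBLE_CATEGORIES = {"Cf", "Cc", "Zl", "Zs", "Zp"}
EXTRA_INVISIBLE = {
    0x2800, 0x034F, 0x115F, 0x1160, 0x17B4, 0x17B5, 0x3164, 0xFFA0,
    0x1D159, 0x1D174, 0x1D176, 0x1D177, 0x1D178, 0x1D17A,
}


def looks_invisible(character: str) -> bool:
    """Return True if any of the three documented criteria applies."""
    category = unicodedata.category(character)
    return (
        category in INVISIBLE_CATEGORIES
        or ord(character) in EXTRA_INVISIBLE
        or category == "Co"  # Assumption: "Co" stands in for "private use".
    )
```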

is_character_private_use(character: str, /) → bool¶

Return whether character is in a Private Use Area (PUA).

A PUA is one of these codepoint ranges:

U+E000 to U+F8FF
U+F0000 to U+FFFFD
U+100000 to U+10FFFD

Parameters:¶

character : str¶: The character.

Returns:¶

Whether character is in a Private Use Area.

Return type:¶

bool
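The three ranges listed above translate directly into a range check. The following is a minimal sketch with hypothetical names (`PRIVATE_USE_RANGES`, `in_private_use_area`), not the module's own code.

```python
# Inclusive (first, last) codepoint ranges, as listed above.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def in_private_use_area(character: str) -> bool:
    """Return True if the character's codepoint lies in any PUA range."""
    codepoint = ord(character)
    return any(first <= codepoint <= last for first, last in PRIVATE_USE_RANGES)
```

Note that the last two codepoints of planes 15 and 16 (for example U+FFFFE) are noncharacters and fall outside these ranges.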

is_character_surrogate(character: str, /) → bool¶

Return whether a character is a high or low surrogate.

Parameters:¶

character : str¶: The character.

Returns:¶

Whether the character is within U+D800 to U+DC00.

Return type:¶

bool
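For background, the Unicode Standard reserves U+D800 to U+DBFF for high surrogates and U+DC00 to U+DFFF for low surrogates. The sketch below classifies a character by those standard ranges; `surrogate_kind` is a hypothetical helper, and it does not necessarily draw the boundary where this module does.

```python
from typing import Optional


def surrogate_kind(character: str) -> Optional[str]:
    """Return 'high', 'low', or None according to the standard ranges."""
    codepoint = ord(character)
    if 0xD800 <= codepoint <= 0xDBFF:
        return "high"
    if 0xDC00 <= codepoint <= 0xDFFF:
        return "low"
    return None
```

Python strings may hold lone surrogates, so `surrogate_kind('\ud83d')` is a valid call even though that string cannot be encoded as UTF-8.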

pprint_char(character: str, /) → str¶

Return relevant info about a character as a string, in the format U+<codepoint>: <name> <character>.

>>> pprint_char('\u2601')
'U+2601: CLOUD ☁'
>>> pprint_char('\ue000')
'U+E000: <PRIVATE USE> \ue000'
>>> pprint_char('\U0001f400')
'U+1F400: RAT 🐀'
>>> pprint_char('\b')
'U+0008: BACKSPACE ␈'

Parameters:¶

character : str¶: The character.

Returns:¶

A "pretty" character string.

Return type:¶

str
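The output format shown in the examples can be approximated with the standard `unicodedata` module. This is a sketch only: `describe` is a hypothetical name, the `<UNNAMED>` fallback is invented for illustration, and the sketch does not reproduce the control-character names or symbols (such as BACKSPACE ␈) that the examples show.

```python
import unicodedata


def describe(character: str) -> str:
    """Format a character as 'U+<codepoint>: <name> <character>'."""
    codepoint = ord(character)
    if unicodedata.category(character) == "Co":
        # Private use characters have no name in the Unicode database.
        name = "<PRIVATE USE>"
    else:
        # "<UNNAMED>" is a placeholder invented for this sketch.
        name = unicodedata.name(character, "<UNNAMED>")
    # At least four uppercase hex digits, more for supplementary planes.
    return f"U+{codepoint:04X}: {name} {character}"
```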

str_to_tags(string: str, /) → str¶

Return a version of string with all alphanumeric characters changed into Tags.

Given string, which should contain only ASCII characters, return the same string with every character replaced by its Tag equivalent.

See https://en.wikipedia.org/wiki/Tags_(Unicode_block).

Parameters:¶

string : str¶: Any string; it should have only ASCII characters.

Returns:¶

Input string but with ASCII characters replaced with their Tag equivalents.

Return type:¶

str
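In the Tags block, each tag character sits at 0xE0000 plus the codepoint of the ASCII character it mirrors. The sketch below applies that offset to printable ASCII (U+0020 to U+007E) and leaves everything else unchanged; `to_tags` is a hypothetical name, and the exact set of characters the library converts is an assumption here.

```python
TAG_OFFSET = 0xE0000  # Start of the Tags block.


def to_tags(string: str) -> str:
    """Replace printable ASCII characters with their Tag counterparts."""
    converted = []
    for character in string:
        codepoint = ord(character)
        if 0x20 <= codepoint <= 0x7E:
            converted.append(chr(TAG_OFFSET + codepoint))
        else:
            # Assumption: characters without a Tag equivalent pass through.
            converted.append(character)
    return "".join(converted)
```

The result usually renders as nothing at all, since most fonts have no glyphs for Tag characters.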

surrogates_to_character(surrogates: tuple[int, int], /) → str¶

Given a tuple of surrogate codepoints, return the single character they represent.

Parameters:¶

surrogates : tuple[int, int]¶: The tuple of 2 surrogate codepoints.

Returns:¶

The single character that the surrogate pair represents.

Return type:¶

str
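Recombining a pair is the inverse of the standard UTF-16 split: strip the surrogate bases, join the two 10-bit halves, and add 0x10000 back. The sketch below uses a hypothetical name, `from_surrogates`, and assumes the tuple is ordered (high, low); it performs no validation.

```python
def from_surrogates(surrogates: tuple[int, int]) -> str:
    """Combine (high, low) surrogate codepoints into one character."""
    high, low = surrogates  # Assumption: high surrogate first, low second.
    offset = ((high - 0xD800) << 10) + (low - 0xDC00)
    return chr(0x10000 + offset)
```

For example, the pair (0xD83D, 0xDC00) recombines into U+1F400.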

symbol_for(character: str, /) → str¶

Parameters:¶

character : str¶

Return type:¶

str

INVISIBLE_CHARACTERS : Final[set[str]]¶: A set of characters that do not have a visual representation under most fonts.

SURROGATE_END : Final[int] = 56320¶

SURROGATE_START : Final[int] = 55296¶