hissp.munger module

Lissp’s symbol token munger.

Encodes Lissp symbol tokens with special characters into valid, human-readable (if unpythonic) Python identifiers, using NFKC normalization and Quotez.

E.g. *FOO-BAR* becomes QzSTAR_FOOQzH_BARQzSTAR_.

Quotez are written in upper case and wrapped in a Qz and _. This format was chosen because it contains an underscore and both upper-case and lower-case letters, which makes it distinct from the standard Python naming conventions: lower_case_with_underscores, UPPER_CASE_WITH_UNDERSCORES, and CapWords. It also contains an extremely rare bigram, “Qz”, which makes the Quotez (but not the normalization) reversible in the usual cases. Finally, it cannot introduce a leading underscore, which can have special meaning in Python.
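The remark that the normalization is not reversible can be checked with the standard library alone. The following is a minimal sketch that calls unicodedata.normalize directly rather than munge(); the sample strings are chosen only for illustration.

```python
import unicodedata

# Two different source spellings: one starts with the "fi" ligature (U+FB01).
ligature = "\ufb01le"
plain = "file"

normalized = unicodedata.normalize("NFKC", ligature)

# NFKC replaces the compatibility character with its plain equivalent,
# so both spellings end up as the same identifier and the step is one-way.
print(ligature == plain)          # False
print(normalized == plain)        # True
print(normalized.isidentifier())  # True
```

Once two spellings have collapsed into one, nothing in the result records which spelling was the original.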

Characters can be encoded in one of three ways: Short names, Unicode names, and ordinals.

The demunge() function will accept any of these encodings, while the munge() function will prioritize short names, then fall back to Unicode names, then fall back to ordinals.
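The fallback order can be sketched as follows. This is an illustration of the priority described above, not the module's source: choose_encoding and SHORT_NAMES are hypothetical names, SHORT_NAMES is only an excerpt of the TO_NAME table, and the function reports which encoding applies without formatting it as a Quotez.

```python
import unicodedata

# Excerpt of the TO_NAME table listed in this module's documentation.
SHORT_NAMES = {"*": "Star_", "-": "Dash_", ">": "Gt_"}

def choose_encoding(c):
    """Report which of the three encodings is available first for c."""
    if c in SHORT_NAMES:
        return ("short name", SHORT_NAMES[c])
    name = unicodedata.name(c, None)  # None when the character is unnamed
    if name is not None:
        return ("Unicode name", name)
    return ("ordinal", ord(c))

print(choose_encoding("*"))       # ('short name', 'Star_')
print(choose_encoding("\u20ac"))  # ('Unicode name', 'EURO SIGN')
print(choose_encoding("\ue000"))  # ('ordinal', 57344)
```

Because unicodedata follows the Unicode version bundled with the interpreter, the second branch can succeed on one Python version and fall through to the third on an older one.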

Short names are given in the TO_NAME table in this module.

Any spaces in the Unicode names are replaced with an x and any hyphens are replaced with an h. (Unicode names are in all caps and these substitutions are lower-case.)
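A minimal sketch of the substitution rule stated above, using only the standard library. The two helper names are hypothetical, and the sketch covers just the space and hyphen replacement, not the wrapping that turns the result into a Quotez.

```python
import unicodedata

def name_to_identifier_text(name):
    # Unicode names use only capitals, digits, spaces, and hyphens,
    # so lower-case "x" and "h" cannot collide with the name itself.
    return name.replace(" ", "x").replace("-", "h")

def identifier_text_to_name(text):
    # Reverse the substitution to recover the original Unicode name.
    return text.replace("x", " ").replace("h", "-")

name = unicodedata.name("-")
encoded = name_to_identifier_text(name)
print(name)     # HYPHEN-MINUS
print(encoded)  # HYPHENhMINUS
print(identifier_text_to_name(encoded) == name)  # True
```

The case difference is what keeps the substitution reversible: any lower-case x or h in the encoded text must have come from a space or a hyphen.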

Ordinals are given in a hexadecimal format like 0XF00.

hissp.munger.munge(s: str) → str

Lissp’s symbol munger.

Encodes Lissp symbols with special characters into valid, human-readable (if unpythonic) Python identifiers, using NFKC normalization and Quotez.

Full stops are handled separately, as those are meaningful to Hissp.

hissp.munger.FIND_MUNGED = re.compile('(Ox[A-F0-9]+|[A-Z][a-z0-9]+|[A-Z][XHa-z0-9]+X)_')

Regex pattern to find munged characters. Used by demunge.
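To see what this pattern picks out, it can be run with the re module on its own. The sketch below copies the pattern from the entry above and applies it to the string from the demunge() doctest; it does not import hissp.

```python
import re

# Pattern copied from the FIND_MUNGED entry above.
FIND_MUNGED = re.compile(r"(Ox[A-F0-9]+|[A-Z][a-z0-9]+|[A-Z][XHa-z0-9]+X)_")

text = "Foo_Gt_HyphenHminusX_Ox3E_bar"

# findall returns the contents of the single capture group for each match.
candidates = FIND_MUNGED.findall(text)
print(candidates)  # ['Foo', 'Gt', 'HyphenHminusX', 'Ox3E']
```

Foo is matched too, even though it does not name a character. The pattern only locates candidates; deciding whether a candidate is valid is a separate step.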

hissp.munger.TO_NAME = {'!': 'Bang_', '"': 'Quot_', '#': 'Hash_', '$': 'Dolr_', '%': 'Pcent_', '&': 'Et_', "'": 'Apos_', '(': 'Lpar_', ')': 'Rpar_', '*': 'Star_', '+': 'Plus_', '-': 'Dash_', '.': 'Stop_', '/': 'Fsol_', ';': 'Scoln_', '<': 'Lt_', '=': 'Eq_', '>': 'Gt_', '?': 'Eh_', '@': 'At_', '[': 'Lsqb_', '\\': 'Bsol_', ']': 'Rsqb_', '^': 'Hat_', '`': 'Grave_', '{': 'Lcub_', '|': 'Vert_', '}': 'Rcub_'}

Shorter names for Quotez.

hissp.munger.encode(c: str) → str

Converts a character to its munged encoding, unless it’s already valid in a Python identifier.

hissp.munger.force_encode(c: str) → str

Converts a character to its munged encoding, even if it’s valid in a Python identifier.
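The difference between the two functions comes down to a validity check. The documentation above does not show how encode() decides, so the following is only one way such a check could be written. The names is_identifier_char, sketch_encode, and the force parameter (a stand-in for force_encode()) are all hypothetical.

```python
def is_identifier_char(c):
    # Prefix a known-good start character so that characters such as digits,
    # which are valid inside an identifier but not at its start, still pass.
    return ("x" + c).isidentifier()

def sketch_encode(c, force):
    # Leave valid characters alone; hand everything else to the forced encoder.
    return c if is_identifier_char(c) else force(c)

# Stand-in for force_encode(), backed by an excerpt of the TO_NAME table.
stand_in = {"*": "Star_", "-": "Dash_"}.__getitem__

for c in "a7_-*":
    print(repr(c), is_identifier_char(c), sketch_encode(c, stand_in))
```

Here a, 7, and _ pass through unchanged, while - and * are replaced by their short names.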

hissp.munger.LOOKUP_NAME = {'Apos_': "'", 'At_': '@', 'Bang_': '!', 'Bsol_': '\\', 'Dash_': '-', 'Dolr_': '$', 'Eh_': '?', 'Eq_': '=', 'Et_': '&', 'Fsol_': '/', 'Grave_': '`', 'Gt_': '>', 'Hash_': '#', 'Hat_': '^', 'Lcub_': '{', 'Lpar_': '(', 'Lsqb_': '[', 'Lt_': '<', 'Pcent_': '%', 'Plus_': '+', 'Quot_': '"', 'Rcub_': '}', 'Rpar_': ')', 'Rsqb_': ']', 'Scoln_': ';', 'Star_': '*', 'Stop_': '.', 'Vert_': '|'}

The inverse of TO_NAME.
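An inverse table of this kind can be derived with a dict comprehension. The sketch below uses a four-entry excerpt of TO_NAME; whether the module builds LOOKUP_NAME this way is an assumption.

```python
# Excerpt of TO_NAME as listed above; the full table has more entries.
TO_NAME = {"!": "Bang_", "*": "Star_", "-": "Dash_", ">": "Gt_"}

# Swap keys and values. This only round-trips because every short name is unique.
LOOKUP_NAME = {name: char for char, name in TO_NAME.items()}

print(len(LOOKUP_NAME) == len(TO_NAME))  # True
print(LOOKUP_NAME["Gt_"])                # >
```

If two characters shared a short name, the inverse would silently drop one of them, which is why the length comparison is worth keeping as a sanity check.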

hissp.munger.demunge(s: str) → str

The inverse of munge(). Decodes any Quotez into characters.

Characters can be encoded in one of three ways: Short names, Unicode names, and ordinals. demunge will decode any of these. munge() consistently picks only one of these for any given character, but which Unicode characters have names depends on the Python version, so the encoding chosen for a character can differ between versions.

demunge will also leave the remaining text as-is, along with any invalid Quotez.

>>> demunge("Foo_Gt_HyphenHminusX_Ox3E_bar")
'Foo_>->bar'
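The doctest above can be reproduced without hissp by a short sketch. It is not the module's implementation: sketch_decode and sketch_demunge are hypothetical names, the table is an excerpt of LOOKUP_NAME, and the rule for turning a name such as HyphenHminusX back into a Unicode name is inferred from the FIND_MUNGED pattern and this one example.

```python
import re
import unicodedata

# Pattern and table excerpt copied from the entries above.
FIND_MUNGED = re.compile(r"(Ox[A-F0-9]+|[A-Z][a-z0-9]+|[A-Z][XHa-z0-9]+X)_")
LOOKUP_NAME = {"Gt_": ">", "Dash_": "-", "Star_": "*"}

def sketch_decode(match):
    token, body = match.group(0), match.group(1)
    if token in LOOKUP_NAME:  # short name
        return LOOKUP_NAME[token]
    try:
        if re.fullmatch(r"Ox[A-F0-9]+", body):  # ordinal: "Ox" plus hex digits
            return chr(int(body[2:], 16))
        if body.endswith("X"):
            # ASSUMPTION: after the first letter, "X" marks a space and
            # "H" marks a hyphen, and a trailing "X" closes the name.
            name = body[0] + body[1:-1].replace("X", " ").replace("H", "-")
            return unicodedata.lookup(name.upper())
    except (KeyError, ValueError, OverflowError):
        pass  # unknown name or out-of-range ordinal
    return token  # not a valid Quotez: leave the text as it was

def sketch_demunge(s):
    return FIND_MUNGED.sub(sketch_decode, s)

print(sketch_demunge("Foo_Gt_HyphenHminusX_Ox3E_bar"))  # Foo_>->bar
```

Foo_ survives untouched because it matches the pattern but fails all three decodings, which is the behavior described above for invalid Quotez.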