hissp.munger module

Lissp’s symbol munger.

Encodes Lissp symbols with special characters into valid, human-readable (if unpythonic) Python identifiers, using NFKC normalization and Quotez.

E.g. *FOO-BAR* becomes QzSTAR_FOOQz_BARQzSTAR_.

Quotez are written in upper case and wrapped in a Qz prefix and a _ suffix. This format was chosen for three reasons. It contains an underscore and both upper-case and lower-case letters, which makes it distinct from the standard Python naming conventions: lower_case_with_underscores, UPPER_CASE_WITH_UNDERSCORES, and CapWords. It begins with an extremely rare bigram, “Qz”, which makes the Quotez (but not the normalization) reversible in the usual cases. And it cannot introduce a leading underscore, which can have special meaning in Python.

Characters can be encoded in one of three ways: short names, Unicode names, and ordinals.

The demunge function will accept any of these encodings, while the munge function will prioritize short names, then fall back to Unicode names, then fall back to ordinals.

Short names are given in the TO_NAME table in this module.

Any spaces in the Unicode names are replaced with an x and any hyphens are replaced with an h. (Unicode names are in all caps and these substitutions are lower-case.)

Ordinals are given in base 10.
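The priority order described above can be sketched with the standard library's unicodedata module. This is a minimal illustration under stated assumptions, not the module's actual implementation: SHORT_NAMES is a hypothetical three-entry excerpt of TO_NAME, and sketch_force_encode is an illustrative name that does not exist in hissp.munger.

```python
import unicodedata

# Hypothetical excerpt of the TO_NAME table, for illustration only.
SHORT_NAMES = {"*": "QzSTAR_", "-": "Qz_", ">": "QzGT_"}


def sketch_force_encode(c: str) -> str:
    """Encode one character: short name, then Unicode name, then ordinal."""
    if c in SHORT_NAMES:
        return SHORT_NAMES[c]
    try:
        name = unicodedata.name(c)
    except ValueError:
        # Characters without a Unicode name fall back to the base-10 ordinal.
        return "Qz{}_".format(ord(c))
    # Unicode names are upper case, so lower-case x and h stay unambiguous.
    return "Qz{}_".format(name.replace(" ", "x").replace("-", "h"))


print(sketch_force_encode("*"))       # QzSTAR_
print(sketch_force_encode("\u2264"))  # QzLESShTHANxORxEQUALxTO_
print(sketch_force_encode("\x07"))    # Qz7_
```

The second call shows both substitutions at once: U+2264 is named LESS-THAN OR EQUAL TO, so its hyphen becomes h and its spaces become x.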

hissp.munger.munge(s: str) → str

Lissp’s symbol munger.

Encodes Lissp symbols with special characters into valid, human-readable (if unpythonic) Python identifiers, using NFKC normalization and Quotez.

Inputs that begin with : are assumed to be control words and returned unmodified. Full stops are handled separately, as those are meaningful to Hissp.
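The control-word check and the full-stop handling might look roughly like the sketch below. It assumes that "handled separately" means the symbol is split on full stops and each part is encoded on its own, which this page does not spell out. Both sketch_munge and toy_encode_part are hypothetical names, and the toy encoder knows only two characters.

```python
def toy_encode_part(part: str) -> str:
    # Hypothetical stand-in for the real per-character encoding.
    return part.replace("-", "Qz_").replace("*", "QzSTAR_")


def sketch_munge(symbol: str, encode_part=toy_encode_part) -> str:
    if symbol.startswith(":"):
        # Assumed control word: returned unmodified.
        return symbol
    # Assumption: full stops are kept as separators between encoded parts.
    return ".".join(encode_part(part) for part in symbol.split("."))


print(sketch_munge(":foo-bar"))     # :foo-bar
print(sketch_munge("foo.bar-baz"))  # foo.barQz_baz
print(sketch_munge("*FOO-BAR*"))    # QzSTAR_FOOQz_BARQzSTAR_
```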

hissp.munger.force_munge(s: str) → str

As munge, but skips the control word check.

Used for reader tags.

hissp.munger.QUOTEZ = 'Qz{}_'

Format string for creating Quotez.

hissp.munger.FIND_QUOTEZ = re.compile('Qz([0-9A-Z][0-9A-Zhx]*?)?_')

Regex pattern to find Quotez. Used by demunge.
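To see what the pattern captures, the following sketch copies the regex from the entry above into a local variable and runs it over the munged example from the module introduction. Only the standard re module is used; the variable names are illustrative.

```python
import re

# Pattern copied from the FIND_QUOTEZ entry above.
FIND_QUOTEZ = re.compile("Qz([0-9A-Z][0-9A-Zhx]*?)?_")

munged = "QzSTAR_FOOQz_BARQzSTAR_"
# Pair each whole match with its captured name (None when the group is absent).
found = [(m.group(), m.group(1)) for m in FIND_QUOTEZ.finditer(munged)]
print(found)
# [('QzSTAR_', 'STAR'), ('Qz_', None), ('QzSTAR_', 'STAR')]
```

The capture group is optional so that the bare Qz_ (the short name for a hyphen) still matches. A lower-case body such as Qzfoo_ does not match, because the group must start with a digit or upper-case letter.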

hissp.munger.TO_NAME = {'!': 'QzBANG_', '"': 'QzQUOT_', '#': 'QzHASH_', '$': 'QzDOLR_', '%': 'QzPCENT_', '&': 'QzET_', "'": 'QzAPOS_', '(': 'QzLPAR_', ')': 'QzRPAR_', '*': 'QzSTAR_', '+': 'QzPLUS_', '-': 'Qz_', '/': 'QzSOL_', ';': 'QzSEMI_', '<': 'QzLT_', '=': 'QzEQ_', '>': 'QzGT_', '?': 'QzQUERY_', '@': 'QzAT_', '[': 'QzLSQB_', '\\': 'QzBSOL_', ']': 'QzRSQB_', '^': 'QzHAT_', '`': 'QzGRAVE_', '{': 'QzLCUB_', '|': 'QzVERT_', '}': 'QzRCUB_'}

Shorter names for Quotez.

hissp.munger.qz_encode(c: str) → str

Converts a character to its Quotez encoding, unless it’s already valid in a Python identifier.

hissp.munger.force_qz_encode(c: str) → str

Converts a character to its Quotez encoding, even if it’s valid in a Python identifier.
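The difference between the two functions can be illustrated with a small sketch. It assumes one plausible way to test whether a character is "valid in a Python identifier", namely prefixing a letter and calling str.isidentifier; this page does not confirm that the module does it this way. The forced encoder here is an ordinal-only stand-in, so its output differs from the real short-name and Unicode-name encodings.

```python
def sketch_force_qz_encode(c: str) -> str:
    # Ordinal-only stand-in; the real function prefers short and Unicode names.
    return "Qz{}_".format(ord(c))


def sketch_qz_encode(c: str) -> str:
    # Prefixing a letter lets digits pass: "1" alone is not an identifier,
    # but it is valid after the first position.
    if ("x" + c).isidentifier():
        return c
    return sketch_force_qz_encode(c)


print(sketch_qz_encode("a"))        # a
print(sketch_qz_encode("1"))        # 1
print(sketch_qz_encode("-"))        # Qz45_
print(sketch_force_qz_encode("a"))  # Qz97_
```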

hissp.munger.LOOKUP_NAME = {'QzAPOS_': "'", 'QzAT_': '@', 'QzBANG_': '!', 'QzBSOL_': '\\', 'QzDOLR_': '$', 'QzEQ_': '=', 'QzET_': '&', 'QzGRAVE_': '`', 'QzGT_': '>', 'QzHASH_': '#', 'QzHAT_': '^', 'QzLCUB_': '{', 'QzLPAR_': '(', 'QzLSQB_': '[', 'QzLT_': '<', 'QzPCENT_': '%', 'QzPLUS_': '+', 'QzQUERY_': '?', 'QzQUOT_': '"', 'QzRCUB_': '}', 'QzRPAR_': ')', 'QzRSQB_': ']', 'QzSEMI_': ';', 'QzSOL_': '/', 'QzSTAR_': '*', 'QzVERT_': '|', 'Qz_': '-'}

The inverse of TO_NAME.

hissp.munger.demunge(s: str) → str

The inverse of munge. Decodes any Quotez into characters.

Characters can be encoded in one of three ways: short names, Unicode names, and ordinals. demunge will decode any of these, even though munge will consistently pick only one of these for any given character. demunge will also leave the remaining text as-is, along with any invalid Quotez.

>>> demunge("QzFOO_QzGT_QzHYPHENhMINUS_Qz62_bar")
'QzFOO_>->bar'
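The decoding behavior shown in this doctest can be approximated with the standard library alone. The sketch below is an assumption-laden reimplementation, not the module's source: LOOKUP_EXCERPT is a hypothetical three-entry inverse of a TO_NAME excerpt, and sketch_demunge and _sketch_decode are illustrative names.

```python
import re
import unicodedata

# Pattern copied from the FIND_QUOTEZ entry on this page.
FIND_QUOTEZ = re.compile("Qz([0-9A-Z][0-9A-Zhx]*?)?_")
# Hypothetical excerpt of TO_NAME, inverted the way LOOKUP_NAME is described.
TO_NAME_EXCERPT = {"-": "Qz_", ">": "QzGT_", "*": "QzSTAR_"}
LOOKUP_EXCERPT = {v: k for k, v in TO_NAME_EXCERPT.items()}


def _sketch_decode(match) -> str:
    quotez, body = match.group(), match.group(1)
    if quotez in LOOKUP_EXCERPT:  # 1. short name
        return LOOKUP_EXCERPT[quotez]
    if body is None:
        return quotez
    try:  # 2. Unicode name, undoing the x and h substitutions
        return unicodedata.lookup(body.replace("x", " ").replace("h", "-"))
    except KeyError:
        pass
    try:  # 3. base-10 ordinal
        return chr(int(body))
    except (ValueError, OverflowError):
        return quotez  # invalid Quotez are left as-is


def sketch_demunge(s: str) -> str:
    return FIND_QUOTEZ.sub(_sketch_decode, s)


print(sketch_demunge("QzFOO_QzGT_QzHYPHENhMINUS_Qz62_bar"))  # QzFOO_>->bar
print(sketch_demunge("QzSTAR_FOOQz_BARQzSTAR_"))             # *FOO-BAR*
```

Passing a function to re.sub keeps the surrounding text untouched, which matches the documented behavior of leaving non-Quotez text and invalid Quotez as they are.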