hissp.munger module
Lissp's symbol token munger.
Encodes Lissp symbol tokens with special characters into valid,
human-readable (if unpythonic) Python identifiers,
using NFKC normalization and Quotez.
E.g. *FOO-BAR* becomes QzSTAR_FOOQzH_BARQzSTAR_.
Quotez are written in upper case and wrapped in a Qz and _.
This format was chosen because it contains an underscore
and both upper-case and lower-case letters,
which makes it distinct from
standard Python naming conventions:
lower_case_with_underscores, UPPER_CASE_WITH_UNDERSCORES, and CapWords,
as well as an extremely rare bigram, "Qz",
which makes the Quotez (but not the normalization)
reversible in the usual cases,
and also cannot introduce a leading underscore,
which can have special meaning in Python.
Characters can be encoded in one of three ways: Short names, Unicode names, and ordinals.
The demunge() function will accept any of these encodings,
while the munge() function will prioritize short names,
then fall back to Unicode names, then fall back to ordinals.
Short names are given in the TO_NAME table in this module.
Any spaces in the Unicode names are replaced with an x and
any hyphens are replaced with an h.
(Unicode names are in all caps and these substitutions are lower-case.)
Ordinals are given in a hexadecimal format like 0XF00.
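The three-tier priority described above can be sketched as follows. This is a minimal illustration, not the module's source: `sketch_force_encode` and `SHORT_NAMES` are hypothetical names, and the table is only an excerpt of TO_NAME.

```python
import unicodedata

# A handful of entries copied from the TO_NAME table; the real table is larger.
SHORT_NAMES = {"*": "QzSTAR_", "-": "QzH_", ">": "QzGT_"}


def sketch_force_encode(c: str) -> str:
    """Hypothetical helper: encode one character, trying each tier in order."""
    if c in SHORT_NAMES:  # 1. short name
        return SHORT_NAMES[c]
    try:
        name = unicodedata.name(c)  # 2. Unicode name, if the character has one
    except ValueError:
        return f"Qz{ord(c):#X}_"  # 3. ordinal in hexadecimal
    # Unicode names are all caps, so lower-case x and h are unambiguous.
    return "Qz" + name.replace(" ", "x").replace("-", "h") + "_"


print(sketch_force_encode("*"))       # QzSTAR_
print(sketch_force_encode("€"))       # QzEUROxSIGN_
print(sketch_force_encode("\ue000"))  # Qz0XE000_ (private-use, no Unicode name)
```

The private-use code point is used here only because it has no Unicode name, which forces the ordinal fallback.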
- hissp.munger.munge(s: str) → str
Lissp's symbol munger.
Encodes Lissp symbols with special characters into valid, human-readable (if unpythonic) Python identifiers, using NFKC normalization and Quotez.
Full stops are handled separately, as those are meaningful to Hissp.
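A rough sketch of how such a munger could fit together is given here. It assumes the symbol is normalized first, that each dot-separated part is encoded independently, and that a part which is still not an identifier (for example, one with a leading digit) gets its first character force-encoded. All `sketch_*` names are hypothetical, and the real function may differ in its details.

```python
import unicodedata

SHORT_NAMES = {"*": "QzSTAR_", "-": "QzH_", "+": "QzPLUS_"}  # excerpt of TO_NAME


def sketch_force(c: str) -> str:
    """Hypothetical: always encode, by short name, Unicode name, or ordinal."""
    if c in SHORT_NAMES:
        return SHORT_NAMES[c]
    try:
        name = unicodedata.name(c).replace(" ", "x").replace("-", "h")
    except ValueError:
        return f"Qz{ord(c):#X}_"
    return f"Qz{name}_"


def sketch_encode(c: str) -> str:
    """Hypothetical: leave characters that may appear in an identifier alone."""
    return c if ("x" + c).isidentifier() else sketch_force(c)


def sketch_munge(symbol: str) -> str:
    """Assumed flow: normalize, then encode each dot-separated part."""
    parts = []
    for part in unicodedata.normalize("NFKC", symbol).split("."):
        encoded = "".join(map(sketch_encode, part))
        if encoded and not encoded.isidentifier():  # e.g. a leading digit
            encoded = sketch_force(encoded[0]) + encoded[1:]
        parts.append(encoded)
    return ".".join(parts)


print(sketch_munge("*FOO-BAR*"))    # QzSTAR_FOOQzH_BARQzSTAR_
print(sketch_munge("foo.bar-baz"))  # foo.barQzH_baz
print(sketch_munge("1+"))           # QzDIGITxONE_QzPLUS_
print(sketch_munge("\ufb01sh"))     # fish (NFKC expands the ligature)
```

The last line shows why the normalization step is not reversible: once the ligature has been expanded, nothing in the output records that it was there.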
- hissp.munger.FIND_QUOTEZ = re.compile('Qz([0-9A-Z][0-9A-Zhx]*?)_')
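To see what this pattern picks out, the snippet below recompiles the same expression locally (so it runs without Hissp installed) and applies it to a few strings. The capture group holds the text between Qz and the closing underscore.

```python
import re

# Same pattern as shown above, recompiled here to keep the example standalone.
FIND_QUOTEZ = re.compile("Qz([0-9A-Z][0-9A-Zhx]*?)_")

print(FIND_QUOTEZ.findall("QzSTAR_FOOQzH_BARQzSTAR_"))  # ['STAR', 'H', 'STAR']
print(FIND_QUOTEZ.findall("QzEUROxSIGN_ Qz0XE000_"))    # ['EUROxSIGN', '0XE000']
print(FIND_QUOTEZ.findall("Qzfoo_ Qz_ lower_case"))     # []
```

The body must start with a digit or upper-case letter, and only the lower-case letters h and x are allowed after that, so ordinary snake_case names are not matched.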
- hissp.munger.TO_NAME = {'!': 'QzBANG_', '"': 'QzQUOT_', '#': 'QzHASH_', '$': 'QzDOLR_', '%': 'QzPCENT_', '&': 'QzET_', "'": 'QzAPOS_', '(': 'QzLPAR_', ')': 'QzRPAR_', '*': 'QzSTAR_', '+': 'QzPLUS_', '-': 'QzH_', '.': 'QzDOT_', '/': 'QzSOL_', ';': 'QzSEMI_', '<': 'QzLT_', '=': 'QzEQ_', '>': 'QzGT_', '?': 'QzQUERY_', '@': 'QzAT_', '[': 'QzLSQB_', '\\': 'QzBSOL_', ']': 'QzRSQB_', '^': 'QzHAT_', '`': 'QzGRAVE_', '{': 'QzLCUB_', '|': 'QzVERT_', '}': 'QzRCUB_'}
Shorter names for Quotez.
- hissp.munger.qz_encode(c: str) → str
Converts a character to its Quotez encoding, unless it's already valid in a Python identifier.
- hissp.munger.force_qz_encode(c: str) → str
Converts a character to its Quotez encoding, even if it's valid in a Python identifier.
- hissp.munger.LOOKUP_NAME = {'QzAPOS_': "'", 'QzAT_': '@', 'QzBANG_': '!', 'QzBSOL_': '\\', 'QzDOLR_': '$', 'QzDOT_': '.', 'QzEQ_': '=', 'QzET_': '&', 'QzGRAVE_': '`', 'QzGT_': '>', 'QzHASH_': '#', 'QzHAT_': '^', 'QzH_': '-', 'QzLCUB_': '{', 'QzLPAR_': '(', 'QzLSQB_': '[', 'QzLT_': '<', 'QzPCENT_': '%', 'QzPLUS_': '+', 'QzQUERY_': '?', 'QzQUOT_': '"', 'QzRCUB_': '}', 'QzRPAR_': ')', 'QzRSQB_': ']', 'QzSEMI_': ';', 'QzSOL_': '/', 'QzSTAR_': '*', 'QzVERT_': '|'}
The inverse of TO_NAME.
- hissp.munger.demunge(s: str) → str
The inverse of munge(). Decodes any Quotez into characters.
Characters can be encoded in one of three ways: Short names, Unicode names, and ordinals. demunge will decode any of these. Even though munge() will consistently pick only one of these for any given character, which Unicode characters have names depends on the Python version.
demunge will also leave the remaining text as-is, along with any invalid Quotez.
>>> demunge("QzFOO_QzGT_QzHYPHENhMINUS_Qz0X3E_bar")
'QzFOO_>->bar'
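The decoding behavior described for demunge can be approximated with the standard library alone. The sketch below is an assumption about the general approach (regex substitution with three fallbacks), not the module's code. `sketch_decode`, `sketch_demunge`, `SHORT_NAMES`, and `LOOKUP` are hypothetical names, and the short-name table is only an excerpt.

```python
import re
import unicodedata

FIND_QUOTEZ = re.compile("Qz([0-9A-Z][0-9A-Zhx]*?)_")
SHORT_NAMES = {"*": "QzSTAR_", "-": "QzH_", ">": "QzGT_"}  # excerpt of TO_NAME
LOOKUP = {name: char for char, name in SHORT_NAMES.items()}  # inverse table


def sketch_decode(match) -> str:
    """Hypothetical: try short name, then Unicode name, then ordinal."""
    if match.group() in LOOKUP:
        return LOOKUP[match.group()]
    body = match.group(1)
    try:
        # Undo the x/h substitutions before looking up the Unicode name.
        return unicodedata.lookup(body.replace("x", " ").replace("h", "-"))
    except KeyError:
        pass
    try:
        return chr(int(body, 16))  # int() accepts the 0X prefix in base 16
    except (ValueError, OverflowError):
        return match.group()  # not a valid Quotez; leave it as-is


def sketch_demunge(s: str) -> str:
    return FIND_QUOTEZ.sub(sketch_decode, s)


print(sketch_demunge("QzFOO_QzGT_QzHYPHENhMINUS_Qz0X3E_bar"))  # QzFOO_>->bar
print(sketch_demunge("QzSTAR_FOOQzH_BARQzSTAR_"))              # *FOO-BAR*
```

In the first call, QzFOO_ matches the pattern but is neither a short name, a Unicode name, nor a hexadecimal number, so it passes through unchanged, as does the trailing bar.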