hissp.munger module
Lissp’s symbol munger.
Encodes Lissp symbols with special characters into valid, human-readable (if ugly) Python identifiers, using NFKC normalization and x-codes.
E.g. *FOO-BAR* becomes xSTAR_FOOxH_BARxSTAR_.
X-codes are written in upper case and wrapped in an x and _.
This format was chosen because it contains an underscore along with both lower-case and upper-case letters, which makes it distinct from the standard Python naming conventions: lower_case_with_underscores, UPPER_CASE_WITH_UNDERSCORES, and CapWords.
That distinctness makes the x-encoding (but not the normalization) reversible in the usual cases.
The format also cannot introduce a leading underscore, which can have special meaning in Python.
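The example above can be reproduced with a minimal sketch. This is not the module's implementation: `sketch_munge` and `SHORT_NAMES` are hypothetical names, and the table holds only the two short names the example needs. It also shows why normalization is not reversible: NFKC folds the ligature U+FB01 into the plain letters "fi", and nothing in the output records that.

```python
import unicodedata

# ASSUMPTION: a two-entry subset of the short names, enough for the example.
SHORT_NAMES = {"*": "xSTAR_", "-": "xH_"}


def sketch_munge(symbol: str) -> str:
    """Hypothetical helper: NFKC-normalize, then x-encode known characters."""
    normalized = unicodedata.normalize("NFKC", symbol)
    return "".join(SHORT_NAMES.get(c, c) for c in normalized)


print(sketch_munge("*FOO-BAR*"))  # xSTAR_FOOxH_BARxSTAR_
print(sketch_munge("\ufb01"))     # fi (the ligature is lost for good)
```

The result is a valid Python identifier and does not start with an underscore, since every x-code begins with an x.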
Characters can be encoded in one of three ways: Short names, Unicode names, and ordinals.
The demunge function will accept any of these encodings, while the munge function will prioritize short names, then fall back to Unicode names, then fall back to ordinals.
Short names are given in the TO_NAME table in this module.
Any spaces in the Unicode names are replaced with an x and any hyphens are replaced with an h.
(Unicode names are in all caps and these substitutions are lower-case.)
Ordinals are given in base 10.
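The fallback order and the three accepted encodings can be sketched with the standard `unicodedata` module. The names `sketch_encode`, `sketch_decode`, `SHORT_NAMES`, and `SHORT_LOOKUP` are hypothetical, the short-name table is a small subset, and the module's own code may differ in detail.

```python
import unicodedata

# ASSUMPTION: a subset of the short names, for illustration only.
SHORT_NAMES = {"*": "xSTAR_", "-": "xH_"}
SHORT_LOOKUP = {code: char for char, code in SHORT_NAMES.items()}


def sketch_encode(c: str) -> str:
    """Hypothetical encoder: short name, then Unicode name, then ordinal."""
    if c in SHORT_NAMES:
        return SHORT_NAMES[c]
    name = unicodedata.name(c, "")  # "" when the character has no name
    if name:
        return "x" + name.replace(" ", "x").replace("-", "h") + "_"
    return f"x{ord(c)}_"  # base-10 ordinal


def sketch_decode(code: str) -> str:
    """Hypothetical decoder that accepts any of the three encodings."""
    if code in SHORT_LOOKUP:
        return SHORT_LOOKUP[code]
    body = code[1:-1]  # strip the wrapping "x" and "_"
    if body.isascii() and body.isdigit():
        return chr(int(body))
    return unicodedata.lookup(body.replace("x", " ").replace("h", "-"))


print(sketch_encode("-"))                 # xH_
print(sketch_encode("\N{DIVISION SIGN}"))  # xDIVISIONxSIGN_
print(sketch_encode("\n"))                # x10_
print([sketch_decode(c) for c in ("xH_", "xHYPHENhMINUS_", "x45_")])
```

All three codes in the last line decode to the same hyphen-minus character, which is the sense in which a decoder can accept any encoding while an encoder emits only the preferred one.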
hissp.munger.munge(s: str) → str
Lissp’s symbol munger.
Encodes Lissp symbols with special characters into valid, human-readable (if ugly) Python identifiers, using NFKC normalization and x-codes.
Inputs that begin with : are assumed to be control words and are returned unmodified. Full stops are handled separately, as those are meaningful to Hissp.
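The two special cases can be illustrated with a rough sketch. The text does not say how full stops are handled, so splitting on them and encoding each part independently is an assumption here, as are the names `sketch_munge` and `sketch_munge_part` and the reduced short-name table.

```python
# ASSUMPTION: a subset of the short names, for illustration only.
SHORT_NAMES = {"*": "xSTAR_", "-": "xH_"}


def sketch_munge_part(part: str) -> str:
    """Hypothetical helper: x-encode one dot-free part of a symbol."""
    return "".join(SHORT_NAMES.get(c, c) for c in part)


def sketch_munge(symbol: str) -> str:
    """Hypothetical munger covering control words and full stops."""
    if symbol.startswith(":"):
        return symbol  # control word: returned unmodified
    # ASSUMPTION: full stops are kept as separators and never encoded.
    return ".".join(sketch_munge_part(p) for p in symbol.split("."))


print(sketch_munge(":foo-bar"))      # :foo-bar
print(sketch_munge("foo-bar.baz*"))  # fooxH_bar.bazxSTAR_
```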
hissp.munger.TO_NAME = {'!': 'xBANG_', '"': 'x2QUOTE_', '#': 'xHASH_', '$': 'xDOLR_', '%': 'xPCENT_', '&': 'xET_', "'": 'x1QUOTE_', '(': 'xPAREN_', ')': 'xTHESES_', '*': 'xSTAR_', '+': 'xPLUS_', '-': 'xH_', '/': 'xSLASH_', ';': 'xSCOLON_', '<': 'xLT_', '=': 'xEQ_', '>': 'xGT_', '?': 'xQUERY_', '@': 'xAT_', '[': 'xSQUARE_', '\\': 'xBSLASH_', ']': 'xBRACKETS_', '^': 'xCARET_', '`': 'xGRAVE_', '{': 'xCURLY_', '|': 'xBAR_', '}': 'xBRACES_'}
Shorter names for x-encoding.
hissp.munger.x_encode(c: str) → str
Converts a character to its short x-encoding, unless it’s already valid in a Python identifier.
hissp.munger.force_x_encode(c: str) → str
Converts a character to its x-encoding, even if it’s valid in a Python identifier.
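The difference between the conditional and the forced encoder can be sketched as follows. Both function names are hypothetical. To keep the example short, the forced stand-in emits only the ordinal form rather than a short or Unicode name, and the validity test (prefixing a letter before calling `str.isidentifier`) is an assumption about what "valid in a Python identifier" means.

```python
def sketch_force_x_encode(c: str) -> str:
    """Stand-in: always encode, here using only the ordinal form."""
    return f"x{ord(c)}_"


def sketch_x_encode(c: str) -> str:
    """Encode only characters that could not appear in an identifier."""
    # ASSUMPTION: validity is tested with a letter prefixed, so digits,
    # which are valid after the first position, pass through unchanged.
    if ("x" + c).isidentifier():
        return c
    return sketch_force_x_encode(c)


print(sketch_x_encode("a"))        # a
print(sketch_x_encode("7"))        # 7
print(sketch_x_encode("*"))        # x42_
print(sketch_force_x_encode("a"))  # x97_
```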
hissp.munger.LOOKUP_NAME = {'x1QUOTE_': "'", 'x2QUOTE_': '"', 'xAT_': '@', 'xBANG_': '!', 'xBAR_': '|', 'xBRACES_': '}', 'xBRACKETS_': ']', 'xBSLASH_': '\\', 'xCARET_': '^', 'xCURLY_': '{', 'xDOLR_': '$', 'xEQ_': '=', 'xET_': '&', 'xGRAVE_': '`', 'xGT_': '>', 'xHASH_': '#', 'xH_': '-', 'xLT_': '<', 'xPAREN_': '(', 'xPCENT_': '%', 'xPLUS_': '+', 'xQUERY_': '?', 'xSCOLON_': ';', 'xSLASH_': '/', 'xSQUARE_': '[', 'xSTAR_': '*', 'xTHESES_': ')'}
The inverse of TO_NAME.
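An inverse table of this kind can be built with a dict comprehension, as long as every x-code is unique. The sketch below uses a four-entry subset under hypothetical names. Whether the module builds its table this way is an assumption.

```python
# ASSUMPTION: a subset of the short-name table, for illustration only.
TO_NAME_SUBSET = {"*": "xSTAR_", "-": "xH_", "(": "xPAREN_", ")": "xTHESES_"}

# Swap keys and values; this is lossless because no two characters share a code.
LOOKUP_SUBSET = {code: char for char, code in TO_NAME_SUBSET.items()}

print(LOOKUP_SUBSET["xTHESES_"])  # )
```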