ZGL String Literals

Context

ZGL is a data interchange language (also known as a data exchange language).

Although it lacks string processing functions (by design), ZGL aims to make it easy to include strings in various styles, ranging from short strings to large blocks of text.

Rationale

As you will see below, ZGL provides six options that can be mixed and matched using flags (e.g. -ecltaz). Learning six independent options is easier than learning 2^6 = 64 distinct string styles.

YAML serves as an illustrative counterexample to ZGL's rationale.

Encoding

Strings in ZGL use UTF-8 encoding. Any Unicode character is allowed.

Unicode characters can be included directly; e.g. "halló", "slán", or "奇怪环形".

Consistent with the above, strings may include newlines; for example:

"No person is an island,
Entire of itself,
Every being is a piece of the continent,
A part of the main."

Forms

String literals may take any of these forms:

Table 1: String Literal Forms
name   example
0#     "···"
1#     #"···"#
2#     ##"···"##
3#     ###"···"###

The forms with one or more # are referred to as # forms; they can help reduce the amount of escaping needed.  For example, in the table below, the string literals in each column are equivalent:

Table 2: Examples in different forms
form   x         "y"         "#z"
0#     "x"       "\"y\""     "\"#z\""
1#     #"x"#     #""y""#     #"\"#z""#
2#     ##"x"##   ##""y""##   ##""#z""##
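
Table 2 suggests that a literal written with n # marks ends at the first " that is followed by n # marks. The text does not state this rule explicitly, so treat it as an assumption. Under that assumption, the following Python sketch picks the smallest form that needs no quote escaping. The names smallest_form and quote are hypothetical, and the sketch ignores backslashes, which would still need escaping while escaping is enabled.

```python
def smallest_form(value: str) -> int:
    """Return the fewest # marks for which value needs no quote escaping."""
    hashes = 0
    # Assumed rule: an n# literal ends at the first " followed by n # marks,
    # so that sequence must not occur inside the value itself.
    while '"' + "#" * hashes in value:
        hashes += 1
    return hashes


def quote(value: str) -> str:
    """Wrap value in the smallest suitable form (quotes only, no other escapes)."""
    n = smallest_form(value)
    return "#" * n + '"' + value + '"' + "#" * n
```

For the three strings in Table 2, this sketch selects the 0#, 1#, and 2# forms respectively.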

Also, forms may be prefixed with options, explained next.

Options

A string literal has six independent options:

Table 3: Options
option (enabled by default)    flag to disable
escaping                       -e
continuations                  -c
unindent leading whitespace    -l
discard trailing whitespace    -t
discard empty first line       -a
discard empty last line        -z

By default, all options are enabled; e.g. "simpla" and #"απλός"# both have all options enabled.

Options can be disabled selectively by prefixing with flags. Option flags are case sensitive; only lowercase is allowed.

  • -a"..." disables "discard empty first line".
  • -z"..." disables "discard empty last line".

If more than one flag is used, only one - should be used. For example:

  • -az"..." disables 2 options.
  • -taz"..." disables 3 options.
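
As an illustration of the flag rules above, here is a minimal Python sketch that turns a flag prefix into a map of enabled options. The function name parse_flags is hypothetical. Whether ZGL rejects repeated flags or requires a particular flag order is not stated here, so the sketch accepts both.

```python
OPTIONS = {
    "e": "escaping",
    "c": "continuations",
    "l": "unindent leading whitespace",
    "t": "discard trailing whitespace",
    "a": "discard empty first line",
    "z": "discard empty last line",
}


def parse_flags(prefix: str) -> dict[str, bool]:
    """Map each option name to True (enabled) or False (disabled by a flag)."""
    enabled = {name: True for name in OPTIONS.values()}
    if prefix == "":
        return enabled  # no prefix: every option stays enabled
    if not prefix.startswith("-") or len(prefix) < 2:
        raise ValueError("a flag prefix is one '-' followed by flags")
    for flag in prefix[1:]:
        # Uppercase letters and a second '-' both fail this lookup.
        if flag not in OPTIONS:
            raise ValueError(f"unknown flag: {flag!r}")
        enabled[OPTIONS[flag]] = False
    return enabled
```

With this sketch, parse_flags("-taz") leaves escaping, continuations, and unindenting enabled and disables the other three options.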

The options are explained in detail below.

Escaping

Escaping is enabled by default. (Disable with the -e prefix.)

Here are the available types of escape codes:

Table 4: Escaping
pattern       label                               escaped example   resulting character
\[\\"'nrt0]   ASCII shortcut                      \\                \
                                                  \"                "
                                                  \'                '
                                                  \n                newline
                                                  \r                carriage return
                                                  \t                tab
                                                  \0                null character
\xHH          ASCII character                     \x5e              ^
                                                  \x5f              _
\u{H{1,6}}    Unicode character (1 to 6 digits)   \u{1ce}           ǎ
                                                  \u{2d44}          ⵄ
                                                  \u{20d47}         𠵇

Notes:

  • H means a hex digit; i.e. the [0-9a-fA-F] regex pattern.
  • Backslashes must be escaped, unless you disable escaping (see below).
  • The single quote ' does not have to be escaped.
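
The escape codes in Table 4 can be sketched as a small Python decoder. This is an illustration, not ZGL's implementation: the name decode_escapes is hypothetical, and two behaviors are assumptions, namely that \xHH values above 7f are rejected (the table labels this form ASCII) and that any other backslash sequence is an error.

```python
import re

SHORTCUTS = {"\\": "\\", '"': '"', "'": "'",
             "n": "\n", "r": "\r", "t": "\t", "0": "\0"}

# One alternative per row of Table 4, plus a catch-all for invalid escapes.
ESCAPE = re.compile(
    r'\\(?:([\\"\'nrt0])|x([0-9a-fA-F]{2})|u\{([0-9a-fA-F]{1,6})\}|(.?))',
    re.DOTALL,
)


def decode_escapes(body: str) -> str:
    """Replace each escape code in body with the character it stands for."""
    def replace(match: re.Match) -> str:
        shortcut, hex2, unicode_hex, invalid = match.groups()
        if shortcut is not None:
            return SHORTCUTS[shortcut]
        if hex2 is not None:
            code = int(hex2, 16)
            if code > 0x7F:  # assumption: \xHH is limited to ASCII
                raise ValueError(f"not an ASCII code: \\x{hex2}")
            return chr(code)
        if unicode_hex is not None:
            return chr(int(unicode_hex, 16))  # chr rejects values above 10ffff
        raise ValueError(f"invalid escape: \\{invalid}")
    return ESCAPE.sub(replace, body)
```

The decoder expects the text between the quotes, after any continuations have been removed.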

Continuations

Continuations are enabled by default. (Disable with the -c prefix.)

To split a string across multiple lines without a newline, end a line with \. This is called a string continuation. In the examples below, the literals listed after = are all equivalent.

Example 1: Continuation
=
"wrap\
around"
"wraparound"
Example 2: No continuation
=
"1\\
2"
"1\x5c
2"
"1\u{5c}
2"
-ec"1\
2"
"1\\\n2"
Example 3: End with backslash and use a continuation
=
"A\\\
B"
-e"A\\
B"
"A\\B"
-e"A\B"

Make sure you understand this before going to the next section.
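
The interaction between continuations and escaping in Examples 1 to 3 can be sketched in Python as follows. The name apply_continuations is hypothetical, and the rule for disabled escaping (every backslash directly before a newline is a continuation) is inferred from Example 3 rather than stated in the text.

```python
def apply_continuations(body: str, escaping: bool = True) -> str:
    """Remove each backslash-newline continuation from a literal's raw text."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if ch == "\\" and nxt == "\n":
            i += 2  # continuation: drop the backslash and the newline
        elif ch == "\\" and escaping and nxt:
            # An escape pair such as \\ stays intact, so its second
            # backslash cannot start a continuation.
            out.append(ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

The result still contains its escape codes; those are decoded in a later step.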

Continuations: disable

To disable continuations, prefix with -c.  This is case sensitive.

This flag is useful in combination with other flags, but using it alone is not helpful. To put it another way, -c"..." does not provide any advantages over simply using "...".

Example 4: Disable continuations
=
-c"hai \\
dòng"
"hai \\\ndòng"
"hai \\
dòng"
Example 5: Invalid example
-c"dies ist \
ungültig"
This is invalid because:
  1. Continuations are disabled, so \ plus
    a newline has no special meaning.
  2. Escaping is on (by default), so
    backslashes have to be escaped.

Unindent leading whitespace

This is enabled by default. (Disable with the -l prefix.)

Example 6: Unindent leading whitespace
=
"Medeski
 Martin &
  Wood";
"Medeski\nMartin &\n Wood";

Each line of a string is unindented as follows:

  1. Count the leading spaces of each line, ignoring the first line and any lines that are empty or contain spaces only.
  2. Take the minimum.
  3. If the first line is empty i.e. the string begins with a newline, remove the first line.
  4. Remove the computed number of spaces from the beginning of each line.

This behavior is the same as used by indoc, a Rust crate: the above text is copied from it.
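
The four steps above can be sketched in Python. The name unindent is hypothetical. Leaving the first line untouched is an assumption based on step 1, which excludes that line from the count.

```python
def unindent(text: str) -> str:
    """Remove the common leading spaces, following the four steps above."""
    first, *rest = text.split("\n")
    # Steps 1 and 2: minimum count of leading spaces, ignoring the first
    # line and any lines that are empty or contain spaces only.
    widths = [len(line) - len(line.lstrip(" "))
              for line in rest if line.strip(" ")]
    margin = min(widths, default=0)
    # Step 4: remove that many spaces from every line after the first.
    rest = [line[margin:] for line in rest]
    # Step 3: drop the first line when it is empty.
    if first == "":
        return "\n".join(rest)
    return "\n".join([first] + rest)
```

Lines shorter than the computed margin (for example, lines of spaces only) become empty.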

Example 7: Disable unindent leading whitespace
=
-l"Postman never owned
     a computer
       or typewriter"
"Postman never owned\n     a computer\n       or typewriter"

Discard trailing whitespace

This is enabled by default. (Disable with the -t prefix.)

This option discards trailing whitespace on each line.

Example 8: Discard trailing whitespace
(trailing spaces shown as middle dots, ·)
=
"a theocracy may equate··
public morality with·····
religious instruction,···
and give both the········
equal force of law.·····"
"a theocracy may equate
public morality with
religious instruction,
and give both the
equal force of law."
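
A minimal Python sketch of this option follows. The name discard_trailing_whitespace is hypothetical, and treating only spaces and tabs as whitespace is an assumption; the text does not define the term.

```python
def discard_trailing_whitespace(text: str) -> str:
    """Strip trailing spaces and tabs from every line of text."""
    # Assumption: "whitespace" here means spaces and tabs only.
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))
```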

Discard empty first line

This is enabled by default. (Disable with the -a prefix.)

Example 9: Discard empty first line
=
Voltaire = "
Those who can make
you believe absurdities
can make you
commit atrocities.";
Voltaire = "Those who can make
you believe absurdities
can make you
commit atrocities.";

Discard empty last line

This is enabled by default. (Disable with the -z prefix.)

Example 10: Discard empty last line
=
"Being American is more
than a pride we inherit,
It’s the past we step into
and how we repair it
- Amanda Gorman
"
"Being American is more
than a pride we inherit,
It’s the past we step into
and how we repair it
- Amanda Gorman"
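
The two discard options for the first and last line can be sketched in Python as follows. Both function names are hypothetical, and "empty" is assumed to mean a line with no characters at all.

```python
def discard_empty_first_line(text: str) -> str:
    """Drop the first line when the text begins with a newline."""
    return text[1:] if text.startswith("\n") else text


def discard_empty_last_line(text: str) -> str:
    """Drop the last line when the text ends with a newline."""
    return text[:-1] if text.endswith("\n") else text
```

Each function removes at most one newline, so a deliberate blank line next to the discarded one is preserved.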

Translations

In case you are curious about some of the phrases used above ...

word                     language     English translation
halló                    Icelandic    hello
slán                     Irish        goodbye
奇怪环形                 Chinese      strange loop
simpla                   Esperanto    simple
απλός                    Greek        simple
hai dòng                 Vietnamese   two lines
dies ist ungültig        German       this is invalid
hem acabat aquí          Catalan      we're done here
¿por qué sigues aquí?    Spanish      why are you still here?
zoo, Kuv tsis tu         Hmong        fine, I don't care

Last update: 2021-03-26