MAINSAIL guarantees that a unique character corresponds to each of the following characters:
Table 2–1. MAINSAIL Minimum Character Set
| ABCDEFGHIJKLMNOPQRSTUVWXYZ |
| abcdefghijklmnopqrstuvwxyz |
| 0123456789 |
| ! " # $ & ' ( ) * + , - . / : ; < = > ? [ ] ^ _ @ \ | { } % ` ~ |
| space (blank) |
| tab (horizontal tab) |
| eol (end-of-line) |
| $cr (carriage return) |
| $lf (linefeed) |
| eop (end-of-page) |
| $nulChar (null character) |
MAINSAIL cannot guarantee the graphics associated with each character, but they are chosen to approximate those shown.
MAINSAIL does not have a separate data type for characters; characters are represented as INTEGERs. Associated with each character is an INTEGER character code. Character codes range from 0 to the predefined constant $maxChar, which has the value 255, since MAINSAIL characters occupy eight bits each (MAINSAIL currently provides no support for multibyte characters, such as those specified by Unicode). In this manual the term character is often used to mean “character code”. The following may be assumed about the ordering of character codes:
The three identifiers tab, eol, and eop are predefined by MAINSAIL as STRING constant macros.
The exact effect of a tab is peripheral-dependent, but it usually positions to the next horizontal tab stop. MAINSAIL does not define tab stops since the peripherals on which tabs have an effect may not be under MAINSAIL's control. Tab stops are often defined to be every fourth or eighth column. The tab character has the code 9 on ASCII systems.
eol (“end-of-line”) is a one-character STRING that indicates the end of a line of text. When written to a file it terminates a line, so that the next character is written at the start of the next line. The eol character typically has the code 10 on ASCII systems, though other values may be encountered.
When reading a line from a text file with the system procedure read, MAINSAIL finds the end of the line by searching for eol. All characters up to the end-of-line sequence make up the line as produced by read; the eol character itself is discarded.
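The line-reading behavior described above can be modeled in a few lines. The sketch below is written in Python rather than MAINSAIL and is not the signature of the system procedure read. The name read_line, the choice of code 10 for eol, and the handling of a final line with no eol are all assumptions made for illustration.

```python
EOL = "\n"  # assumption: eol has code 10, as is typical on ASCII systems


def read_line(text, pos):
    """Return (line, new_pos) for the line starting at pos.

    The eol that terminates the line is discarded, not returned.
    """
    end = text.find(EOL, pos)
    if end == -1:
        # Assumption: with no eol left, the rest of the text is the line.
        return text[pos:], len(text)
    return text[pos:end], end + len(EOL)


source = "first" + EOL + "second" + EOL
line1, pos = read_line(source, 0)
line2, pos = read_line(source, pos)
```

Here line1 and line2 contain only the text of each line; neither ends with the eol character.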
eop (“end-of-page”) is a single-character STRING that indicates the end of a page of text. When printed to a file it terminates a page (the next character is written at the top of the next page). The eop character has the code 12 on ASCII systems.
Each implementation specifies a null character that is by default ignored (discarded) when encountered in a text input file. The null character code is given by $nulChar. The null character code for an ASCII character set is the code 0. See Section 43.5 for further information about the treatment of null characters in a text input file.
The MAINSAIL compiler translates character codes (e.g., in STRING constants) from the host character set (the one used by the compiler) to the target character set (the one used by the compiled program). Unique characters on the host machine are translated to unique characters on the target machine, provided that the host characters are among those shown in Table 2–1. Other characters are used at the programmer's risk; if a character cannot be translated to the target character set, a compiletime error occurs. Characters in comments are not translated to the target character set; they affect only the portability of the source text itself.
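The host-to-target translation can be illustrated with a small model. This is a Python sketch, not the compiler's actual mechanism. The choice of an EBCDIC code page (Python's "cp037" codec) as the target and the helper name translate_constant are assumptions made purely for illustration.

```python
TARGET_CODEC = "cp037"  # assumption: an EBCDIC target, chosen only as an example


def translate_constant(text):
    """Translate a STRING constant to target character codes.

    Raises ValueError for a character with no target equivalent,
    standing in for the compiletime error described above.
    """
    try:
        return list(text.encode(TARGET_CODEC))
    except UnicodeEncodeError as err:
        bad = text[err.start]
        raise ValueError(f"cannot translate {bad!r} to the target set") from err


codes = translate_constant("A0")  # distinct host characters give distinct target codes
```

Characters from Table 2–1 translate to unique target codes; a character outside the target set makes the model fail instead of silently substituting another character.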
Table 2–2 shows the system procedures provided to complement the minimal assumptions guaranteed above. The argument to each is an INTEGER character code.
-1 is used in several situations to represent “no character”, since no character code can be -1 (codes are guaranteed nonnegative). For example, first(s) returns the character code of the first character of the STRING s; if s is empty (i.e., contains no characters), then first(s) is -1.
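The “no character” convention is easy to model. The following Python sketch mimics the documented result of first(s); it assumes an ASCII character set and is not MAINSAIL code.

```python
NO_CHARACTER = -1  # no valid character code is negative


def first(s):
    """Return the code of the first character of s, or -1 if s is empty."""
    return ord(s[0]) if s else NO_CHARACTER


code_of_a = first("A")   # 65 on an ASCII system
code_of_empty = first("")  # -1: the STRING contains no characters
```

Because valid codes are guaranteed nonnegative, a caller can test the result against -1 without any risk of confusing it with a real character.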
Table 2–2. Character-Set-Independent System Procedures
| isLowerCase(i) | TRUE if i is the code for one of a...z. |
| isUpperCase(i) | TRUE if i is the code for one of A...Z. |
| isAlpha(i) | TRUE if i is the code for one of a...zA...Z. |
| $printingChar(i) | TRUE if i is alphanumeric, punctuation, or space, but not tab, eol, eop, or a control character. |
| isNul(i) | TRUE if i is the code for the null character, $nulChar. |
| $treatLikeNul(i) | TRUE if i, like $nulChar, is discarded on input from files where $keepNul is not set. |
| prevAlpha(i) | Code of the alphabetically previous character (same case) before the one with code i. Undefined if i is not the code for one of b...zB...Z. |
| nextAlpha(i) | Code of the alphabetically next character (same case) after the one with code i. Undefined if i is not the code for one of a...yA...Y. |
| cvl(i) | If i is the code for one of A...Z, then the result is the code for the corresponding lowercase letter; otherwise it is i itself. |
| cvu(i) | If i is the code for one of a...z, then the result is the code for the corresponding uppercase letter; otherwise it is i itself. |
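A few of the procedures in Table 2–2 can be modeled for an ASCII character set as follows. This is a Python sketch under that assumption; the snake_case names are hypothetical stand-ins, and raising an exception is only one way to represent the cases the table calls undefined.

```python
def is_lower_case(i):
    return ord("a") <= i <= ord("z")


def is_upper_case(i):
    return ord("A") <= i <= ord("Z")


def is_alpha(i):
    return is_lower_case(i) or is_upper_case(i)


def cvl(i):
    """Lowercase an uppercase letter code; return any other code unchanged."""
    return i + (ord("a") - ord("A")) if is_upper_case(i) else i


def cvu(i):
    """Uppercase a lowercase letter code; return any other code unchanged."""
    return i - (ord("a") - ord("A")) if is_lower_case(i) else i


def next_alpha(i):
    """Code of the next letter in the same case; undefined for z, Z, non-letters."""
    if not is_alpha(i) or i in (ord("z"), ord("Z")):
        raise ValueError("next_alpha is undefined for this code")  # model's choice
    return i + 1
```

Because cvl and cvu return non-letter codes unchanged, they can be applied safely to any code, including the “no character” value -1.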
a[1] := 0; # clear first element
A large body of text may be “commented out” in three ways:
In comparing identifiers, the compiler does not distinguish between upper- and lowercase letters; e.g., it considers the identifiers typecode, typeCode, and TYPECODE to be identical. There is no “break” character for identifiers; programmers may use a mixture of lower and upper case to show the structure of identifiers. For example, sizeOfArray is more understandable than sizeofarray or SIZEOFARRAY.
Certain identifiers (keywords such as BEGIN, END, and ARRAY) are “reserved”, i.e., cannot be declared or defined by the programmer. A list of the reserved identifiers is given in Appendix G.
Certain other identifiers (e.g., tab, create, delete) are predefined by MAINSAIL and cannot be declared by the programmer.
$ is the initial character of certain predefined and predeclared identifiers used by the MAINSAIL runtime system. This avoids conflicts with the programmer's identifiers, which must not begin with $. When XIDAK creates a new predefined identifier, it begins with $. A user declaration of an identifier beginning with $ has undefined consequences.
Examples of legal identifiers are shown in Example 2–3.
Example 2–3. Legal Identifiers
| i,j,k,l,m,n | common integer identifiers |
| make4sets | |
| aVeryLongIdentifier | same as AVERYLONGIDENTIFIER |
Examples of illegal identifiers are shown in Example 2–4. Note that _ is not legal in identifiers because, for historical reasons, it is a synonym of the assignment operator.
Example 2–4. Illegal Identifiers
| array | reserved identifier |
| 5th | starts with a number |
| cost-in-$ | $ and - cannot be used |
| weight_in_lbs | _ cannot be used because it is a synonym for the assignment operator |
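The rules illustrated by Examples 2–3 and 2–4 can be approximated by a small checker. The Python sketch below infers its rules from the examples alone (a letter first, then letters and digits only, and not a reserved word), so it should not be read as the complete identifier grammar. The RESERVED set holds only the three keywords named above, not the full list in Appendix G, and predefined identifiers such as tab are not checked.

```python
import string

RESERVED = {"BEGIN", "END", "ARRAY"}  # illustrative subset; see Appendix G for the real list
LETTERS = set(string.ascii_letters)
LETTERS_AND_DIGITS = LETTERS | set(string.digits)


def same_identifier(a, b):
    """Identifiers are compared without regard to case."""
    return a.upper() == b.upper()


def is_legal_user_identifier(name):
    """Approximate legality test inferred from the examples in this section."""
    if not name or name[0] not in LETTERS:
        return False  # empty, or starts with a digit, $, or other character
    if any(c not in LETTERS_AND_DIGITS for c in name):
        return False  # contains _, -, $, or another non-alphanumeric character
    return name.upper() not in RESERVED
```

With these rules, make4sets and aVeryLongIdentifier are accepted, while array, 5th, cost-in-$, and weight_in_lbs are rejected.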
Usually any number of whitespace characters (e.g., spaces, tabs, and ends-of-line) may separate syntactic units. When in doubt, consult the description of the language construct in question.
2.5. Compiletime Evaluation
If a BOOLEAN, (LONG) INTEGER, (LONG) BITS, or STRING operation (unary or binary operator) has constant operands, the compiler evaluates it at compiletime. A call to a system procedure declared with qualifier COMPILETIME is evaluated at compiletime if all the arguments are constants. The term constant expression refers to an expression that can be evaluated at compiletime.
All compiletime (LONG) INTEGER arithmetic is carried out on STRING representations (with a very large number of digits) so that the capabilities of the computer on which the compiler is running do not affect the results. (LONG) INTEGER arithmetic operations that overflow may therefore not have the same result at compiletime as they would have had if the operations had been performed on the target at runtime.
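The difference between compiletime and runtime results on overflow can be seen in a small model. The Python sketch below assumes a hypothetical target with 32-bit two's-complement INTEGERs that wrap on overflow. This passage does not say what a MAINSAIL target actually does on overflow, so the wraparound is an assumption chosen only to show that the two results can differ.

```python
TARGET_INTEGER_BITS = 32  # hypothetical target INTEGER width


def wrap_signed(value, bits=TARGET_INTEGER_BITS):
    """Reduce value to a signed two's-complement integer of the given width."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


# Compiletime: arithmetic on a representation with very many digits.
compile_time_result = 2147483647 + 1

# Runtime on the assumed target: the same sum wraps around.
run_time_result = wrap_signed(2147483647 + 1)
```

In this model compile_time_result is 2147483648, while run_time_result is -2147483648.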
(LONG) BITS operations are performed on STRING representations that have the same number of digits as (LONG) BITS on the target machine, so that operations that discard bits to the left have the same results at compiletime as at runtime.
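By contrast, a fixed-width model reproduces the discarding of bits shifted off the left. The Python sketch below assumes a hypothetical 16-bit BITS type on the target; the width and the helper name shift_left are illustrative only.

```python
TARGET_BITS_WIDTH = 16  # hypothetical width of BITS on the target


def shift_left(value, count, width=TARGET_BITS_WIDTH):
    """Shift left within a fixed width, discarding bits that fall off the left."""
    return (value << count) & ((1 << width) - 1)


result = shift_left(0x8001, 1)  # the top bit is discarded, leaving 0x0002
```

Because the compiler works with the same number of digits as the target, it would discard the same bit, so the compiletime and runtime results agree.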
STRING operations evaluated at compiletime produce the same result that would have been produced if they had been evaluated at runtime; i.e., they act on STRINGs as translated to the target character set, rather than as on the host machine.
If an expression contains constant subexpressions, the subexpressions should be enclosed in parentheses to ensure that they are evaluated at compiletime. For example, i + 2 + 4, where i is not evaluated at compiletime, should be written as i + (2 + 4) to ensure that the addition of 2 and 4 is done at compiletime. It might otherwise be treated as (i + 2) + 4, which involves two additions during execution.
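The effect of the parentheses can be shown with a toy constant folder. The Python sketch below is not the MAINSAIL compiler's algorithm. It represents expressions as nested tuples (a hypothetical encoding) and folds an addition only when both operands are already constants.

```python
def fold(node):
    """Fold '+' nodes whose two operands are both integer constants."""
    if isinstance(node, tuple):
        op, left, right = node
        left, right = fold(left), fold(right)
        if op == "+" and isinstance(left, int) and isinstance(right, int):
            return left + right  # evaluated at "compiletime"
        return (op, left, right)  # left for "runtime"
    return node  # a variable name or a constant


left_associated = ("+", ("+", "i", 2), 4)  # i + 2 + 4 treated as (i + 2) + 4
parenthesized = ("+", "i", ("+", 2, 4))    # i + (2 + 4)

folded_left = fold(left_associated)
folded_parenthesized = fold(parenthesized)
```

The left-associated form is returned unchanged with its two additions, because neither addition has two constant operands. The parenthesized form reduces to a single addition of i and 6.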
2.6. Storage Units and Character Units
A storage unit is the basic measure for the amount of memory required by the various data types. For example, a storage unit may represent a “byte” or “word”, although these terms are not usual in XIDAK documentation.
Every storage unit contains a processor-dependent number of bits, given by the predefined constant $bitsPerStorageUnit. (Currently, a storage unit is one byte on all existing MAINSAIL implementations, but this has not been true of some historical implementations, and it is at least theoretically possible that a future implementation may have a storage unit size that is not 8 bits.)
Data files and memory in which values of the MAINSAIL data types are stored are viewed as linear sequences of storage units. Storage units are employed in situations requiring a measure of memory or file size without regard to data type, e.g., as an argument to the system procedure setPos for a data file.
The system procedure size or $lSize can be used to determine the number of storage units occupied by a particular data type or record; $sizeOfValue is similar, but takes a more general argument. The procedure $ioSize can be used to determine how many storage or character units a data type occupies in a given data file, which may be different from the size of that data type in memory. DSP and $LDSP return the offset in storage units from the start of a record to a field in the record.
Text files and memory in which characters are stored may be viewed as linear sequences of character units. Character units are employed when a file or memory position must be specified as the number of characters it contains, e.g., as an argument to the system procedure setPos for a text file.
A character unit is always eight bits. Character units may not necessarily coincide with storage units, but storage units are always an integral multiple of character units. The number of bits per character unit (eight) is defined as $bitsPerChar, so the number of character units per storage unit is given by $bitsPerStorageUnit DIV $bitsPerChar.
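The relationship between storage units and character units is simple arithmetic, sketched here in Python with DIV taken to be integer division. The 8-bit storage unit matches the current implementations mentioned in the text; the 32-bit storage unit is a hypothetical value used only to show a multiple greater than one.

```python
BITS_PER_CHAR = 8  # a character unit is always eight bits


def char_units_per_storage_unit(bits_per_storage_unit):
    """Model of $bitsPerStorageUnit DIV $bitsPerChar."""
    if bits_per_storage_unit % BITS_PER_CHAR != 0:
        # Storage units are always an integral multiple of character units.
        raise ValueError("storage unit is not a whole number of character units")
    return bits_per_storage_unit // BITS_PER_CHAR


current = char_units_per_storage_unit(8)        # one character unit per storage unit
hypothetical = char_units_per_storage_unit(32)  # four character units per storage unit
```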
2.7. Type Codes
Each data type is assigned an INTEGER type code that is used in various ways in MAINSAIL, e.g., as an argument to the compiletime system procedure size.
Predefined INTEGER constants for the type codes for basic MAINSAIL types are shown in Table 2–5. For example, size(integerCode) is the number of storage units an INTEGER occupies in memory.
In addition, there exist extended type codes, describing sized data types (see Section 3.12). The complete list of extended type codes appears in Appendix A.
| booleanCode |
| integerCode |
| longIntegerCode |
| realCode |
| longRealCode |
| bitsCode |
| longBitsCode |
| stringCode |
| addressCode |
| charadrCode |
| pointerCode |
| $recordCode |
| $procvarCode |
In addition to garbage collection, automatic memory management can move dynamic objects and STRING text around in memory as well as deallocate them. This is invisible to a correctly written program, since the referencing variables are automatically updated to point to the moved data.
The term “garbage collection” is often used loosely in this manual to denote memory management operations in general. Strictly speaking, only the freeing of inaccessible dynamic objects or STRING text constitutes garbage collection; other kinds of operations, such as moving dynamic objects around, are not really garbage collection. Nonetheless, this manual uses phrases like “during a garbage collection” to refer to the time when MAINSAIL is doing automatic memory management of any kind, or “cannot trigger a garbage collection” to say that a particular construct cannot trigger automatic memory management.
Variables of the types ADDRESS and CHARADR are not updated during a garbage collection, even if they point to structures that may be moved. The user ordinarily uses variables of these data types to point into scratch space (or static space), i.e., areas of memory in which data are not collected. Scratch space may be obtained by calling the system procedure newPage or the system procedure newScratch.
Constructs that may trigger garbage collections are noted in this manual.
Since garbage collections may take a great deal of time, the programmer may wish to prevent collections if he or she knows that few inaccessible data are being generated. This is best accomplished by calling the system procedure $memoryManagementInfo (see Section 41.9); it can also be accomplished by incrementing the system variable $collectLock (although doing this may cause MAINSAIL to run out of memory if inaccessible data are in fact being generated). The system procedure $collect causes a collection to be performed, even when $collectLock is nonzero. Collections can also be prevented (or reduced in scope or frequency) indirectly by calling the system procedure dispose to deallocate data structures explicitly whenever possible, or by keeping data in areas (see Chapter 25) and disposing of all the data in an area at once with $disposeArea.
Frequency of garbage collection can be controlled by using the utility CONF to set various parameters in a MAINSAIL bootstrap; see Chapter 6 of the MAINSAIL Utilities User's Guide for details.
2.8.1. How the MAINSAIL Garbage Collector Determines Whether Data Are Accessible
The MAINSAIL garbage collector operates by reclaiming data that are not accessible. This section presents a detailed description of how the garbage collector determines which data are accessible.
A dynamic object is accessible if any of the following are true (for the purposes of the following discussion, an active procedure is any procedure for which a stack frame exists in any currently existing coroutine):
STRING text is accessible if a local variable in an active procedure points to it or if a STRING variable in an accessible dynamic object points to it.
2.9. cmdFile and logFile
cmdFile and logFile are files associated by default with a MAINSAIL execution's primary input and primary output (usually terminal input and terminal output), respectively. They are described in Section 22.12.
MAINSAIL Language Manual, Chapter 2