MAINSAIL Language Manual, Chapter 2


2. Basic Language Concepts

2.1. Character Set

All current MAINSAIL implementations run on systems that use the ASCII character set, and future implementations are likely, though not certain, to do the same. MAINSAIL does not require an ASCII character set; instead, it makes some guarantees about the character set and provides facilities that make it easy to write programs that are independent of the character set.

MAINSAIL guarantees that a unique character corresponds to each of the following characters:

Table 2–1. MAINSAIL Minimum Character Set
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789
! " # $ & ' ( ) * + , - . / : ;
< = > ? [ ] ^ _ @ \ | { } % ` ~

space (blank)
tab (horizontal tab)
eol (end-of-line)
$cr (carriage return)
$lf (linefeed)
eop (end-of-page)
$nulChar (null character)

MAINSAIL cannot guarantee the graphics associated with each character, but they are chosen to approximate those shown.

MAINSAIL does not have a separate data type for characters; characters are represented as INTEGERs. Associated with each character is an INTEGER character code. Character codes range from 0 to the predefined constant $maxChar, which has the value 255, since MAINSAIL characters occupy eight bits each (MAINSAIL currently provides no support for multibyte characters, such as those specified by Unicode). In this manual the term character is often used to mean “character code”. The following may be assumed about the ordering of character codes:

The three identifiers tab, eol, and eop are predefined by MAINSAIL as STRING constant macros.

The exact effect of a tab is peripheral-dependent, but it usually positions to the next horizontal tab stop. MAINSAIL does not define tab stops since the peripherals on which tabs have an effect may not be under MAINSAIL's control. Tab stops are often defined to be every fourth or eighth column. The tab character has the code 9 on ASCII systems.

eol (“end-of-line”) is a one-character STRING that indicates the end of a line of text. When written to a file it terminates a line, so that the next character is written at the start of the next line. The eol character typically has the code 10 on ASCII systems (it is possible that other values may be encountered).

When reading a line from a text file with the system procedure read, MAINSAIL searches for an end-of-line by searching for eol. The eol character is discarded. All characters up to the end-of-line sequence make up the line as produced by read.

eop (“end-of-page”) is a single-character STRING that indicates the end of a page of text. When printed to a file it terminates a page (the next character is written at the top of the next page). The eop character has the code 12 on ASCII systems.

Each implementation specifies a null character that is by default ignored (discarded) when encountered in a text input file. The null character code is given by $nulChar. The null character code for an ASCII character set is the code 0. See Section 43.5 for further information about the treatment of null characters in a text input file.

The MAINSAIL compiler translates character codes (e.g., in STRING constants) from the host character set (the one used by the compiler) to the target character set (the one used by the compiled program). Unique characters on the host machine are translated to unique characters on the target machine, provided that the host characters are among those shown in Table 2–1. Other characters are used at the programmer's risk; if a character cannot be translated to the target character set, a compiletime error occurs. Characters in comments are not translated to the target character set; they affect only the portability of the source text itself.

Table 2–2 shows the system procedures provided to complement the minimal assumptions guaranteed above. The argument to each is an INTEGER character code.

-1 is used in several situations to represent “no character”, since no character code can be -1 (codes are guaranteed nonnegative). For example, first(s) returns the character code of the first character of the STRING s; if s is empty (i.e., contains no characters), then first(s) is -1.

Table 2–2. Character-Set-Independent System Procedures
isLowerCase(i) TRUE if i is the code for one of a...z.
isUpperCase(i) TRUE if i is the code for one of A...Z.
isAlpha(i) TRUE if i is the code for one of a...zA...Z.
$printingChar(i) TRUE if i is the code for an alphanumeric, punctuation, or space character, but not for tab, eol, eop, or a control character.
isNul(i) TRUE if i is the code for the null character, $nulChar.
$treatLikeNul(i) TRUE if i, like $nulChar, is discarded on input from files where $keepNul is not set.
prevAlpha(i) Code of the alphabetically previous character (same case) before the one with code i. Undefined if i is not the code for one of b...zB...Z.
nextAlpha(i) Code of the alphabetically next character (same case) after the one with code i. Undefined if i is not the code for one of a...yA...Y.
cvl(i) If i is the code for one of A...Z, then the result is the code for the corresponding lowercase letter; otherwise it is i itself.
cvu(i) If i is the code for one of a...z, then the result is the code for the corresponding uppercase letter; otherwise it is i itself.
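
The behavior described in Table 2–2, together with the "-1 means no character" convention of first, can be modeled in a few lines. The following is a Python sketch, not MAINSAIL code, and it assumes an ASCII character set (where each case of the alphabet is contiguous); the Python function names are illustrative stand-ins for the MAINSAIL procedures. Where the manual says a result is undefined, this sketch chooses to raise an error.

```python
# Python model of some Table 2-2 procedures, ASSUMING an ASCII host.
# These are illustrative functions, not the MAINSAIL implementations.

def first(s: str) -> int:
    """Code of the first character of s, or -1 if s is empty."""
    return ord(s[0]) if s else -1

def is_lower_case(i: int) -> bool:
    return ord("a") <= i <= ord("z")

def is_upper_case(i: int) -> bool:
    return ord("A") <= i <= ord("Z")

def is_alpha(i: int) -> bool:
    return is_lower_case(i) or is_upper_case(i)

def cvl(i: int) -> int:
    """Lowercase code for an uppercase letter code; otherwise i itself."""
    return i + (ord("a") - ord("A")) if is_upper_case(i) else i

def cvu(i: int) -> int:
    """Uppercase code for a lowercase letter code; otherwise i itself."""
    return i - (ord("a") - ord("A")) if is_lower_case(i) else i

def next_alpha(i: int) -> int:
    """Next letter of the same case. The manual leaves other inputs
    undefined; this sketch raises ValueError for them."""
    if not is_alpha(i) or i in (ord("z"), ord("Z")):
        raise ValueError("next_alpha is undefined for this code")
    return i + 1
```

A program written against procedures like these, rather than against literal code values, does not depend on where the letters happen to fall in the character set.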

2.2. Comments

A comment is used for documentation in MAINSAIL source code. A comment starts with the character # and extends to the end of the line; when the compiler comes upon #, it ignores the remainder of the line. A comment may begin anywhere on a line. An example of a comment:

a[1] := 0;           # clear first element

A large body of text may be “commented out” in three ways:

  1. Insert # at the start of every line.

  2. Use conditional compilation: IFC FALSE THENC ignored text ENDC (see Section 16.12).

  3. Use SKIPSCAN and BEGINSCAN to skip pages (see Section 16.21).

2.3. Identifiers

An identifier is an optional dollar sign followed by a letter followed by any number of letters and digits. The letters and digits must be contiguous (e.g., no intervening spaces).

In comparing identifiers, the compiler does not distinguish between upper- and lowercase letters; e.g., it considers the identifiers typecode, typeCode, and TYPECODE to be identical. There is no “break” character for identifiers; programmers may use a mixture of lower and upper case to show the structure of identifiers. For example, sizeOfArray is more understandable than sizeofarray or SIZEOFARRAY.

Certain identifiers (keywords such as BEGIN, END, and ARRAY) are “reserved”, i.e., cannot be declared or defined by the programmer. A list of the reserved identifiers is given in Appendix G.

Certain other identifiers (e.g., tab, create, delete) are predefined by MAINSAIL and cannot be declared by the programmer.

$ is the initial character of certain predefined and predeclared identifiers used by the MAINSAIL runtime system. This avoids conflicts with the programmer's identifiers, which must not begin with $. When XIDAK creates a new predefined identifier, it begins with $. A user declaration of an identifier beginning with $ has undefined consequences.

Examples of legal identifiers are shown in Example 2–3.

Example 2–3. Legal Identifiers
i,j,k,l,m,n            common integer identifiers
make4sets
aVeryLongIdentifier    same as AVERYLONGIDENTIFIER

Examples of illegal identifiers are shown in Example 2–4. Note that _ is not legal in identifiers because, for historical reasons, it is a synonym of the assignment operator.

Example 2–4. Illegal Identifiers
array            reserved identifier
5th              starts with a number
cost-in-$        $ and - cannot be used
weight_in_lbs    _ cannot be used because it is a synonym for the assignment operator
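
The identifier rules of this section can be checked mechanically. The following Python sketch (not MAINSAIL code) encodes the syntax as a regular expression and compares identifiers without regard to case. The set RESERVED_SAMPLE holds only the three keywords named in the text, not the full list of Appendix G, and the sketch does not know about predefined identifiers such as tab; all function names are hypothetical.

```python
import re

# Optional dollar sign, then a letter, then any number of letters and digits.
IDENTIFIER = re.compile(r"\$?[A-Za-z][A-Za-z0-9]*")

# Only the keywords named in the text; the complete list is in Appendix G.
RESERVED_SAMPLE = {"BEGIN", "END", "ARRAY"}

def is_identifier_syntax(text: str) -> bool:
    """True if text has the form of an identifier."""
    return IDENTIFIER.fullmatch(text) is not None

def same_identifier(a: str, b: str) -> bool:
    """Identifiers are compared without distinguishing case."""
    return a.upper() == b.upper()

def is_declarable(text: str) -> bool:
    """True if text is well formed, is not in RESERVED_SAMPLE, and does
    not begin with $ (which is set aside for the runtime system)."""
    return (
        is_identifier_syntax(text)
        and text.upper() not in RESERVED_SAMPLE
        and not text.startswith("$")
    )
```

Under these assumptions, make4sets and aVeryLongIdentifier pass is_declarable, while array, 5th, cost-in-$, and weight_in_lbs do not.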

2.4. Use of Semicolons and Formatters

Semicolons separate (rather than terminate) declarations and statements. They terminate compiler directives, procedure headers, and macro definitions.

Usually any number of whitespace characters (e.g., spaces, tabs, and ends-of-line) may separate syntactic units. When in doubt, consult the description of the language construct in question.

2.5. Compiletime Evaluation

If a BOOLEAN, (LONG) INTEGER, (LONG) BITS, or STRING operation (unary or binary operator) has constant operands, the compiler evaluates it at compiletime. A call to a system procedure declared with qualifier COMPILETIME is evaluated at compiletime if all the arguments are constants. The term constant expression refers to an expression that can be evaluated at compiletime.

All compiletime (LONG) INTEGER arithmetic is carried out on STRING representations (with a very large number of digits) so that the capabilities of the computer on which the compiler is running do not affect the results. (LONG) INTEGER arithmetic operations that overflow may therefore not have the same result at compiletime as they would have had if the operations had been performed on the target at runtime.

(LONG) BITS operations are performed on STRING representations that have the same number of digits as (LONG) BITS on the target machine, so that operations that discard bits to the left have the same results at compiletime as at runtime.
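
The difference between the two rules above can be illustrated with a Python sketch (not MAINSAIL code). Python integers are unbounded, which resembles the compiletime INTEGER arithmetic described here. The 32-bit width and the two's complement wraparound on overflow are assumptions made only for this illustration; the manual does not state what a given target does when INTEGER arithmetic overflows.

```python
BITS = 32  # hypothetical target width, assumed for this sketch only

def wrap_signed(value: int, bits: int = BITS) -> int:
    """Reduce value to a signed two's complement integer of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def shift_left_bits(value: int, count: int, bits: int = BITS) -> int:
    """Left shift that discards bits moved past the target width,
    as a fixed-width BITS operation would."""
    return (value << count) & ((1 << bits) - 1)

# Unbounded arithmetic, like compiletime INTEGER evaluation: no overflow.
exact = 2_000_000_000 + 2_000_000_000

# What a wrapping 32-bit target might produce for the same addition.
wrapped = wrap_signed(exact)

# A fixed-width shift loses the top bit, at compiletime and at runtime alike.
shifted = shift_left_bits(0x80000001, 1)
```

Here exact and wrapped differ, which is the INTEGER discrepancy the text warns about, while shifted is computed at the target width from the start and so has only one possible value.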

STRING operations evaluated at compiletime produce the same result that would have been produced if they had been evaluated at runtime; i.e., they act on STRINGs as translated to the target character set, rather than as on the host machine.

If an expression contains constant subexpressions, the subexpressions should be enclosed in parentheses to ensure that they are evaluated at compiletime. For example, i + 2 + 4, where i is not evaluated at compiletime, should be written as i + (2 + 4) to ensure that the addition of 2 and 4 is done at compiletime. It might otherwise be treated as (i + 2) + 4, which involves two additions during execution.

2.6. Storage Units and Character Units

A storage unit is the basic measure for the amount of memory required by the various data types. For example, a storage unit may represent a “byte” or “word”, although these terms are not usual in XIDAK documentation. Every storage unit contains a processor-dependent number of bits, given by the predefined constant $bitsPerStorageUnit. (Currently, a storage unit is one byte on all existing MAINSAIL implementations, but this has not been true of some historical implementations, and it is at least theoretically possible that a future implementation may have a storage unit size that is not 8 bits.)

Data files and memory in which values of the MAINSAIL data types are stored are viewed as linear sequences of storage units. Storage units are employed in situations requiring a measure of memory or file size without regard to data type, e.g., as an argument to the system procedure setPos for a data file.

The system procedure size or $lSize can be used to determine the number of storage units occupied by a particular data type or record; $sizeOfValue is similar, but takes a more general argument. The procedure $ioSize can be used to determine how many storage or character units a data type occupies in a given data file, which may be different from the size of that data type in memory. DSP and $LDSP return the offset in storage units from the start of a record to a field in the record.

Text files and memory in which characters are stored may be viewed as linear sequences of character units. Character units are employed when a file or memory position must be specified as the number of characters it contains, e.g., as an argument to the system procedure setPos for a text file. A character unit is always eight bits.

Character units may not necessarily coincide with storage units, but storage units are always an integral multiple of character units. The number of bits per character unit (eight) is defined as $bitsPerChar, so the number of character units per storage unit is given by $bitsPerStorageUnit DIV $bitsPerChar.
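
As a worked illustration of the formula above, the following Python sketch (not MAINSAIL code) converts between the two kinds of units. BITS_PER_CHAR stands in for $bitsPerChar, the storage unit size is passed in as a parameter standing in for $bitsPerStorageUnit, and the function names are hypothetical.

```python
BITS_PER_CHAR = 8  # stands in for $bitsPerChar

def char_units_per_storage_unit(bits_per_storage_unit: int) -> int:
    """$bitsPerStorageUnit DIV $bitsPerChar, for a given storage unit size."""
    if bits_per_storage_unit % BITS_PER_CHAR != 0:
        # The text says a storage unit is an integral multiple of a character unit.
        raise ValueError("storage unit must be a multiple of a character unit")
    return bits_per_storage_unit // BITS_PER_CHAR

def storage_to_char_units(storage_units: int, bits_per_storage_unit: int) -> int:
    """Number of character units occupied by a count of storage units."""
    return storage_units * char_units_per_storage_unit(bits_per_storage_unit)
```

With an 8-bit storage unit the two units coincide; with a hypothetical 32-bit storage unit, each storage unit holds four character units.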

2.7. Type Codes

Each data type is assigned an INTEGER type code that is used in various ways in MAINSAIL, e.g., as an argument to the compiletime system procedure size.

Predefined INTEGER constants for the type codes for basic MAINSAIL types are shown in Table 2–5. For example, size(integerCode) is the number of storage units an INTEGER occupies in memory.

In addition, there exist extended type codes, describing sized data types (see Section 3.12). The complete list of extended type codes appears in Appendix A.

Table 2–5. Type Codes
booleanCode
integerCode
longIntegerCode
realCode
longRealCode
bitsCode
longBitsCode
stringCode
addressCode
charadrCode
pointerCode
$recordCode
$procvarCode

2.8. Garbage Collections and Memory Management

The MAINSAIL runtime system automatically reclaims the space occupied by dynamic objects (dynamic records, dynamic arrays, and data sections) and by STRING text if the objects or text become inaccessible. A dynamic object is inaccessible if no accessible POINTER (local variable or POINTER in an accessible dynamic object; see Section 2.8.1) references it; STRING text is inaccessible if no accessible STRING descriptor references it. The process of reclaiming space is called garbage collection.

In addition to garbage collection, automatic memory management can move dynamic objects and STRING text around in memory as well as deallocate them. This is invisible to a correctly written program, since the referencing variables are automatically updated to point to the moved data.

The term “garbage collection” is often used loosely in this manual to denote memory management operations in general. Strictly speaking, only the freeing of inaccessible dynamic objects or STRING text constitutes garbage collection; other kinds of operations, such as moving dynamic objects around, are not really garbage collection. Nonetheless, this manual uses phrases like “during a garbage collection” to refer to the time when MAINSAIL is doing automatic memory management of any kind, or “cannot trigger a garbage collection” to say that a particular construct cannot trigger automatic memory management.

Variables of the types ADDRESS and CHARADR are not updated during a garbage collection, even if they point to structures that may be moved. The user ordinarily uses variables of these data types to point into scratch space (or static space), i.e., areas of memory in which data are not collected. Scratch space may be obtained by calling the system procedure newPage or the system procedure newScratch.

Constructs that may trigger garbage collections are noted in this manual.

Since garbage collections may take a great deal of time, the programmer may wish to prevent collections if he or she knows that few inaccessible data are being generated. This is best accomplished by calling the system procedure $memoryManagementInfo (see Section 41.9); it can also be accomplished by incrementing the system variable $collectLock (although doing this may cause MAINSAIL to run out of memory if inaccessible data are in fact being generated). The system procedure $collect causes a collection to be performed, even when $collectLock is nonzero. Collections can also be prevented (or reduced in scope or frequency) indirectly by calling the system procedure dispose to deallocate data structures explicitly whenever possible, or by keeping data in areas (see Chapter 25) and disposing of all the data in an area at once with $disposeArea.

Frequency of garbage collection can be controlled by using the utility CONF to set various parameters in a MAINSAIL bootstrap; see Chapter 6 of the MAINSAIL Utilities User's Guide for details.

2.8.1. How the MAINSAIL Garbage Collector Determines Whether Data Are Accessible

The MAINSAIL garbage collector operates by reclaiming data that are not accessible. This section presents a detailed description of how the garbage collector determines which data are accessible.

A dynamic object is accessible if any of the following are true (for the purposes of the following discussion, an active procedure is any procedure for which a stack frame exists in any currently existing coroutine):

STRING text is accessible if a local variable in an active procedure points to it or if a STRING variable in an accessible dynamic object points to it.
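
The general idea of accessibility, an object being reachable from a local variable either directly or through POINTERs in other accessible objects, can be sketched as a graph traversal. The following Python model is not MAINSAIL code and is not a description of the actual collector: the roots parameter stands in for whatever references this section treats as starting points, and the object names and the heap dictionary are hypothetical.

```python
def accessible_objects(roots, pointers):
    """Return the set of objects reachable from roots.

    roots:    names of objects referenced directly (e.g., by local
              variables of active procedures in this simplified model)
    pointers: dict mapping each object name to the names that its
              POINTER fields reference
    """
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue  # already visited; also handles cycles
        seen.add(obj)
        stack.extend(pointers.get(obj, ()))
    return seen

# Hypothetical heap: a and b reference each other; c references d.
heap = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}

live = accessible_objects(["a"], heap)   # only a is referenced by a local variable
garbage = set(heap) - live               # objects that could be reclaimed
```

In this model c and d are inaccessible even though c references d, because nothing accessible references c.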

2.9. cmdFile and logFile

cmdFile and logFile are files associated by default with a MAINSAIL execution's primary input and primary output (usually terminal input and terminal output), respectively. They are described in Section 22.12.