MAINSAIL Language Manual, Chapter 3



3. Data Types

This chapter describes MAINSAIL's twelve data types: BOOLEAN, INTEGER, LONG INTEGER, REAL, LONG REAL, BITS, LONG BITS, STRING, POINTER, $PROCVAR, ADDRESS, and CHARADR. Records, data sections, and ARRAYs, which are “data structures” rather than “data types” in MAINSAIL terminology, are described in Chapters 10, 11, and 12, respectively.

Associated with each data type is a set of values and a set of operations that may be performed on the values. The set of values associated with each data type includes a value called the Zero of the data type. The memory representation of the Zero of every data type consists entirely of 0-bits.

There is no implicit data type conversion in MAINSAIL. For example, if i is an INTEGER variable and r a REAL variable, then i + r is an illegal expression. Conversion procedures are provided to convert arguments to another data type. They are discussed in Section 3.11. cvi, for example, is a procedure that converts its argument to an INTEGER; i + cvi(r) is a legal expression.

The difference between the data types INTEGER, REAL, and BITS and their corresponding LONG types is in the range of values the corresponding types may take on. For example, the guaranteed range (range of values that may be assumed in a portable program) of a LONG REAL specifies more digits than that of a REAL. You should use the LONG forms only when necessary, since LONG values may take more space and LONG operations more time on some processors.

The guaranteed range of a data type applies only to a value of the data type for which no explicit size is given; an explicit size may override the default range. See Section 3.12 for details.

For each data type discussed in this chapter, a list of the operators that may be used with values of the data type is given. All operators are described in more detail in Section 4.8. Each data type description also includes a brief description of some of the system procedures that may be used with values of the data type. Complete system procedure descriptions are given starting in Chapter 30.

XIDAK reserves the right to create new MAINSAIL data types, and to enhance any system procedure, macro, or variable to handle such new data types.

3.1. BOOLEAN

BOOLEAN values are the logical values true and false. The boolean constants are TRUE and FALSE; these represent the only two values a BOOLEAN variable may have. The boolean Zero is FALSE.

The following operators may be used with boolean expressions:

OR  AND  NOT  =  NEQ  :=

3.2. INTEGER and LONG INTEGER

INTEGER and LONG INTEGER are data types for representing mathematical whole numbers. INTEGERs and LONG INTEGERs have the same guaranteed range, namely, -2147483647 to 2147483647. However, on platforms that support 64-bit addresses, INTEGERs are typically 4 bytes and LONG INTEGERs 8 bytes.

An INTEGER constant is composed of an optional minus sign (-) followed by decimal digits (0 through 9). Some examples are 1874, -53, and 0.

A LONG INTEGER constant is like an INTEGER constant except that it must be immediately followed by the letter L (or lowercase l), e.g., 1874L, -53L, 0L, or 29875234l.

A character enclosed in single quotes represents the INTEGER constant of which the value is the target-machine character code of the enclosed character. For example, 'A' represents the INTEGER constant that is the character code of the letter “A” on the target machine. Character codes are discussed in Section 2.1.

The INTEGER Zero is 0, and the LONG INTEGER Zero is 0L (or 0l).

The following operators may be used with (LONG) INTEGER expressions:

OR     =      LEQ     :=     DIV      +
AND    NEQ    >       MIN    MOD      - (unary and binary)
NOT    <      GEQ     MAX    *        ^

The following system procedure may operate on (LONG) INTEGER expressions:

abs absolute value of a (LONG) INTEGER

3.3. REAL and LONG REAL

REAL and LONG REAL are data types for representing floating point numbers. A floating point number consists of a fraction and a power-of-ten exponent; the value of the number is the product of the fraction and ten to the power of the exponent. For a REAL, the fraction is guaranteed to have at least six full decimal digits of significance. The exponent is guaranteed a range wide enough that at least one number less than or equal to ten to the minus 38th power (1.0E-38) can be represented as a REAL, and at least one number greater than or equal to ten to the plus 38th power (1.0E+38) can be represented as a REAL. For a LONG REAL, the fraction is guaranteed to consist of at least 11 full decimal digits, and the exponent range is guaranteed to be at least as large as that of a real exponent.

A REAL constant is like an INTEGER constant except that it has either a decimal point, an exponent, or both. An exponent immediately follows the last digit (or the decimal point if it is last), and is the letter E (or e) immediately followed by an integer. A nonnegative exponent may be separated from E by +. Some REAL constants are:

1874.56
-.78E-3 (= -.00078)
0.
1E3 (= 1e3 = 1E+3 = 1.E3 = 1.E+3 = 1000.)

A LONG REAL constant is like a REAL constant except that it must be immediately followed by the letter L (or lowercase l), e.g.:

12387658.5L
-.57E28L
0.0L (= 0.L)

The REAL Zero is 0., and the LONG REAL Zero is 0.L (or 0.l).
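
The constant syntax described in Sections 3.2 and 3.3 can be modeled with a short Python sketch. This is an illustration only, not part of MAINSAIL: the function name classify_constant and the regular expression are assumptions transcribed from the prose above (optional minus sign, digits, optional decimal point, optional exponent, optional L suffix), and character constants such as 'A' are not handled.

```python
import re

# Assumed grammar, transcribed from the prose:
#   [-] digits [.] digits [E [+|-] digits] [L]
_NUMERIC = re.compile(
    r"(?P<sign>-?)(?P<whole>\d*)(?P<point>\.?)(?P<frac>\d*)"
    r"(?P<exp>[Ee][+-]?\d+)?(?P<long>[Ll]?)"
)


def classify_constant(text):
    """Return the data type name of a numeric constant, or None if malformed."""
    m = _NUMERIC.fullmatch(text)
    if m is None or not (m["whole"] or m["frac"]):
        return None  # at least one digit is required before any exponent
    # A decimal point, an exponent, or both make the constant a REAL.
    is_real = bool(m["point"]) or m["exp"] is not None
    base = "REAL" if is_real else "INTEGER"
    return "LONG " + base if m["long"] else base


def real_value(text):
    """Numeric value of a (LONG) REAL constant, ignoring the L suffix."""
    return float(text.rstrip("Ll"))
```

For example, classify_constant("29875234l") yields "LONG INTEGER" and classify_constant("-.57E28L") yields "LONG REAL", while the spellings 1E3, 1e3, 1E+3, 1.E3, and 1.E+3 all have the same real_value.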

The following operators may be used with REAL and LONG REAL expressions:

OR     =      LEQ     :=     +       /
AND    NEQ    >       MIN    ^       - (unary and binary)
NOT    <      GEQ     MAX    *

The system procedures below may be used with REAL and LONG REAL expressions. Mathematical functions such as sin, cos, and log are also provided:

abs absolute value of a (LONG) REAL
ceiling smallest (LONG) INTEGER not exceeded by a (LONG) REAL
floor largest (LONG) INTEGER not exceeding a (LONG) REAL
truncate truncate a (LONG) REAL to a (LONG) INTEGER

3.4. BITS and LONG BITS

BITS and LONG BITS are data types for representing sequences of bits. Bits may take part in bit operations such as masking, shifting, and testing.

The guaranteed range of BITS and LONG BITS is the same, namely, at least 32 bits. However, on platforms that support 64-bit addresses, a BITS is typically 4 bytes (32 bits) and a LONG BITS 8 bytes (64 bits).

BITS and LONG BITS differ from INTEGER and LONG INTEGER in that (LONG) BITS operations are bitwise logical operations; (LONG) INTEGER operations are arithmetic (numerical) operations. Values of one data type may be easily converted to the other, if it is necessary to view a value alternately in one way, then another.

A bit has two states, 0 and 1, sometimes called 0-bit and 1-bit or clear and set. To cause a bit to enter the 0 state is to clear it; to cause it to enter the 1 state, to set it.

A BITS constant is a sequence of characters preceded by a single quote and a letter that indicates the base: B (or b) for binary (base 2), O (or o) for octal (base 8), or H (or h) for hexadecimal (base 16). The base letter may be omitted for octal; i.e., octal is the default.

Each binary character (0 or 1) represents a single bit. Each octal character (0 through 7) represents three bits (000 through 111). Each hexadecimal character (0 through 9, A through F) represents four bits (0000 through 1111). The lowercase letters a through f, like A through F, can be used to represent the bit patterns 1010 through 1111 in hexadecimal constants.

The bits for each character are concatenated to obtain the bits of the constant. For example, 'B101011, 'O53 (or just '53) and 'H2B all represent the same bit sequence 101011 (ignoring leading zeros).

Other examples of BITS constants are '573, 'B10111, and 'H82A3.
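
The way the digits of a BITS constant are concatenated can be sketched in Python. This is a model of the rules above, not MAINSAIL code; the function name parse_bits_constant is hypothetical, and the result is returned as a Python integer together with a flag for the L suffix.

```python
# Number of bits contributed by each digit, keyed by base letter.
_BITS_PER_DIGIT = {"B": 1, "O": 3, "H": 4}


def parse_bits_constant(text):
    """Return (value, is_long) for a constant such as 'B101011, '53, or 'H2BL."""
    if not text.startswith("'"):
        raise ValueError("a BITS constant starts with a single quote")
    body = text[1:]
    is_long = body[-1:] in ("L", "l")
    if is_long:
        body = body[:-1]
    base_letter = body[:1].upper()
    if base_letter in _BITS_PER_DIGIT:
        digits = body[1:]
    else:
        base_letter, digits = "O", body  # octal is the default base
    if not digits:
        raise ValueError("a BITS constant needs at least one digit")
    width = _BITS_PER_DIGIT[base_letter]
    value = 0
    for ch in digits:
        digit = int(ch, 16)  # raises ValueError for a non-hexadecimal character
        if digit >= 1 << width:
            raise ValueError("digit %r is out of range for base %s" % (ch, base_letter))
        value = (value << width) | digit  # append this digit's bits on the right
    return value, is_long
```

With this model, 'B101011, 'O53, '53, and 'H2B all parse to the same value, matching the example above.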

A LONG BITS constant is like a BITS constant except that it must be immediately followed by the letter L (or lowercase l), e.g., '743L (= 'B111100011L = 'H1E3L = 'h1e3l).

The BITS Zero is '0 (or equivalently 'B0 or 'O0 or 'H0); the LONG BITS Zero is the BITS Zero followed by L (or lowercase l), e.g., '0L.

Bits are numbered from right to left starting with zero.
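
As a small illustration of this numbering, the following Python sketch treats a bit sequence as an integer in which bit 0 is the rightmost (lowest-order) bit. The helper names are hypothetical and are not MAINSAIL system procedures.

```python
def bit(value, i):
    """State (0 or 1) of bit i, counting from the right starting with zero."""
    return (value >> i) & 1


def set_bit(value, i):
    """Return value with bit i in the 1 state."""
    return value | (1 << i)


def clear_bit(value, i):
    """Return value with bit i in the 0 state."""
    return value & ~(1 << i)
```

For the sequence 101011, bits 0, 1, 3, and 5 are set, and bits 2 and 4 are clear.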

The following operators may be used with BITS and LONG BITS expressions:

OR      =       NTST       :=     MSK    SHR
AND     NEQ     TSTA       IOR    CLR    !
NOT     TST     NTSTA      XOR    SHL

The following system procedures may operate on BITS and LONG BITS expressions:

bMask form a BITS mask (sequence of 1-bits)
lbMask form a LONG BITS mask (sequence of 1-bits)
$lbOnes LONG BITS value consisting of all 1-bits

3.5. STRING

STRING is a data type for representing and manipulating sequences of characters.

A STRING is a variable-length sequence of characters. MAINSAIL automatically keeps track of how many characters are in a STRING.

The limit on the number of characters in a STRING is the maximum INTEGER that can be represented; thus, the smallest maximum STRING length any MAINSAIL implementation enforces is 2147483647 characters (although the effective maximum length will be less if you have less than 2 GB of memory available).

The constant $maxStringLength represents the maximum allowable STRING length, and is defined to be $maxInteger.

A STRING constant is a sequence of characters enclosed in double quotes. Some examples are shown below. A double quote is represented in a STRING constant with two consecutive double quotes. Each such pair of double quotes stands for one double quote inside the STRING. For example, the last STRING in the list below contains two embedded double quote characters. It contains 23 characters; the two extra double quotes are not retained as part of the STRING constant, since they are only indicators to the compiler.

"Hello"
"She is 12 years old"
"The umbrella cost $2.50"
"He cried ""Wolf!"" again."
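
The quote-doubling rule can be checked with a short Python sketch. It models only the rule stated above; the function name decode_string_constant is hypothetical, and nothing here reflects how the MAINSAIL compiler is actually implemented.

```python
def decode_string_constant(source):
    """Return the characters denoted by a STRING constant written in source text."""
    if len(source) < 2 or source[0] != '"' or source[-1] != '"':
        raise ValueError("a STRING constant is enclosed in double quotes")
    body = source[1:-1]
    # Every double quote inside the constant must belong to a doubled pair.
    if '"' in body.replace('""', ""):
        raise ValueError("unpaired double quote inside the constant")
    # Each pair stands for a single double quote character.
    return body.replace('""', '"')
```

Applied to the last example in the list, the decoded STRING has 23 characters, two of which are double quotes.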

A STRING constant may extend across line and page boundaries; the characters that indicate the boundaries are part of the constant. For example, the STRING:

"This is a STRING constant that extends
across a line boundary in the source text"

has an embedded eol. It could also be written:

"This is a STRING constant that extends " & eol &
"across a line boundary in the source text"

The concatenations are performed at compiletime, since all the STRINGs involved are constants.

The STRING Zero (sometimes called the null STRING or the empty STRING) is "". It is the STRING consisting of no characters.

& is the concatenation operator. s1 & s2 is the STRING consisting of the characters of s1 immediately followed by the characters of s2. Thus, if s1 has the value:

"This is "

and s2 has the value:

"a concatenated STRING"

then the expression s1 & s2 has the value:

"This is a concatenated STRING"

Substrings are described in Section 4.4, and STRING comparison in Section 4.8.2.

The following operators may be used with STRING expressions:

OR      =       LEQ       :=      &
AND     NEQ     >         MIN
NOT     <       GEQ       MAX

The following system procedures may operate on STRING expressions:

length number of characters in a STRING
cvu convert a STRING to upper case
cvl convert a STRING to lower case
compare return -1, 0, or 1 to indicate comparison of two STRINGs (see Section 4.8.2). Can be made to treat upper and lower case identically, i.e., a “caseless” comparison
equ returns TRUE if two STRING arguments are equal. Like compare, can do a caseless comparison
first first character of a STRING
last last character of a STRING
$nth nth character of a STRING
read reads a value from a STRING
write writes a value to a STRING
cRead reads a character from a STRING
cWrite writes a character to a STRING
rcRead reads a character from the end of a STRING (reverse cRead)
rcWrite writes a character to the front of a STRING (reverse cWrite)
$dup reduplicate a STRING (concatenate with itself)
scan scans a STRING according to a scan specification
scanSet sets up scan bits to be used with scan
$scanSet sets up scan integers to be used with scan
scanRel releases scan bits or integers used with scan
$cScan scan for single character
newString create a STRING descriptor from a CHARADR and a length
$getInArea ensure that a STRING is in MAINSAIL's STRING space
$removeLeadingBlankSpace, $removeTrailingBlankSpace remove blank space from a STRING
$removeWord remove non-blank characters from a STRING
$removeLastWord remove trailing non-blank characters
$removeBoolean, $removeBits, $removeInteger, $removeReal parse STRING of specified data type
$formParagraph fill and justify STRING

3.5.1. Low-Level STRING Manipulation

STRINGs are represented in memory as STRING descriptors, composed of a length and a character address. STRING descriptors usually point to characters stored in a region of memory called STRING space. The characters stored in STRING space are subject to garbage collection if they become inaccessible (i.e., no STRING descriptor points to them). The characters of a STRING allocated in scratch space or created by a foreign language procedure do not reside in STRING space. The user who needs to move such a STRING into MAINSAIL's STRING space may do so by means of the system procedure $getInArea.

Most programs that do not call foreign language procedures do not need to manipulate STRINGs or STRING descriptors explicitly with newString or $getInArea.

3.5.2. STRING Constants and Garbage Collection

The first time each STRING constant in a MODULE is used, its characters may be copied into STRING space. This can trigger a garbage collection. Subsequent uses of the same STRING constant (in the same MODULE) use the previously copied characters, and so do not cause a collection.

3.6. POINTER

POINTER is a data type for referencing dynamic objects, i.e., dynamic records, data sections, and dynamic ARRAYs. Records are described in Chapter 10, data sections in Chapter 11, and ARRAYs in Chapter 12. POINTERs are frequently classified, i.e., associated with a particular CLASS, as described in Section 9.2.

Only unclassified POINTERs can be used to refer to ARRAYs; see Section 9.3. If you know that a dynamic object is an ARRAY, you should use an ARRAY variable rather than a POINTER variable to refer to it; see Chapter 12.

The only POINTER constant is NULLPOINTER, which is the POINTER Zero. A NULLPOINTER references no object.

The following operators may be used with POINTER expressions:

OR  AND  NOT  =  NEQ  :=

3.7. $PROCVAR

$PROCVAR (short for procedure variable) is a data type for referencing PROCEDUREs that could be invoked in the current MAINSAIL execution. A $PROCVAR may be used to invoke the PROCEDURE it references.

The only $PROCVAR constant is $NULLPROCVAR, which is the $PROCVAR Zero. It references no PROCEDURE.

The following operators may be used with $PROCVAR expressions:

OR  AND  NOT  =  NEQ  :=

Chapter 8 describes $PROCVARs in more detail.

3.8. ADDRESS

ADDRESS is a data type for representing the location of a storage unit in memory. ADDRESSes may be used for loading and storing values of any data type to and from memory. Individual characters are usually loaded and stored by means of the data type CHARADR.

ADDRESS is a “low-level” data type; many user programs can be written without the use of ADDRESSes.

Not every ADDRESS representable on a processor is a valid MAINSAIL ADDRESS. On some implementations of MAINSAIL, an ADDRESS that is not a multiple of the size of the smallest data type is considered unaligned (or non-data-type-aligned) and is invalid. Portable programs must therefore compute ADDRESSes as linear combinations of exact multiples of the sizes of MAINSAIL data types, starting from some ADDRESS that is known to be properly aligned (e.g., an ADDRESS obtained from the system procedure newScratch or newPage, or an ADDRESS that is the start of a dynamic object).

Furthermore, during any particular execution of MAINSAIL, some ADDRESSes may be invalid for reading or writing because the storage units they reference are protected by the operating system; therefore, ADDRESSes should point into memory that has been properly requested from the operating system. Storage units in regions of memory allocated by the system procedures newScratch and newPage will always have been properly requested from the operating system. Storage units in regions of memory allocated by the system procedure new are also valid for reading and writing, although invalid values (e.g., POINTERs not pointing to a valid MAINSAIL data structure) should not be stored into a MAINSAIL data structure.

Storing into arbitrary or unallocated memory has undefined effects; for example, doing so may overwrite executable code or MAINSAIL runtime data structures, thereby damaging them.

The use of an invalid ADDRESS is undefined.

On some processors, read and write to an address may align the address before performing the operation. The address is increased, if necessary, to the minimum alignment the processor requires for the data type being read or written. For example, on the PA64 processor, where INTEGERs occupy 4 bytes and LONG INTEGERs 8 bytes, read(a,ii1,i,ii2), where i is an INTEGER and ii1 and ii2 are LONG INTEGERs, automatically aligns by skipping 4 bytes before reading ii2.
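
The effect of this automatic alignment on the PA64 example can be sketched in Python. The sketch assumes that each value must be aligned to a multiple of its own size and that the starting ADDRESS is already suitably aligned; both are simplifying assumptions, and the function names are hypothetical.

```python
def align_up(offset, alignment):
    """Smallest multiple of alignment that is not less than offset."""
    return (offset + alignment - 1) // alignment * alignment


def read_offsets(sizes):
    """Byte offset at which each successive value is read.

    sizes lists the size in bytes of each value, in reading order.
    Assumes (for illustration) that a value's alignment equals its size.
    """
    offsets = []
    offset = 0
    for size in sizes:
        offset = align_up(offset, size)  # skip padding bytes if necessary
        offsets.append(offset)
        offset += size
    return offsets
```

For read(a,ii1,i,ii2) with 8-byte LONG INTEGERs and a 4-byte INTEGER, read_offsets([8, 4, 8]) gives offsets 0, 8, and 16: ii1 and i together occupy 12 bytes, so 4 bytes are skipped before ii2.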

The address of a collectable MAINSAIL data structure may change if a garbage collection occurs; an ADDRESS variable is not updated in such a case. Collectable data are normally referenced with the POINTER and STRING data types, which are updated when a garbage collection occurs.

ADDRESSes may be classified like POINTERs; see Section 9.2.

The only ADDRESS constant is NULLADDRESS, which is the ADDRESS Zero.

ADDRESSes are ordered with respect to the relative position of the referenced storage units in memory. It is this order that is used when comparing ADDRESSes, or using MIN or MAX on an ADDRESS.

The following operators may be used with ADDRESS expressions:

OR      AND     NOT     =       NEQ     <
LEQ     >       GEQ     :=      MIN     MAX

The following system procedures may be used in operations with ADDRESSes:

clear clears storage units of memory
copy copies storage units from one memory location to another
xLoad loads a value (of data type x) from memory; see Section 40.16
store stores a value into memory
displace returns an ADDRESS that is displaced a given number of storage units from another ADDRESS
displacement, lDisplacement computes the distance between two ADDRESSes
newPage gets some memory pages
pageDispose disposes of pages obtained with newPage
newScratch returns the ADDRESS of some memory for scratch space
scratchDispose disposes of scratch space
read reads a value from an ADDRESS
write writes a value to an ADDRESS

3.9. CHARADR

CHARADR (“character address”) is a data type for representing the address of a character unit in memory. Data other than characters are usually loaded and stored by means of the ADDRESS data type.

CHARADR is a “low-level” data type; many user programs can be written without the use of CHARADRs.

As with ADDRESSes, there may be CHARADR values at which the effect of performing a load or store is undefined, because the memory has not been properly allocated; see Section 3.8. Unlike ADDRESSes, however, there is never any alignment requirement for storing at or loading from a CHARADR.

The only CHARADR constant is NULLCHARADR, which is the CHARADR Zero.

The following operators may be used with CHARADR expressions:

OR      AND     NOT     =       NEQ     <
LEQ     >       GEQ     :=      MIN     MAX

The following system procedures may be used in operations with CHARADRs:

clear clears character units of memory
copy copies characters from memory starting at one CHARADR to memory starting at another CHARADR
cLoad loads a character from memory
store stores a character into memory
cRead reads a character from memory
cWrite writes a character to memory
displace returns a CHARADR that is displaced a given number of characters from another CHARADR
displacement computes the distance between two CHARADRs
newString makes a STRING (descriptor) from a CHARADR and an INTEGER (length)

3.10. Overflow and Underflow

A program that generates a value outside of the machine-dependent range of its data type behaves in an undefined fashion. Overflow and underflow are not necessarily caught, although MAINSAIL usually tries to catch overflow and underflow whenever doing so does not entail a performance penalty.

It is an error to use a constant that cannot be represented on the target machine, e.g., an INTEGER that is too large.

In general, MAINSAIL does not support a portable notion of arithmetic exceptions, especially overflow, and particularly floating-point overflow. The notion of overflow visible to a MAINSAIL program is whatever notion is supported by the underlying hardware, with all its quirks. Since different platforms have different notions of overflow, overflow is not a portable concept. Programs should not be written expecting the rules for overflow to be the same across different platforms.

In particular, on some platforms, intermediate calculations are done using more precision than can be represented by variables of a given type. Thus, overflow (as defined by the underlying hardware) might occur only when a result is finally stored in memory, since the representable range of exponents is smaller for memory operands than for intermediate operands. Usually results are stored in memory at a point in the program fairly near the point where they were calculated; however, the store could be far away from the calculation, maybe even in a different procedure.

For example, a value returned by a RETURN statement might be too large to be representable in memory, but not too large for an intermediate representation. It is possible for such a value to be calculated by the processor with no overflow, and returned (in a processor register large enough to hold the intermediate representation) to the caller where it is then stored in a variable, at which point overflow would occur. A programmer expecting overflow to occur when the value was originally calculated would be disappointed. According to the processor's notion of overflow, the calculation didn't overflow at all; overflow occurred only when the caller eventually stored the result in memory.
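
The scenario can be imitated in Python, purely as an analogy. The sketch assumes IEEE double precision as the wider intermediate representation and the 4-byte IEEE single format as the in-memory representation; the function names are hypothetical, and real MAINSAIL platforms may differ in both formats and in how overflow is reported.

```python
import math
import struct


def product(x, y):
    """Computed and returned in double precision (the intermediate form)."""
    return x * y


def store_as_4_byte_real(value):
    """Round-trip a value through the 4-byte format, as a store to memory would.

    struct.pack raises OverflowError when the value does not fit.
    """
    return struct.unpack("<f", struct.pack("<f", value))[0]


wide = product(1e30, 1e10)           # 1e40: representable as an intermediate
calculation_ok = math.isfinite(wide)

try:
    store_as_4_byte_real(wide)       # too large for the 4-byte format
    overflowed_on_store = False
except OverflowError:
    overflowed_on_store = True
```

Here the multiplication itself completes without complaint; the overflow surfaces only at the later store, which in a real program could be in a different procedure.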

Programs that calculate values that are outside the allowed range of values for a given type are not legal programs, and the results of executing such programs are undefined. XIDAK does what it can, within reason, to do something sensible, but there are no guarantees. Overflow ultimately occurs only when the processor says it occurs.

3.10.1. How to Write (LONG) INTEGER Addition and Multiplication Routines That Do Not Overflow

Some MAINSAIL programmers have developed code on platforms where arithmetic overflow is not detected by default, and have come to depend on the lack of overflow. These users have been unpleasantly surprised when moving their programs to a platform where (LONG) INTEGER overflow is detected by default. MAINSAIL does not provide a way to disable (LONG) INTEGER overflow detection on such platforms.

If you have such a program, you should replace the overflowing multiply and add operators with calls to portable arithmetic routines that perform addition and multiplication without triggering overflow. These routines should look something like the routines noOverflowAdd and noOverflowMultiply in the MODULE FOO below. This MODULE has been written specifically to handle 32-bit LONG INTEGERs on processors that support two's-complement arithmetic; to handle different conventions, you would need to modify the code somewhat.

  BEGIN "foo"

  IFC size(longIntegerCode) NEQ 4 THENC
      # This code works only for 32-bit integers
      MESSAGE "Long integers are not 32 bits!","error";
  ENDC

  IFC $attributes TST $onesComplement THENC
      # This code works only for two's-complement arithmetic
      MESSAGE "Integer arithmetic is ones-complement!","error";
  ENDC

  LONG INTEGER PROCEDURE noOverflowAdd (LONG INTEGER a,b);
  # Add two 32-bit long integers a and b without overflow.
  BEGIN
  BOOLEAN carry;
  INTEGER i;
  LONG BITS aa,bb,res,m,n,am,bm;
  # Add the low-order 30 bits; the sum cannot exceed 31 bits.
  aa := cvlb(a) MSK 'H3FFFFFFFL;
  bb := cvlb(b) MSK 'H3FFFFFFFL;
  res := cvlb(cvli(aa) + cvli(bb));
  carry := res TST 'H40000000L;
  # Remove the carry from bit 30 of res; bits 30 and 31 are
  # recomputed below, and leaving the carry in place would count
  # it twice.
  res := res MSK 'H3FFFFFFFL;

  # Add bits 30 and 31 by hand, propagating the carry.
  FOR i := 30 UPTO 31 DOB
      m := '1L SHL i;
      am := cvlb(a) MSK m; bm := cvlb(b) MSK m;
      n := am XOR bm;
      IF carry THEN n .XOR m;
      carry := (am AND bm) OR (am AND carry) OR (bm AND carry);
      res .IOR n END;

  RETURN(cvli(res));
  END;


  LONG INTEGER PROCEDURE noOverflowMultiply (LONG INTEGER a,b);
  # Let au, bu be the uppermost 4 bits of a and b, am, bm the 14
  # next lower-order bits, and al, bl be the lowest-order 14 bits.
  # Then:
  #     a * b = au * bu * 2 ^ 56
  #       + (au * bm + am * bu) * 2 ^ 42
  #       + (au * bl + am * bm + al * bu) * 2 ^ 28
  #       + (am * bl + al * bm) * 2 ^ 14
  #       + al * bl
  # None of the intermediate multiplications or additions can
  # overflow.  The first two terms vanish modulo 2 ^ 32, so only
  # the last three are computed.
  BEGIN
  LONG INTEGER al,au,am,bm,bl,bu;

  au := cvli(cvlb(a) SHR 28);
  IF cvlb(a) TST 'H80000000L THEN
      au := cvli(cvlb(au) IOR 'HFFFFFFF0L);
  am := cvli((cvlb(a) SHR 14) MSK 'H3FFFL);
  al := cvli(cvlb(a) MSK 'H3FFFL);

  bu := cvli(cvlb(b) SHR 28);
  IF cvlb(b) TST 'H80000000L THEN
      bu := cvli(cvlb(bu) IOR 'HFFFFFFF0L);
  bm := cvli((cvlb(b) SHR 14) MSK 'H3FFFL);
  bl := cvli(cvlb(b) MSK 'H3FFFL);

  RETURN(
      noOverflowAdd(
          cvli(cvlb(au * bl + am * bm + al * bu) SHL 28),
          noOverflowAdd(
              cvli(cvlb(am * bl + al * bm) SHL 14),
              al * bl)));
  END;


  INITIAL PROCEDURE;
  BEGIN
  LONG INTEGER i,j;
  DOB i := $liGet("First long integer: ");
      j := $liGet("Second long integer: ");
      write(logFile,noOverflowMultiply(i,j),eol,i * j,eol) END;
  END;

  END "foo"
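
The decomposition used by noOverflowMultiply can be checked with a Python model. This is a sketch, not a translation of the MODULE: the function names are hypothetical, Python integers never overflow, and the final reduction to 32 bits is done with a mask where the MAINSAIL code relies on LONG BITS shifts discarding high-order bits. The asserts inside wrapping_multiply confirm that every intermediate sum fits in a signed 32-bit value, which is the property the comment in the listing claims.

```python
def to_signed32(x):
    """Interpret the low-order 32 bits of x as a two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def fits32(x):
    """True if x is representable as a signed 32-bit integer."""
    return -(1 << 31) <= x < (1 << 31)


def split(a):
    """Split a signed 32-bit value into (upper 4, middle 14, lower 14) bits.

    The upper part keeps the sign, so a == u * 2**28 + m * 2**14 + l.
    """
    u = a >> 28               # arithmetic shift: range -8..7
    m = (a >> 14) & 0x3FFF    # range 0..16383
    l = a & 0x3FFF            # range 0..16383
    return u, m, l


def wrapping_multiply(a, b):
    """Product of two signed 32-bit values, reduced modulo 2**32."""
    au, am, al = split(a)
    bu, bm, bl = split(b)
    high = au * bl + am * bm + al * bu   # weight 2**28
    mid = am * bl + al * bm              # weight 2**14
    low = al * bl                        # weight 1
    # The terms with weights 2**56 and 2**42 vanish modulo 2**32.
    assert fits32(high) and fits32(mid) and fits32(low)
    return to_signed32((high << 28) + (mid << 14) + low)
```

For any pair of 32-bit operands, wrapping_multiply(a, b) should agree with to_signed32(a * b), i.e., with the result of a multiplication that silently wraps around.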

3.11. Conversion Procedures

A conversion procedure converts from one data type to another. For example, if the value of the REAL variable r is 8., then cvi(r) has the INTEGER value 8, where cvi is the convert-to-INTEGER procedure. MAINSAIL does not provide implicit data type conversion; the programmer is responsible for using conversion procedures where necessary.

A conversion procedure for converting a value of type x to type y is provided for each xy combination for which the box is marked in the following table (which uses the data type abbreviations listed in Table 1–1):

                          y
        i   li  r   lr  b   lb  s   a   c   p
x   i   *   *   *   *   *   *   *
    li  *   *   *   *   *   *   *   *
    r   *   *   *   *           *
    lr  *   *   *   *           *
    b   *   *           *   *   *
    lb  *   *           *   *   *   *   *
    s   *   *   *   *   *   *   *       *
    a       *               *       *   *   *
    c                       *       *   *
    p                               *       *

MAINSAIL does not guarantee to catch underflow or overflow in conversions. The effect of calling one of the MAINSAIL system routines that converts a STRING to a numeric value (e.g., read, cvi, cvli, cvr, cvlr) is undefined if the numeric value is outside the range supported by the processor.

3.12. Explicit Data Sizing

By default, a given MAINSAIL data type on a given processor always occupies the same amount of space in memory. You can override that default and specify an explicit size with many MAINSAIL data types.

There are two reasons why you might want to override the default size for a MAINSAIL data type:

  1. Foreign languages often include data types whose size is not the same as the default size of the corresponding MAINSAIL data type. Furthermore, foreign languages may have some data type sizes that do not correspond to the default size of any MAINSAIL data type. This makes it difficult to construct a MAINSAIL data structure parallel to a foreign data structure. To solve this problem, you can explicitly specify the size in bytes of a field (normally, a field of an aligned record, so that the alignment of all fields in the record is correct for the foreign language; see Section 9.10).

  2. You may be able to save space when you know that the range of possible values for a particular variable is smaller than the range implied by the default size of the variable's data type. For example, you may know that a particular INTEGER will always be in the range -5 to 5, and therefore needs only one byte to represent it. You can specify that MAINSAIL use only one byte for such a value. In many cases the amount of memory saved by such an explicit specification will be negligible, but there is at least one important exception: where the variable is an element of a large ARRAY. For example, if the default size of an INTEGER is 4 bytes, but you know the values in a million-element ARRAY never require more than one byte to represent, then the amount of memory wasted by failing to specify an explicit size for the ARRAY elements is about 3 megabytes.

3.12.1. Allowed Data Types and Sizes for Explicit Sizing

A type qualified with a size specification is said to be explicitly sized, and the type to which the size specification is applied is called the base type. A size specification takes the form of a parenthesized INTEGER following the data type name.

BOOLEAN and (LONG) BITS may be followed by (n) to indicate that only the low-order n bytes of the value are stored in memory, where n is 1, 2, or 4. If necessary, the value is zero-extended when accessed.

(LONG) INTEGER may be followed by (n) or (-n) to indicate that only the low-order n bytes of the value are stored in memory, where n is 1, 2, or 4. A size specification of (-n) forces the value to be sign-extended (if necessary) when accessed; (n) with no minus sign specifies zero-extension.

A value that is explicitly sized to be smaller than its base type is expanded to (at least) the size of its base type when it is accessed. This expansion sign-extends the value for a negative size specification, and zero-extends otherwise. Zero-extension does not mean that MAINSAIL uses unsigned arithmetic on the accessed value; it indicates only how to expand the value.
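
The difference between the two kinds of expansion can be illustrated in Python. The sketch stores the low-order bytes in little-endian order purely for convenience (the byte order is an assumption and does not affect the result), and the function names are hypothetical.

```python
def store_sized(value, size):
    """Keep only the low-order `size` bytes of value, as stored in memory."""
    low = value & ((1 << (8 * size)) - 1)
    return low.to_bytes(size, "little")


def load_sized(stored, sign_extend):
    """Expand stored bytes back to a full value.

    sign_extend=True models a size specification of (-n);
    sign_extend=False models (n), i.e., zero-extension.
    """
    return int.from_bytes(stored, "little", signed=sign_extend)
```

Storing -5 in one byte keeps the bit pattern 11111011. Loading it as INTEGER(-1) sign-extends and yields -5 again, whereas loading it as INTEGER(1) zero-extends and yields 251.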

You cannot make a field larger than the natural size of its base type: if a base type is of size m, and n is greater than m, then an explicit size of n or -n for that type is a compiletime error. You cannot specify general unsigned arithmetic on data types that support only signed arithmetic: if the base type is INTEGER or LONG INTEGER, and the base type size is m, then a size of m is also a compiletime error, but -m is allowed (and has no effect, since it specifies the default behavior). If the base type is any type other than INTEGER or LONG INTEGER, then a negative size is prohibited, and an explicit size of m is legal and has no effect.

On platforms where the size of INTEGER is 2 bytes, 4 (or -4, in the case of LONG INTEGER) would be a legal size only for the LONG types (for which it currently has no effect, since these types are 4 bytes on all presently supported platforms).

In summary, MAINSAIL's guaranteed data type ranges in combination with the rules for explicit data type sizes imply the following:

Data Type           Legal?
BOOLEAN(1)          always
BOOLEAN(2)          always
BOOLEAN(4)          on platforms where size(booleanCode) GEQ 4
INTEGER(-1)         always
INTEGER(-2)         always
INTEGER(-4)         always
INTEGER(1)          always
INTEGER(2)          always
INTEGER(4)          illegal on all current platforms, but would be legal where size(integerCode) > 4
LONG INTEGER(-1)    always
LONG INTEGER(-2)    always
LONG INTEGER(-4)    always
LONG INTEGER(1)     always
LONG INTEGER(2)     always
LONG INTEGER(4)     illegal on all current platforms, but would be legal where size(longIntegerCode) > 4
BITS(1)             always
BITS(2)             always
BITS(4)             always
LONG BITS(1)        always
LONG BITS(2)        always
LONG BITS(4)        always
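
The sizing rules stated before the table can be written as a small Python predicate. This is a sketch of the rules as described in this section, with the base type's size on the platform passed in as a parameter; the function and parameter names are hypothetical.

```python
INTEGER_TYPES = {"INTEGER", "LONG INTEGER"}
SIZABLE_TYPES = INTEGER_TYPES | {"BOOLEAN", "BITS", "LONG BITS"}


def is_legal_size(base_type, base_size, n):
    """Whether base_type(n) is a legal explicit size on a platform where
    the base type occupies base_size bytes."""
    if base_type not in SIZABLE_TYPES or abs(n) not in (1, 2, 4):
        return False
    if abs(n) > base_size:
        return False                  # a field cannot exceed its base type
    if base_type in INTEGER_TYPES:
        # (-m) is allowed and has no effect; (m) would request unsigned
        # arithmetic and is rejected.
        return n < 0 or n < base_size
    return n > 0                      # negative sizes are for integers only
```

On a platform where INTEGER and BITS occupy 4 bytes, is_legal_size("INTEGER", 4, 4) is False while is_legal_size("INTEGER", 4, -4) and is_legal_size("BITS", 4, 4) are both True, in agreement with the table.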

The only types that are guaranteed to need zero-extension on all platforms are (LONG) INTEGER(1), LONG INTEGER(2), (LONG) BITS(1) and LONG BITS(2), since INTEGER and BITS are guaranteed to be at least 2 bytes, and LONG INTEGER and LONG BITS at least 4 bytes.

When a value is stored into an explicitly-sized variable that is smaller than its base type, the value must be truncated. If ACHECK is in effect, and the base type is (LONG) INTEGER, overflow is reported if the truncated value, when reexpanded to the size of the base type, would be different from the original value reexpanded to the size of the base type. High-order bits are silently discarded for a (LONG) BITS, whether or not ACHECK is in effect.
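
The overflow test described above can be sketched in Python as follows. The sketch models only the comparison between the original value and the truncated value after reexpansion; the function names are hypothetical, and whether a report is actually issued depends on ACHECK as stated.

```python
def store_integer(value, size, sign_extend):
    """Model storing a (LONG) INTEGER into an explicitly sized variable.

    Returns (reexpanded, overflow): the value as it would be seen when
    loaded again, and whether it differs from the original value.
    """
    bits = 8 * size
    low = value & ((1 << bits) - 1)           # truncate to `size` bytes
    if sign_extend and low >> (bits - 1):
        reexpanded = low - (1 << bits)        # sign-extension, for (-n)
    else:
        reexpanded = low                      # zero-extension, for (n)
    return reexpanded, reexpanded != value


def store_bits(value, size):
    """Model storing a (LONG) BITS value: high-order bits are discarded."""
    return value & ((1 << (8 * size)) - 1)
```

For example, storing 200 into an INTEGER(-1) yields -56 on reload and counts as overflow, while storing 200 into an INTEGER(1) is exact. Storing -1 into an INTEGER(1) reloads as 255 and therefore also counts as overflow.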

In read(a,v) and write(a,v), the incrementing of a is not affected by the size of v; i.e., these procedures read or write a value at a of v's base type's size, then displace a by the base type's size. v's size affects only how v is represented in memory, not how the value at ADDRESS a is accessed. To load or store an explicitly-sized value from an ADDRESS, use the sized load and store procedures (see Sections 40.16.1 and 47.50), or use an explicitly sized record field (or inplace ARRAY element), as follows:

    CLASS c (INTEGER(-2) f);
    ADDRESS(c) a;
    INTEGER v;
    v := a.f;   # load a 2-byte integer
    a.f := v;   # store a 2-byte integer

3.12.2. Assignment Compatibility

Assignment compatibility of scalar types is determined by the base type, and hence is not affected by explicit sizing; e.g., INTEGERs of any size can be assigned to one another.

Elements must have the same base type, size, and sign extension in order for two inplace ARRAYs to be assignment compatible:

LONG INTEGER(1) $INPLACEARRAY(0 TO 10) ary1;
LONG INTEGER(2) $INPLACEARRAY(0 TO 10) ary2;
...
ary2[i] := ary1[i]; # legal
ary1[i] := ary2[i]; # legal, but could overflow
ary1 := ary2; # ILLEGAL

3.12.3. Where Explicitly Sized Data Types May Occur

Explicitly sized data types can occur wherever a normal data type could occur, i.e., in the declarations of:

Note that declaring arrays of BOOLEAN to be arrays of BOOLEAN(1) will typically save space with little or no runtime penalty (depending on the relative efficiency of loading bytes and words on the processor in question).

Temporary feature: subject to change

Although any explicit size specified for local variables, parameters, PROCEDURE return values, outer variables, and local OWN variables is currently ignored (except for FLI parameters in MAINSAIL Version 16.29 and later), later versions of MAINSAIL may do something with explicit sizes specified in these contexts. Therefore, it is not advisable to specify explicit sizes unless those sizes actually correspond to the number of bits you intend to be represented in the value declared.

3.12.4. Explicitly Sized Data Types and GENERIC Procedures

Temporary feature: subject to change

The GENERIC PROCEDURE selection mechanism (see Section 7.16) currently ignores explicit sizes associated with data types (other than array elements). Thus, given the following declarations:

PROCEDURE p1 (INTEGER(1) i);
PROCEDURE p2 (INTEGER(-2) i);
GENERIC PROCEDURE p "p1,p2";

p1 will be the only instance PROCEDURE ever chosen for p (because it comes first in p's instance PROCEDURE list), regardless of how or whether p's INTEGER argument is explicitly sized.
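
The consequence of ignoring explicit sizes can be illustrated with a deliberately simplified Python model. It assumes that selection picks the first instance whose parameter base types match the argument base types; the actual selection rules are those of Section 7.16 and are not reproduced here, and all names in the sketch are hypothetical.

```python
def base_type(type_name):
    """Strip any explicit size: 'INTEGER(-2)' becomes 'INTEGER'."""
    return type_name.split("(", 1)[0].strip()


def select_instance(instances, argument_types):
    """Return the name of the first instance whose parameters match the
    arguments once explicit sizes are ignored, or None if none matches."""
    wanted = [base_type(t) for t in argument_types]
    for name, parameter_types in instances:
        if [base_type(p) for p in parameter_types] == wanted:
            return name
    return None


# The instance list of the GENERIC PROCEDURE p, in declaration order.
p_instances = [("p1", ["INTEGER(1)"]), ("p2", ["INTEGER(-2)"])]
```

Under this model, select_instance(p_instances, ["INTEGER(-2)"]) returns "p1" even though the argument's size matches p2 exactly, because both instances look identical once the sizes are stripped.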

This behavior may change in future versions of MAINSAIL. In Version 16, XIDAK recommends against including a PROCEDURE with explicitly sized parameters in a GENERIC PROCEDURE instance list, as the semantics of doing so may change.

