diff docs/manual.docbook.sgml @ 0:2c24602be78f
Initial import from lwtools 3.0.1 version, with new hand built build system and file reorganization
author | lost@l-w.ca |
---|---|
date | Wed, 19 Jan 2011 22:27:17 -0700 |
parents | |
children | fd1ecc5d6e69 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/docs/manual.docbook.sgml Wed Jan 19 22:27:17 2011 -0700 @@ -0,0 +1,2180 @@ +<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.5//EN"> +<book> +<bookinfo> +<title>LW Tool Chain</title> +<author><firstname>William</firstname><surname>Astle</surname></author> +<copyright><year>2009, 2010</year><holder>William Astle</holder></copyright> +</bookinfo> +<chapter> + +<title>Introduction</title> + +<para> +The LW tool chain provides utilities for building binaries for MC6809 and +HD6309 CPUs. The tool chain includes a cross-assembler and a cross-linker +which support several styles of output. +</para> + +<section> +<title>History</title> +<para> +For a long time, I have had an interest in creating an operating system for +the Coco3. I finally started working on that project around the beginning of +2006. I had a number of assemblers I could choose from. Eventually, I settled +on one and started tinkering. After a while, I realized that assembler was not +going to be sufficient due to lack of macros and issues with forward references. +Then I tried another which handled forward references correctly but still did +not support macros. I looked around at other assemblers and they all lacked +one feature or another that I really wanted for creating my operating system. +</para> + +<para> +The solution seemed clear at that point. I am a fair programmer so I figured +I could write an assembler that would do everything I wanted an assembler to +do. Thus the LWASM project was born. After more than two years of on and off +work, version 1.0 of LWASM was released in October of 2008. +</para> + +<para> +As the aforementioned operating system project progressed further, it became +clear that while assembling the whole project through a single file was doable, +it was not practical. 
When I found myself playing some fancy games with macros +in a bid to simulate sections, I realized I needed a means of assembling +source files separately and linking them later. This spawned a major development +effort to add object file support to LWASM. It also spawned the LWLINK +project to provide a means to actually link the files. +</para> + +</section> + +</chapter> + +<chapter> +<title>Output Formats</title> + +<para> +The LW tool chain supports multiple output formats. Each format has its +advantages and disadvantages. Each format is described below. +</para> + +<section> +<title>Raw Binaries</title> +<para> +A raw binary is simply a string of bytes. There are no headers or other +niceties. Both LWLINK and LWASM support generating raw binaries. ORG directives +in the source code only serve to set the addresses that will be used for +symbols but otherwise have no direct impact on the resulting binary. +</para> + +</section> +<section> +<title>DECB Binaries</title> + +<para>A DECB binary is compatible with the LOADM command in Disk Extended +Color Basic on the CoCo. It is also compatible with CLOADM from Extended +Color Basic. These binaries include the load address of the binary as well +as encoding an execution address. These binaries may contain multiple loadable +sections, each of which has its own load address.</para> + +<para> +Each binary starts with a preamble. Each preamble is five bytes long. The +first byte is zero. The next two bytes specify the number of bytes to load +and the last two bytes specify the address to load the bytes at. Then, a +string of bytes follows. After this string of bytes, there may be another +preamble or a postamble. A postamble is also five bytes in length. The first +byte of the postamble is $FF, the next two are zero, and the last two are +the execution address for the binary. +</para> + +<para> +Both LWASM and LWLINK can output this format. 
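+As a rough, hand-built illustration of the layout described above (not output
+captured from LWASM or LWLINK), the following source, if assembled to the raw
+target, should yield the same bytes as a DECB binary containing a single
+three-byte block. The load address $0E00 and the three payload bytes are
+arbitrary values chosen for this sketch, and it assumes the two-byte fields
+are stored high byte first, which is how FDB emits them.
+<programlisting>
+* preamble for one block
+        FCB     $00             ; first byte of a preamble is zero
+        FDB     3               ; number of bytes to load
+        FDB     $0E00           ; address to load them at (assumed value)
+        FCB     $12,$12,$39     ; the block itself: NOP, NOP, RTS
+* postamble
+        FCB     $FF             ; first byte of a postamble is $FF
+        FDB     0               ; the next two bytes are zero
+        FDB     $0E00           ; execution address (assumed value)
+</programlisting>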
+</para> +</section> + +<section> +<title>OS9 Modules</title> +<para> + +Since version 2.5, LWASM is able to generate OS9 modules. The syntax is +basically the same as for other assemblers. A module starts with the MOD +directive and ends with the EMOD directive. The OS9 directive is provided +as a shortcut for writing system calls. + +</para> + +<para> + +LWASM does NOT provide an OS9Defs file. You must provide your own. Also note +that the common practice of using "ifp1" around the inclusion of the OS9Defs +file is discouraged as it is pointless and can lead to unintentional +problems and phasing errors. Because LWASM reads each file exactly once, +there is no benefit to restricting the inclusion to the first assembly pass. + +</para> + +<para> + +It is also critical to understand that unlike many OS9 assemblers, LWASM +does NOT maintain a separate data address counter. Thus, you must define +all your data offsets and so on outside of the mod/emod segment. It is, +therefore, likely that source code targeted at other assemblers will require +edits to build correctly. + +</para> + +<para> + +LWLINK does not, yet, have the ability to create OS9 modules from object +files. + +</para> +</section> + +<section> +<title>Object Files</title> +<para>LWASM supports generating a proprietary object file format which is +described in <xref linkend="objchap">. LWLINK is then used to link these +object files into a final binary in any of LWLINK's supported binary +formats.</para> + +<para>Object files also support the concept of sections which are not valid +for other output types. This allows related code from each object file +linked to be collapsed together in the final binary.</para> + +<para> +Object files are very flexible in that they allow references that are not +known at assembly time to be resolved at link time. 
However, because the +addresses of such references are not known at assembly time, there is no way +for the assembler to deduce that an eight bit addressing mode is possible. +That means the assembler will default to using sixteen bit addressing +whenever an external or cross-section reference is used. +</para> + +<para> +As of LWASM 2.4, it is possible to force direct page addressing for an +external reference. Care must be taken to ensure the resulting addresses +are really in the direct page since the linker does not know what the direct +page is supposed to be and does not emit errors for byte overflows. +</para> + +<para> +It is also possible to use external references in an eight bit immediate +mode instruction. In this case, only the low order eight bits will be used. +Again, no byte overflows will be flagged. +</para> + + +</section> + +</chapter> + +<chapter> +<title>LWASM</title> +<para> +The LWTOOLS assembler is called LWASM. This chapter documents the various +features of the assembler. It is not, however, a tutorial on 6x09 assembly +language programming. +</para> + +<section> +<title>Command Line Options</title> +<para> +The binary for LWASM is called "lwasm". Note that the binary is in lower +case. lwasm takes the following command line arguments. +</para> + +<variablelist> + +<varlistentry> +<term><option>--6309</option></term> +<term><option>-3</option></term> +<listitem> +<para> +This will cause the assembler to accept the additional instructions available +on the 6309 processor. This is the default mode; this option is provided for +completeness and to override preset command arguments. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--6809</option></term> +<term><option>-9</option></term> +<listitem> +<para> +This will cause the assembler to reject instructions that are only available +on the 6309 processor. 
+</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--decb</option></term> +<term><option>-b</option></term> +<listitem> +<para> +Select the DECB output format target. Equivalent to <option>--format=decb</option>. +</para> +<para>While this is the default output format currently, it is not safe to rely +on that fact. Future versions may have different defaults. It is also trivial +to modify the source code to change the default. Thus, it is recommended to specify +this option if you need DECB output.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--format=type</option></term> +<term><option>-f type</option></term> +<listitem> +<para> +Select the output format. Valid values are <option>obj</option> for the +object file target, <option>decb</option> for the DECB LOADM format, +<option>os9</option> for creating OS9 modules, and <option>raw</option> for +a raw binary. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--list[=file]</option></term> +<term><option>-l[file]</option></term> +<listitem> +<para> +Cause LWASM to generate a listing. If <option>file</option> is specified, +the listing will go to that file. Otherwise it will go to the standard output +stream. By default, no listing is generated. Unless <option>--symbols</option> +is specified, the list will not include the symbol table. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--symbols</option></term> +<term><option>-s</option></term> +<listitem> +<para> +Causes LWASM to generate a list of symbols when generating a listing. +It has no effect unless a listing is being generated. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--obj</option></term> +<listitem> +<para> +Select the proprietary object file format as the output target. 
+</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--output=FILE</option></term> +<term><option>-o FILE</option></term> +<listitem> +<para> +This option specifies the name of the output file. If not specified, the +default is <option>a.out</option>. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--pragma=pragma</option></term> +<term><option>-p pragma</option></term> +<listitem> +<para> +Specify assembler pragmas. Multiple pragmas are separated by commas. The +pragmas accepted are the same as for the PRAGMA assembler directive described +below. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--raw</option></term> +<term><option>-r</option></term> +<listitem> +<para> +Select raw binary as the output target. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--includedir=path</option></term> +<term><option>-I path</option></term> +<listitem> +<para> +Add <option>path</option> to the end of the include path. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--help</option></term> +<term><option>-?</option></term> +<listitem> +<para> +Present a help screen describing the command line options. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--usage</option></term> +<listitem> +<para> +Provide a summary of the command line options. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--version</option></term> +<term><option>-V</option></term> +<listitem> +<para> +Display the software version. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--debug</option></term> +<term><option>-d</option></term> +<listitem> +<para> +Increase the debugging level. Only really useful to people hacking on the +LWASM source code itself. 
+</para> +</listitem> +</varlistentry> + +</variablelist> + +</section> + +<section> +<title>Dialects</title> +<para> +LWASM supports all documented MC6809 instructions as defined by Motorola. +It also supports all known HD6309 instructions. While there is general +agreement on the mnemonics for most of the 6309 instructions, there is some +variance with the block transfer instructions. TFM for all four variations +seems to have gained the most traction and, thus, this is the form that is +recommended for LWASM. However, it also supports COPY, COPY-, IMP, EXP, +TFRP, TFRM, TFRS, and TFRR. It further adds COPY+ as a synonym for COPY, +IMPLODE for IMP, and EXPAND for EXP. +</para> + +<para>By default, LWASM accepts 6309 instructions. However, using the +<parameter>--6809</parameter> parameter, you can cause it to throw errors on +6309 instructions instead.</para> + +<para> +The standard addressing mode specifiers are supported. These are the +hash sign ("#") for immediate mode, the less than sign ("<") for forced +eight bit modes, and the greater than sign (">") for forced sixteen bit modes. +</para> + +<para> +Additionally, LWASM supports using the asterisk ("*") to indicate +base page addressing. This should not be used in hand-written source code, +however, because it is non-standard and may or may not be present in future +versions of LWASM. +</para> + +</section> + +<section> +<title>Source Format</title> + +<para> +LWASM accepts plain text files in a relatively free form. It can handle +lines terminated with CR, LF, CRLF, or LFCR which means it should be able +to assemble files on any platform on which it compiles. +</para> +<para> +Each line may start with a symbol. If a symbol is present, there must not +be any whitespace preceding it. It is legal for a line to contain nothing +but a symbol.</para> +<para> +The op code is separated from the symbol by whitespace. If there is +no symbol, there must be at least one white space character preceding it. 
+If applicable, the operand follows separated by whitespace. Following the +opcode and operand is an optional comment. +</para> + +<para> It is important to note that operands cannot contain any whitespace +except in the case of delimited strings. This is because the first +whitespace character will be interpreted as the separator between the +operand column and the comment. This behaviour is required for approximate +source compatibility with other 6x09 assemblers. </para> + +<para> +A comment can also be introduced with a * or a ;. The comment character is +optional for end of statement comments. However, if a symbol is the only +thing present on the line other than the comment, the comment character is +mandatory to prevent the assembler from interpreting the comment as an opcode. +</para> + +<para> +For compatibility with the output generated by some C preprocessors, LWASM +will also ignore lines that begin with a #. This should not be used as a general +comment character, however. +</para> + +<para> +The opcode is not treated case sensitively. Neither are register names in +the operand fields. Symbols, however, are case sensitive. +</para> + +<para> As of version 2.6, LWASM supports files with line numbers. If line +numbers are present, the line must start with a digit. The line number +itself must consist only of digits. The line number must then be followed +by either the end of the line or exactly one white space character. After +that white space character, the lines are interpreted exactly as above. +</para> + +</section> + +<section> +<title>Symbols</title> + +<para> +Symbols have no length restriction. They may contain letters, numbers, dots, +dollar signs, and underscores. They must start with a letter, dot, or +underscore. +</para> + +<para> +LWASM also supports the concept of a local symbol. A local symbol is one +which contains either a "?" or a "@", which can appear anywhere in the symbol. 
+The scope of a local symbol is determined by a number of factors. First, +each included file gets its own local symbol scope. A blank line will also +be considered a local scope barrier. Macros each have their own local symbol +scope as well (which has a side effect that you cannot use a local symbol +as an argument to a macro). There are other factors as well. In general, +a local symbol is restricted to the block of code it is defined within. +</para> + +<para> +By default, unless assembling to the os9 target, a "$" in the symbol will +also make it local. This can be controlled by the "dollarlocal" and +"nodollarlocal" pragmas. In the absence of a pragma to the contrary, for +the os9 target, a "$" in the symbol will not make it considered local while +for all other targets it will. +</para> + +</section> + +<section> +<title>Numbers and Expressions</title> +<para> + +Numbers can be expressed in binary, octal, decimal, or hexadecimal. Binary +numbers may be prefixed with a "%" symbol or suffixed with a "b" or "B". +Octal numbers may be prefixed with "@" or suffixed with "Q", "q", "O", or +"o". Hexadecimal numbers may be prefixed with "$", "0x" or "0X", or suffixed +with "H". No prefix or suffix is required for decimal numbers but they can +be prefixed with "&" if desired. Any constant which begins with a letter +must be expressed with the correct prefix base identifier or be prefixed +with a 0. Thus hexadecimal FF would have to be written either 0FFH or $FF. +Numbers are not case sensitive. + +</para> + +<para> A symbol may appear at any point where a number is acceptable. The +special symbol "*" can be used to represent the starting address of the +current source line within expressions. </para> + +<para>The ASCII value of a character can be included by prefixing it with a +single quote ('). 
The ASCII values of two characters can be included by +prefixing the characters with a quote (").</para> + +<para> + +LWASM supports the following basic binary operators: +, -, *, /, and %. +These represent addition, subtraction, multiplication, division, and +modulus. It also supports unary negation and unary 1's complement (- and ^ +respectively). It is also possible to use ~ for the unary 1's complement +operator. For completeness, a unary positive (+) is supported though it is +a no-op. LWASM also supports using |, &, and ^ for bitwise or, bitwise and, +and bitwise exclusive or respectively. + +</para> + +<para> + +Operator precedence follows the usual rules. Multiplication, division, and +modulus take precedence over addition and subtraction. Unary operators take +precedence over binary operators. Bitwise operators have lower precedence +than addition and subtraction. To force a specific order of evaluation, +parentheses can be used in the usual manner. + +</para> + +<para> + +As of LWASM 2.5, the operators && and || are recognized for boolean and and +boolean or respectively. They will return either 0 or 1 (false or true). +They have the lowest precedence of all the binary operators. + +</para> + +</section> + +<section> +<title>Assembler Directives</title> +<para> +Various directives can be used to control the behaviour of the +assembler or to include non-code/data in the resulting output. Those directives +that are not described in detail in other sections of this document are +described below. 
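+As a small sketch of how the number formats and operators from the preceding
+section combine with a few of the directives covered below, consider the
+following fragment. The symbol names and values are made up for this
+illustration, and the values in the comments were worked out by hand rather
+than taken from an LWASM listing. Note that the operands contain no
+whitespace.
+<programlisting>
+WIDTH   EQU     32              ; decimal constant
+MASK    EQU     %00001111       ; binary, the same value as $0F
+TOP     EQU     0FFH            ; hexadecimal, could also be written $FF
+        FCB     WIDTH/4,MASK|$80 ; two bytes: 8 and $8F
+        FDB     WIDTH*2+1,TOP-MASK ; two words: 65 and $F0
+</programlisting>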
+</para> + +<section> +<title>Data Directives</title> +<variablelist> +<varlistentry><term>FCB <parameter>expr[,...]</parameter></term> +<term>.DB <parameter>expr[,...]</parameter></term> +<term>.BYTE <parameter>expr[,...]</parameter></term> +<listitem> +<para>Include one or more constant bytes (separated by commas) in the output.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>FDB <parameter>expr[,...]</parameter></term> +<term>.DW <parameter>expr[,...]</parameter></term> +<term>.WORD <parameter>expr[,...]</parameter></term> +<listitem> +<para>Include one or more words (separated by commas) in the output.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>FQB <parameter>expr[,...]</parameter></term> +<term>.QUAD <parameter>expr[,...]</parameter></term> +<term>.4BYTE <parameter>expr[,...]</parameter></term> +<listitem> +<para>Include one or more double words (separated by commas) in the output.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>FCC <parameter>string</parameter></term> +<term>.ASCII <parameter>string</parameter></term> +<term>.STR <parameter>string</parameter></term> +<listitem> +<para> +Include a string of text in the output. The first character of the operand +is the delimiter which must appear as the last character and cannot appear +within the string. The string is included with no modifications. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>FCN <parameter>string</parameter></term> +<term>.ASCIZ <parameter>string</parameter></term> +<term>.STRZ <parameter>string</parameter></term> +<listitem> +<para> +Include a NUL terminated string of text in the output. The first character of +the operand is the delimiter which must appear as the last character and +cannot appear within the string. A NUL byte is automatically appended to +the string. 
+</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>FCS <parameter>string</parameter></term> +<term>.ASCIS <parameter>string</parameter></term> +<term>.STRS <parameter>string</parameter></term> +<listitem> +<para> +Include a string of text in the output with bit 7 of the final byte set. The +first character of the operand is the delimiter which must appear as the last +character and cannot appear within the string. +</para> +</listitem> +</varlistentry> + +<varlistentry><term>ZMB <parameter>expr</parameter></term> +<listitem> +<para> +Include a number of NUL bytes in the output. The number must be fully resolvable +during pass 1 of assembly so no forward or external references are permitted. +</para> +</listitem> +</varlistentry> + +<varlistentry><term>ZMD <parameter>expr</parameter></term> +<listitem> +<para> +Include a number of zero words in the output. The number must be fully +resolvable during pass 1 of assembly so no forward or external references are +permitted. +</para> +</listitem> +</varlistentry> + +<varlistentry><term>ZMQ <parameter>expr</parameter></term> +<listitem> +<para> +Include a number of zero double-words in the output. The number must be fully +resolvable during pass 1 of assembly so no forward or external references are +permitted. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>RMB <parameter>expr</parameter></term> +<term>.BLKB <parameter>expr</parameter></term> +<term>.DS <parameter>expr</parameter></term> +<term>.RS <parameter>expr</parameter></term> +<listitem> +<para> +Reserve a number of bytes in the output. The number must be fully resolvable +during pass 1 of assembly so no forward or external references are permitted. +The value of the bytes is undefined. +</para> +</listitem> +</varlistentry> + +<varlistentry><term>RMD <parameter>expr</parameter></term> +<listitem> +<para> +Reserve a number of words in the output. 
The number must be fully +resolvable during pass 1 of assembly so no forward or external references are +permitted. The value of the words is undefined. +</para> +</listitem> +</varlistentry> + +<varlistentry><term>RMQ <parameter>expr</parameter></term> +<listitem> +<para> +Reserve a number of double-words in the output. The number must be fully +resolvable during pass 1 of assembly so no forward or external references are +permitted. The value of the double-words is undefined. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>INCLUDEBIN <parameter>filename</parameter></term> +<listitem> +<para> +Treat the contents of <parameter>filename</parameter> as a string of bytes to +be included literally at the current assembly point. This has the same effect +as converting the file contents to a series of FCB statements and including +those at the current assembly point. +</para> + +<para> If <parameter>filename</parameter> begins with a /, the file name +will be taken as absolute. Otherwise, the current directory will be +searched followed by the search path in the order specified.</para> + +<para> Please note that absolute path detection including drive letters will +not function correctly on Windows platforms. Non-absolute inclusion will +work, however.</para> + +</listitem> +</varlistentry> + +</variablelist> + +</section> + +<section> +<title>Address Definition</title> +<para>The directives in this section all control the addresses of symbols +or the assembly process itself.</para> + +<variablelist> +<varlistentry><term>ORG <parameter>expr</parameter></term> +<listitem> +<para>Set the assembly address. The address must be fully resolvable on the +first pass so no external or forward references are permitted. ORG is not +permitted within sections when outputting to object files. For the DECB +target, each ORG directive after which output is generated will cause +a new preamble to be output. 
ORG is only used to determine the addresses +of symbols when the raw target is used. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><parameter>sym</parameter> EQU <parameter>expr</parameter></term> +<term><parameter>sym</parameter> = <parameter>expr</parameter></term> +<listitem> +<para>Define the value of <parameter>sym</parameter> to be <parameter>expr</parameter>.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><parameter>sym</parameter> SET <parameter>expr</parameter></term> +<listitem> +<para>Define the value of <parameter>sym</parameter> to be <parameter>expr</parameter>. +Unlike EQU, SET permits symbols to be defined multiple times as long as SET +is used for all instances. Use of the symbol before the first SET statement +that sets its value is undefined.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>SETDP <parameter>expr</parameter></term> +<listitem> +<para>Inform the assembler that it can assume the DP register contains +<parameter>expr</parameter>. This directive is only advice to the assembler +to determine whether an address is in the direct page and has no effect +on the contents of the DP register. The value must be fully resolved during +the first assembly pass because it affects the sizes of subsequent instructions. +</para> +<para>This directive has no effect in the object file target. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>ALIGN <parameter>expr</parameter>[,<parameter>value</parameter>]</term> +<listitem> + +<para>Force the current assembly address to be a multiple of +<parameter>expr</parameter>. If <parameter>value</parameter> is not +specified, a series of NUL bytes is output to force the alignment, if +required. Otherwise, the low order 8 bits of <parameter>value</parameter> +will be used as the fill. The alignment value must be fully resolved on the +first pass because it affects the addresses of subsequent instructions. 
+However, <parameter>value</parameter> may include forward references; as +long as it resolves to a constant for the second pass, the value will be +accepted.</para> + +<para>Unless <parameter>value</parameter> is specified as something like $12, +this directive is not suitable for inclusion in the middle of actual code. +The default padding value is $00 which is intended to be used within data +blocks. </para> + +</listitem> +</varlistentry> + +</variablelist> + +</section> + +<section> +<title>Conditional Assembly</title> +<para> +Portions of the source code can be excluded or included based on conditions +known at assembly time. Conditionals can be nested arbitrarily deeply. The +directives associated with conditional assembly are described in this section. +</para> +<para>All conditionals must be fully bracketed. That is, every conditional +statement must eventually be followed by an ENDC at the same level of nesting. +</para> +<para>Conditional expressions are only evaluated on the first assembly pass. +It is not possible to game the assembly process by having a conditional +change its value between assembly passes. Due to the underlying architecture +of LWASM, there is no possible utility to IFP1 and IFP2, nor can they, as of LWASM 3.0, actually +be implemented meaningfully. Thus there is not and never will +be any equivalent of IFP1 or IFP2 as provided by other assemblers. Use of those opcodes +will throw a warning and be ignored.</para> + +<para>It is important to note that if a conditional does not resolve to a constant +during the first parsing pass, an error will be thrown. This is unavoidable because the assembler +must make a decision about which source to include and which source to exclude at this stage. 
+Thus, expressions that work normally elsewhere will not work for conditions.</para> + +<variablelist> +<varlistentry> +<term>IFEQ <parameter>expr</parameter></term> +<listitem> +<para>If <parameter>expr</parameter> evaluates to zero, the conditional +will be considered true. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>IFNE <parameter>expr</parameter></term> +<term>IF <parameter>expr</parameter></term> +<listitem> +<para>If <parameter>expr</parameter> evaluates to a non-zero value, the conditional +will be considered true. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>IFGT <parameter>expr</parameter></term> +<listitem> +<para>If <parameter>expr</parameter> evaluates to a value greater than zero, the conditional +will be considered true. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>IFGE <parameter>expr</parameter></term> +<listitem> +<para>If <parameter>expr</parameter> evaluates to a value greater than or equal to zero, the conditional +will be considered true. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>IFLT <parameter>expr</parameter></term> +<listitem> +<para>If <parameter>expr</parameter> evaluates to a value less than zero, the conditional +will be considered true. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>IFLE <parameter>expr</parameter></term> +<listitem> +<para>If <parameter>expr</parameter> evaluates to a value less than or equal to zero, the conditional +will be considered true. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>IFDEF <parameter>sym</parameter></term> +<listitem> +<para>If <parameter>sym</parameter> is defined at this point in the assembly +process, the conditional +will be considered true. 
+</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>IFNDEF <parameter>sym</parameter></term> +<listitem> +<para>If <parameter>sym</parameter> is not defined at this point in the assembly +process, the conditional +will be considered true. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>ELSE</term> +<listitem> +<para> +If the preceding conditional at the same level of nesting was false, the +statements following will be assembled. If the preceding conditional at +the same level was true, the statements following will not be assembled. +Note that the preceding conditional might have been another ELSE statement +although this behaviour is not guaranteed to be supported in future versions +of LWASM. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>ENDC</term> +<listitem> +<para> +This directive marks the end of a conditional construct. Every conditional +construct must end with an ENDC directive. +</para> +</listitem> +</varlistentry> + +</variablelist> +</section> + +<section> +<title>OS9 Target Directives</title> + +<para>This section includes directives that apply solely to the OS9 +target.</para> + +<variablelist> + +<varlistentry> +<term>OS9 <parameter>syscall</parameter></term> +<listitem> +<para> + +This directive generates a call to the specified system call. <parameter>syscall</parameter> may be an arbitrary expression. + +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>MOD <parameter>size</parameter>,<parameter>name</parameter>,<parameter>type</parameter>,<parameter>flags</parameter>,<parameter>execoff</parameter>,<parameter>datasize</parameter></term> +<listitem> +<para> + +This tells LWASM that the beginning of the actual module is here. It will +generate a module header based on the parameters specified. It will also +begin calculating the module CRC. 
+ +</para> + +<para> + +The precise meaning of the various parameters is beyond the scope of this +document since it is not a tutorial on OS9 module programming. 
+ +</para> + +</listitem> +</varlistentry> + +<varlistentry> +<term>EMOD</term> +<listitem> +<para> + +This marks the end of a module and causes LWASM to emit the calculated CRC +for the module. + +</para> +</listitem> +</varlistentry> + +</variablelist> +</section> + +<section> +<title>Miscellaneous Directives</title> + +<para>This section includes directives that do not fit into the other +categories.</para> + +<variablelist> + +<varlistentry> +<term>INCLUDE <parameter>filename</parameter></term> +<term>USE <parameter>filename</parameter></term> + +<listitem> <para> Include the contents of <parameter>filename</parameter> at +this point in the assembly as though it were a part of the file currently +being processed. Note that if whitespace appears in the name of the file, +you must enclose <parameter>filename</parameter> in quotes. +</para> + +<para> +Note that the USE variation is provided only for compatibility with other +assemblers. It is recommended to use the INCLUDE variation.</para> + +<para>If <parameter>filename</parameter> begins with a "/", it is +interpreted as an absolute path. If it does not, the search path will be used +to find the file. First, the directory containing the file that contains this +directive is searched. (Includes within an included file are relative to the included file, +not the file that included it.) If the file is not found there, the include path +is searched. If it is still not found, an error will be thrown. Note that the +current directory as understood by your shell or operating system is not searched. +</para> + +</listitem> +</varlistentry> + +<varlistentry> +<term>END <parameter>[expr]</parameter></term> +<listitem> +<para> +This directive causes the assembler to stop assembling immediately as though +it ran out of input. For the DECB target only, <parameter>expr</parameter> +can be used to set the execution address of the resulting binary. For all +other targets, specifying <parameter>expr</parameter> will cause an error. 
+</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>ERROR <parameter>string</parameter></term> +<listitem> +<para> +Causes a custom error message to be printed at this line. This will cause +assembly to fail. This directive is most useful inside conditional constructs +to cause assembly to fail if some condition that is known bad happens. Everything +from the directive to the end of the line is considered the error message. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>WARNING <parameter>string</parameter></term> +<listitem> +<para> +Causes a custom warning message to be printed at this line. This will not cause +assembly to fail. This directive is most useful inside conditional constructs +or include files to alert the programmer to a deprecated feature being used +or some other condition that may cause trouble later, but which may, in fact, +not cause any trouble. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>.MODULE <parameter>string</parameter></term> +<listitem> +<para> +This directive is ignored for most output targets. If the output target +supports encoding a module name into it, <parameter>string</parameter> +will be used as the module name. +</para> +<para> +As of version 3.0, no supported output targets support this directive. +</para> +</listitem> +</varlistentry> + +</variablelist> +</section> + +</section> + +<section> +<title>Macros</title> +<para> +LWASM is a macro assembler. A macro is simply a name that stands in for a +series of instructions. Once a macro is defined, it is used like any other +assembler directive. Defining a macro can be considered equivalent to adding +additional assembler directives. +</para> +<para>Macros may accept parameters. These parameters are referenced within +a macro by a backslash ("\") followed by a digit 1 through 9 for the first +through ninth parameters. They may also be referenced by enclosing the +decimal parameter number in braces ("{num}"). 
These parameter references +are replaced with the verbatim text of the parameter passed to the macro. A +reference to a non-existent parameter will be replaced by an empty string. +Macro parameters are expanded everywhere on each source line. That means +the parameter to a macro could be used as a symbol or it could even appear +in a comment or could cause an entire source line to be commented out +when the macro is expanded. +</para> +<para> +Parameters passed to a macro are separated by commas and the parameter list +is terminated by any whitespace. This means that neither a comma nor whitespace +may be included in a macro parameter. +</para> +<para> +Macro expansion is done recursively. That is, within a macro, macros are +expanded. This can lead to infinite loops in macro expansion. If the assembler +hangs for a long time while assembling a file that uses macros, this may be +the reason.</para> + +<para>Each macro expansion receives its own local symbol context which is not +inherited by any macros called by it nor is it inherited from the context +the macro was instantiated in. That means it is possible to use local symbols +within macros without having them collide with symbols in other macros or +outside the macro itself. However, this also means that using a local symbol +as a parameter to a macro, while legal, will not do what it would seem to do +as it will result in looking up the local symbol in the macro's symbol context +rather than the enclosing context where it came from, likely yielding either +an undefined symbol error or bizarre assembly results. +</para> +<para> +Note that there is no way to define a macro as local to a symbol context. All +macros are part of the global macro namespace. However, macros have a separate +namespace from symbols so it is possible to have a symbol with the same name +as a macro. +</para> + +<para> +Macros are defined only during the first pass. Macro expansion also +only occurs during the first pass. 
On the second pass, the macro +definition is simply ignored. Macros must be defined before they are used. +</para> + +<para>The following directives are used when defining macros.</para> + +<variablelist> +<varlistentry> +<term><parameter>macroname</parameter> MACRO</term> +<listitem> +<para>This directive is used to begin the definition of a macro called +<parameter>macroname</parameter>. If <parameter>macroname</parameter> already +exists, it is considered an error. Attempting to define a macro within a +macro is undefined. It may work and it may not so the behaviour should not +be relied upon. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>ENDM</term> +<listitem> +<para> +This directive indicates the end of the macro currently being defined. It +causes the assembler to resume interpreting source lines as normal. +</para> +</listitem> +</varlistentry> +</variablelist> + +</section> + +<section> +<title>Structures</title> +<para> + +Structures are used to group related data in a fixed structure. A structure +consists of a number of fields, defined in sequential order and which take up +a specified size. The assembler does not enforce any means of access within a +structure; it assumes that whatever you are doing, you intended to do. +There are two pseudo ops that are used for defining structures. + +</para> + +<variablelist> +<varlistentry> +<term><parameter>structname</parameter> STRUCT</term> +<listitem> +<para> + +This directive is used to begin the definition of a structure with name +<parameter>structname</parameter>. Subsequent statements all form part of +the structure definition until the end of the structure is declared. + +</para> +</listitem> +</varlistentry> +<varlistentry> +<term>ENDSTRUCT</term> +<term>ENDS</term> +<listitem> +<para> +This directive ends the definition of the structure. ENDSTRUCT is the +preferred form. Prior to version 3.0 of LWASM, ENDS was used to end a +section instead of a structure. 
+</para> +</listitem> +</varlistentry> +</variablelist> + +<para> + +Within a structure definition, only reservation pseudo ops are permitted. +Anything else will cause an assembly error. +</para> + +<para> Once a structure is defined, you can reserve an area of memory in the +same structure by using the structure name as the opcode. Structures can +also contain fields that are themselves structures. See the example +below.</para> + +<programlisting> +tstruct2 STRUCT +f1 rmb 1 +f2 rmb 1 + ENDSTRUCT + +tstruct STRUCT +field1 rmb 2 +field2 rmb 3 +field3 tstruct2 + ENDSTRUCT + + ORG $2000 +var1 tstruct +var2 tstruct2 +</programlisting> + +<para>Fields are referenced using a dot (.) as a separator. To refer to the +generic offset within a structure, use the structure name to the left of the +dot. If referring to a field within an actual variable, use the variable's +symbol name to the left of the dot.</para> + +<para>You can also refer to the actual size of a structure (or a variable +declared as a structure) using the special symbol sizeof{structname} where +structname will be the name of the structure or the name of the +variable.</para> + +<para>Essentially, structures are a shortcut for defining a vast number of +symbols. When a structure is defined, the assembler creates symbols for the +various fields in the form structname.fieldname as well as the appropriate +sizeof{structname} symbol. When a variable is declared as a structure, the +assembler does the same thing using the name of the variable. You will see +these symbols in the symbol table when the assembler is instructed to +provide a listing. 
For instance, the above listing will create the +following symbols (symbol values in parentheses): tstruct2.f1 (0), +tstruct2.f2 (1), sizeof{tstruct2} (2), tstruct.field1 (0), tstruct.field2 +(2), tstruct.field3 (5), tstruct.field3.f1 (5), tstruct.field3.f2 (6), +sizeof{tstruct.field3} (2), sizeof{tstruct} (7), var1 ($2000), var1.field1 +($2000), var1.field2 ($2002), var1.field3 ($2005), var1.field3.f1 ($2005), +var1.field3.f2 ($2006), sizeof{var1.field3} (2), sizeof{var1} (7), var2 +($2007), var2.f1 ($2007), var2.f2 ($2008), sizeof{var2} (2). </para> + +</section> + +<section> +<title>Object Files and Sections</title> +<para> +The object file target is very useful for large projects because it allows +multiple files to be assembled independently and then linked into the final +binary at a later time. It allows only the small portion of the project +that was modified to be re-assembled rather than requiring the entire set +of source code to be available to the assembler in a single assembly process. +This can be particularly important if there are a large number of macros, +symbol definitions, or other metadata that uses resources at assembly time. +By far the largest benefit, however, is keeping the source files small enough +for a mere mortal to find things in them. +</para> + +<para> +With multi-file projects, there needs to be a means of resolving references to +symbols in other source files. These are known as external references. The +addresses of these symbols cannot be known until the linker joins all the +object files into a single binary. This means that the assembler must be +able to output the object code without knowing the value of the symbol. This +places some restrictions on the code generated by the assembler. For +example, the assembler cannot generate direct page addressing for instructions +that reference external symbols because the address of the symbol may not +be in the direct page. 
Similarly, relative branches and PC relative addressing +cannot be used in their eight bit forms. Everything that must be resolved +by the linker must be assembled to use the largest address size possible to +allow the linker to fill in the correct value at link time. Note that the +same problem applies to absolute address references as well, even those in +the same source file, because the address is not known until link time. +</para> + +<para> +It is often desired in multi-file projects to have code of various types grouped +together in the final binary generated by the linker as well. The same applies +to data. In order for the linker to do that, the bits that are to be grouped +must be tagged in some manner. This is where the concept of sections comes in. +Each chunk of code or data is part of a section in the object file. Then, +when the linker reads all the object files, it coalesces all sections of the +same name into a single section and then considers it as a unit. +</para> + +<para> +The existence of sections, however, raises a problem for symbols even +within the same source file. Thus, the assembler must treat symbols from +different sections within the same source file in the same manner as external +symbols. That is, it must leave them for the linker to resolve at link time, +with all the limitations that entails. +</para> + +<para> +In the object file target mode, LWASM requires all source lines that +cause bytes to be output to be inside a section. Any directives that do +not cause any bytes to be output can appear outside of a section. This includes +such things as EQU or RMB. Even ORG can appear outside a section. ORG, however, +makes no sense within a section because it is the linker that determines +the starting address of the section's code, not the assembler. +</para> + +<para> +All symbols defined globally in the assembly process are local to the +source file and cannot be exported. 
All symbols defined within a section are +considered local to the source file unless otherwise explicitly exported. +Symbols referenced from external source files must be declared external, +either explicitly or by asking the assembler to assume that all undefined +symbols are external. +</para> + +<para> +It is often handy to define a number of memory addresses that will be +used for data at run-time but which need not be included in the binary file. +These memory addresses are not initialized until run-time, either by the +program itself or by the program loader, depending on the operating environment. +Such sections are often known as BSS sections. LWASM supports generating +sections with a BSS attribute set. This causes the section definition, including +symbols exported from that section and those symbols required to resolve +references from the local file, to be written to the object file, but with no actual code. +It is illegal for any source lines within a BSS flagged section to cause any +bytes to be output. +</para> + +<para>The following directives apply to section handling.</para> + +<variablelist> +<varlistentry> +<term>SECTION <parameter>name[,flags]</parameter></term> +<term>SECT <parameter>name[,flags]</parameter></term> +<term>.AREA <parameter>name[,flags]</parameter></term> +<listitem> +<para> +Instructs the assembler that the code following this directive is to be +considered part of the section <parameter>name</parameter>. A section name +may appear multiple times in which case it is as though all the code from +all the instances of that section appeared adjacent within the source file. +However, <parameter>flags</parameter> may only be specified on the first +instance of the section. +</para> +<para>There is a single flag supported in <parameter>flags</parameter>. 
The +flag <parameter>bss</parameter> will cause the section to be treated as a BSS +section and, thus, no code will be included in the object file nor will any +bytes be permitted to be output.</para> +<para> +If the section name is "bss" or ".bss" in any combination of upper and +lower case, the section is assumed to be a BSS section. In that case, +the flag <parameter>!bss</parameter> can be used to override this assumption. +</para> +<para> +If assembly is already happening within a section, the section is implicitly +ended and the new section started. This is not considered an error although +it is recommended that all sections be explicitly closed. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>ENDSECTION</term> +<term>ENDSECT</term> +<listitem> +<para> +This directive ends the current section. This puts assembly outside of any +sections until the next SECTION directive. ENDSECTION is the preferred form. +Prior to version 3.0 of LWASM, ENDS could also be used to end a section but +as of version 3.0, it is now an alias for ENDSTRUCT instead. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><parameter>sym</parameter> EXTERN</term> +<term><parameter>sym</parameter> EXTERNAL</term> +<term><parameter>sym</parameter> IMPORT</term> +<listitem> +<para> +This directive defines <parameter>sym</parameter> as an external symbol. +This directive may occur at any point in the source code. EXTERN definitions +are resolved on the first pass so an EXTERN definition anywhere in the +source file is valid for the entire file. The use of this directive is +optional when the assembler is instructed to assume that all undefined +symbols are external. In fact, in that mode, if the symbol is referenced +before the EXTERN directive, an error will occur. 
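+As a sketch of an explicit declaration (printmsg is a hypothetical routine
+assumed to be exported by another object file; the section and label names
+are likewise placeholders):
+<programlisting>
+printmsg EXTERN
+        SECTION code
+start   LDX     #msg
+        JSR     printmsg
+        RTS
+msg     FCN     "hello"
+        ENDSECTION
+</programlisting>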
+</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><parameter>sym</parameter> EXPORT</term> +<term><parameter>sym</parameter> .GLOBL</term> + +<term>EXPORT <parameter>sym</parameter></term> +<term>.GLOBL <parameter>sym</parameter></term> + +<listitem> +<para> +This directive defines <parameter>sym</parameter> as an exported symbol. +This directive may occur at any point in the source code, even before the +definition of the exported symbol. +</para> +<para> +Note that <parameter>sym</parameter> may appear as the operand or as the +statement's symbol. If there is a symbol on the statement, that will +take precedence over any operand that is present. +</para> +</listitem> + +</varlistentry> + +<varlistentry> +<term><parameter>sym</parameter> EXTDEP</term> +<listitem> + +<para>This directive forces an external dependency on +<parameter>sym</parameter>, even if it is never referenced anywhere else in +this file.</para> + +</listitem> +</varlistentry> +</variablelist> + +</section> + +<section> +<title>Assembler Modes and Pragmas</title> +<para> +There are a number of options that affect the way assembly is performed. +Some of these options can only be specified on the command line because +they determine something absolute about the assembly process. These include +such things as the output target. Other things may be switchable during +the assembly process. These are known as pragmas and are, by definition, +not portable between assemblers. +</para> + +<para>LWASM supports a number of pragmas that affect code generation or +otherwise affect the behaviour of the assembler. These may be specified by +way of a command line option or by assembler directives. The directives +are as follows. +</para> + +<variablelist> +<varlistentry> +<term>PRAGMA <parameter>pragma[,...]</parameter></term> +<listitem> +<para> +Specifies that the assembler should bring into force all <parameter>pragma</parameter>s +specified. Any unrecognized pragma will cause an assembly error. 
The new +pragmas will take effect immediately. This directive should be used when +the program will assemble incorrectly if the pragma is ignored or not supported. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>*PRAGMA <parameter>pragma[,...]</parameter></term> +<listitem> +<para> +This is identical to the PRAGMA directive except no error will occur with +unrecognized or unsupported pragmas. This directive, by virtue of starting +with a comment character, will also be ignored by assemblers that do not +support this directive. Use this variation if the pragma is not required +for correct functioning of the code. +</para> +</listitem> +</varlistentry> +</variablelist> + +<para>Each pragma supported has a positive version and a negative version. +The positive version enables the pragma while the negative version disables +it. The negative version is simply the positive version with "no" prefixed +to it. For instance, "pragma" vs. "nopragma". Only the positive version is +listed below.</para> + +<para>Pragmas are not case sensitive.</para> + +<variablelist> +<varlistentry> +<term>index0tonone</term> +<listitem> +<para> +When in force, this pragma enables an optimization affecting indexed addressing +modes. When the offset expression in an indexed mode evaluates to zero but is +not explicitly written as 0, this will replace the operand with the equivalent +no offset mode, thus creating slightly faster code. Because of the advantages +of this optimization, it is enabled by default. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>cescapes</term> +<listitem> +<para> +This pragma will cause strings in the FCC, FCS, and FCN pseudo operations to +have C-style escape sequences interpreted. The one departure from the official +spec is that unrecognized escape sequences will return either the character +immediately following the backslash or some undefined value. Do not rely +on the behaviour of undefined escape sequences. 
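+A minimal sketch, assuming that the usual C escapes such as \r and \n are
+among those recognized (the label msg is a placeholder), with the negative
+form used to switch the pragma off again afterwards:
+<programlisting>
+        PRAGMA  cescapes
+msg     FCC     "Hello\r\n"
+        PRAGMA  nocescapes
+</programlisting>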
+</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>importundefexport</term> +<listitem> +<para> +This pragma is only valid for targets that support external references. When +in force, it will cause the EXPORT directive to act as IMPORT if the symbol +to be exported is not defined. This is provided for compatibility with the +output of gcc6809 and should not be used in hand written code. Because of +the confusion this pragma can cause, it is disabled by default. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>undefextern</term> +<listitem> +<para> +This pragma is only valid for targets that support external references. When in +force, if the assembler sees an undefined symbol on the second pass, it will +automatically define it as an external symbol. This automatic definition will +apply for the remainder of the assembly process, even if the pragma is +subsequently turned off. Because this behaviour would be potentially surprising, +this pragma defaults to off. +</para> +<para> +The primary use for this pragma is for projects that share a large number of +symbols between source files. In such cases, it is impractical to enumerate +all the external references in every source file. This allows the assembler +and linker to do the heavy lifting while not preventing a particular source +module from defining a local symbol of the same name as an external symbol +if it does not need the external symbol. (This pragma will not cause an +automatic external definition if there is already a locally defined symbol.) +</para> +<para> +This pragma will often be specified on the command line for large projects. +However, depending on the specific dynamics of the project, it may be sufficient +for one or two files to use this pragma internally. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>dollarlocal</term> +<listitem> + +<para>When set, a "$" in a symbol makes it local. When not set, "$" does not +cause a symbol to be local. 
It is set by default except when using the OS9 +target.</para> + +</listitem> +</varlistentry> + +<varlistentry> +<term>dollarnotlocal</term> +<listitem> + +<para> This is the same as the "dollarlocal" pragma except its sense is +reversed. That is, "dollarlocal" and "nodollarnotlocal" are equivalent and +"nodollarlocal" and "dollarnotlocal" are equivalent. </para> + +</listitem> +</varlistentry> + +<varlistentry> +<term>pcaspcr</term> +<listitem> + +<para> Normally, LWASM makes a distinction between PC and PCR in program +counter relative addressing. In particular, the use of PC means an absolute +offset from PC while PCR causes the assembler to calculate the offset to the +specified operand and use that as the offset from PC. By setting this +pragma, you can have PC treated the same as PCR. </para> + + +</listitem> +</varlistentry> + +</variablelist> + +</section> + +</chapter> + +<chapter> +<title>LWLINK</title> +<para> +The LWTOOLS linker is called LWLINK. This chapter documents the various features +of the linker. +</para> + +<section> +<title>Command Line Options</title> +<para> +The binary for LWLINK is called "lwlink". Note that the binary is in lower +case. lwlink takes the following command line arguments. +</para> +<variablelist> +<varlistentry> +<term><option>--decb</option></term> +<term><option>-b</option></term> +<listitem> +<para> +Selects the DECB output format target. This is equivalent to <option>--format=decb</option> +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--output=FILE</option></term> +<term><option>-o FILE</option></term> +<listitem> +<para> +This option specifies the name of the output file. If not specified, the +default is <option>a.out</option>. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--format=TYPE</option></term> +<term><option>-f TYPE</option></term> +<listitem> +<para> +This option specifies the output format. 
Valid values are <option>decb</option> +and <option>raw</option> +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--raw</option></term> +<term><option>-r</option></term> +<listitem> +<para> +This option specifies the raw output format. +It is equivalent to <option>--format=raw</option> +and <option>-f raw</option> +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--script=FILE</option></term> +<term><option>-s</option></term> +<listitem> +<para> +This option allows specifying a linking script to override the linker's +built in defaults. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--section-base=SECT=BASE</option></term> +<listitem> +<para> +Cause section SECT to load at base address BASE. This will be prepended +to the built-in link script. It is ignored if a link script is provided. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--map=FILE</option></term> +<term><option>-m FILE</option></term> +<listitem> +<para> +This will output a description of the link result to FILE. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--library=LIBSPEC</option></term> +<term><option>-l LIBSPEC</option></term> +<listitem> +<para> +Load a library using the library search path. LIBSPEC will have "lib" prepended +and ".a" appended. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--library-path=DIR</option></term> +<term><option>-L DIR</option></term> +<listitem> +<para> +Add DIR to the library search path. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--debug</option></term> +<term><option>-d</option></term> +<listitem> +<para> +This option increases the debugging level. It is only useful for LWTOOLS +developers. 
+</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--help</option></term> +<term><option>-?</option></term> +<listitem> +<para> +This provides a listing of command line options and a brief description +of each. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--usage</option></term> +<listitem> +<para> +This will display a usage summary +of each command line option. +</para> +</listitem> +</varlistentry> + + +<varlistentry> +<term><option>--version</option></term> +<term><option>-V</option></term> +<listitem> +<para> +This will display the version of LWLINK. +</para> +</listitem> +</varlistentry> + +</variablelist> + +</section> + +<section> +<title>Linker Operation</title> + +<para> + +LWLINK takes one or more files in supported input formats and links them +into a single binary. Currently supported formats are the LWTOOLS object +file format and the archive format used by LWAR. While the precise method is +slightly different, linking can be conceptualized as the following steps. + +</para> + +<orderedlist> +<listitem> +<para> +First, the linker loads a linking script. If no script is specified, it +loads a built-in default script based on the output format selected. This +script tells the linker how to lay out the various sections in the final +binary. +</para> +</listitem> + +<listitem> +<para> +Next, the linker reads all the input files into memory. At this time, it +flags any format errors in those files. It constructs a table of symbols +for each object at this time. +</para> +</listitem> + +<listitem> +<para> +The linker then proceeds with organizing the sections loaded from each file +according to the linking script. As it does so, it is able to assign addresses +to each symbol defined in each object file. At this time, the linker may +also collapse different instances of the same section name into a single +section by appending the data from each subsequent instance of the section +to the first instance of the section. 
+</para> +</listitem> + +<listitem> +<para> +Next, the linker looks through every object file for every incomplete reference. +It then attempts to fully resolve that reference. If it cannot do so, it +throws an error. Once a reference is resolved, the value is placed into +the binary code at the specified section. It should be noted that an +incomplete reference can reference either a symbol internal to the object +file or an external symbol which is in the export list of another object +file. +</para> +</listitem> + +<listitem> +<para> +If all of the above steps are successful, the linker opens the output file +and actually constructs the binary. +</para> +</listitem> +</orderedlist> + +</section> + +<section> +<title>Linking Scripts</title> +<para> +A linker script is used to instruct the linker about how to assemble the +various sections into a completed binary. It consists of a series of +directives which are considered in the order they are encountered. +</para> +<para> +The sections will appear in the resulting binary in the order they are +specified in the script file. If a referenced section is not found, the linker will behave as though the +section did exist but had a zero size, no relocations, and no exports. +A section should only be referenced once. Any subsequent references will have +an undefined effect. +</para> + +<para> +All numbers in linking scripts are specified in hexadecimal. All directives +are case sensitive although the hexadecimal numbers are not. +</para> + +<para>If a section name is specified as "*", then any section not +already matched by the script will be matched. The "*" can also be followed +by a comma and a flag to narrow the match down slightly. +If the flag is "!bss", then any section that is not flagged as a bss section +will be matched. If the flag is "bss", then any section that is flagged as +bss will be matched. 
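+As a sketch, a script along the following lines (the section names init and
+code and the load address are placeholders) would place init at address 2000
+hexadecimal, follow it with code, then every remaining non-bss section, and
+finally every bss section:
+<programlisting>
+section init load 2000
+section code
+section *,!bss
+section *,bss
+</programlisting>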
+</para> + +<para>The following directives are understood in a linker script.</para> +<variablelist> +<varlistentry> +<term>section <parameter>name</parameter> load <parameter>addr</parameter></term> +<listitem><para> + +This causes the section <parameter>name</parameter> to load at +<parameter>addr</parameter>. For the raw target, only one "load at" entry is +allowed for non-bss sections and it must be the first one. For raw targets, +it affects the addresses the linker assigns to symbols but has no other +effect on the output. bss sections may all have separate load addresses but +since they will not appear in the binary anyway, this is okay. +</para><para> +For the decb target, each "load" entry will cause a new "block" to be +output to the binary which will contain the load address. It is legal for +sections to overlap in this manner - the linker assumes the loader will sort +everything out. +</para></listitem> +</varlistentry> + +<varlistentry> +<term>section <parameter>name</parameter></term> +<listitem><para> + +This will cause the section <parameter>name</parameter> to load after the previously listed +section. +</para></listitem></varlistentry> +<varlistentry> +<term>exec <parameter>addr or sym</parameter></term> +<listitem> +<para> +This will cause the execution address (entry point) to be the address +specified (in hex) or the specified symbol name. The symbol name must +match a symbol that is exported by one of the object files being linked. +This has no effect for targets that do not encode the entry point into the +resulting file. If not specified, the entry point is assumed to be address 0 +which is probably not what you want. The default link scripts for targets +that support this directive automatically start at the beginning of the +first section (usually "init" or "code") that is emitted in the binary. 
+</para> +</listitem> +</varlistentry> + +<varlistentry> +<term>pad <parameter>size</parameter></term> +<listitem><para> +This will cause the output file to be padded with NUL bytes to be exactly +<parameter>size</parameter> bytes in length. This only makes sense for a raw target. +</para> +</listitem> +</varlistentry> +</variablelist> + + + +</section> + +</chapter> + +<chapter> +<title>Libraries and LWAR</title> + +<para> +LWTOOLS also includes a tool for managing libraries. These are analogous to +the static libraries created with the "ar" tool on POSIX systems. Each library +file contains one or more object files. The linker will treat the object +files within a library as though they had been specified individually on +the command line except when resolving external references. External references +are looked up first within the object files within the library and then, if +not found, the usual lookup based on the order the files are specified on +the command line occurs. +</para> + +<para> +The tool for creating these library files is called LWAR. +</para> + +<section> +<title>Command Line Options</title> +<para> +The binary for LWAR is called "lwar". Note that the binary is in lower +case. The options lwar understands are listed below. For archive manipulation +options, the first non-option argument is the name of the archive. All other +non-option arguments are the names of files to operate on. +</para> + +<variablelist> +<varlistentry> +<term><option>--add</option></term> +<term><option>-a</option></term> +<listitem> +<para> +This option specifies that an archive is going to have files added to it. +If the archive does not already exist, it is created. New files are added +to the end of the archive. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--create</option></term> +<term><option>-c</option></term> +<listitem> +<para> +This option specifies that an archive is going to be created and have files +added to it. 
If the archive already exists, it is truncated. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--merge</option></term> +<term><option>-m</option></term> +<listitem> +<para> +If specified, any files specified to be added to an archive will be checked +to see if they are archives themselves. If so, their constituent members are +added to the archive. This is useful for avoiding archives containing archives. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--list</option></term> +<term><option>-l</option></term> +<listitem> +<para> +This will display a list of the files contained in the archive. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--debug</option></term> +<term><option>-d</option></term> +<listitem> +<para> +This option increases the debugging level. It is only useful for LWTOOLS +developers. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--help</option></term> +<term><option>-?</option></term> +<listitem> +<para> +This provides a listing of command line options and a brief description +of each. +</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><option>--usage</option></term> +<listitem> +<para> +This will display a usage summary +of each command line option. +</para> +</listitem> +</varlistentry> + + +<varlistentry> +<term><option>--version</option></term> +<term><option>-V</option></term> +<listitem> +<para> +This will display the version of LWAR. +</para> +</listitem> +</varlistentry> +</variablelist> + +</section> + +</chapter> + +<chapter id="objchap"> +<title>Object Files</title> +<para> +LWTOOLS uses a proprietary object file format. It is proprietary in the sense +that it is specific to LWTOOLS, not that it is a hidden format. It would be +hard to keep it hidden in an open source tool chain anyway. This chapter +documents the object file format. 
+</para> + +<para> +An object file consists of a series of sections each of which contains a +list of exported symbols, a list of incomplete references, and a list of +"local" symbols which may be used in calculating incomplete references. Each +section will obviously also contain the object code. +</para> + +<para> +An exported symbol must be completely resolved to an address within the +section it is exported from. That is, an exported symbol must be a constant +rather than defined in terms of other symbols.</para> + +<para> +Each object file starts with a magic number and version number. The magic +number is the string "LWOBJ16" for this 16 bit object file format. The only +defined version number is currently 0. Thus, the first 8 bytes of the object +file are <code>4C574F424A313600</code>. +</para> + +<para> +Each section has the following items in order: +</para> + +<itemizedlist> +<listitem><para>section name</para></listitem> +<listitem><para>flags</para></listitem> +<listitem><para>list of local symbols (and addresses within the section)</para></listitem> +<listitem><para>list of exported symbols (and addresses within the section)</para></listitem> +<listitem><para>list of incomplete references along with the expressions to calculate them</para></listitem> +<listitem><para>the actual object code (for non-BSS sections)</para></listitem> +</itemizedlist> + +<para> +The section starts with the name of the section with a NUL termination +followed by a series of flag bytes terminated by NUL. There are only two +flag bytes defined. A NUL (0) indicates no more flags and a value of 1 +indicates the section is a BSS section. For a BSS section, no actual +code is included in the object file. +</para> + +<para> +Either a NULL section name or end of file indicates the presence of no more +sections. 
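As an illustration only, the header and the start of the first section described so far could be read roughly as in the following Python sketch. This is not code from LWTOOLS: the helper names <code>read_cstring</code> and <code>read_header_and_first_section</code> are hypothetical, and the sketch stops after the flag bytes because the symbol tables that follow are described later in this chapter.

```python
import io

# "LWOBJ16" followed by the version byte 0 (hex 4C574F424A313600).
MAGIC = b"LWOBJ16\x00"


def read_cstring(stream):
    """Read up to the next NUL byte; return None if already at end of file."""
    out = bytearray()
    while True:
        b = stream.read(1)
        if not b:
            if out:
                raise ValueError("unterminated string")
            return None
        if b == b"\x00":
            return bytes(out)
        out += b


def read_header_and_first_section(stream):
    """Check the magic number, then read the first section's name and flags.

    Hypothetical helper: returns (name, is_bss), or None if no section follows.
    """
    if stream.read(8) != MAGIC:
        raise ValueError("not an LWOBJ16 version 0 object file")
    name = read_cstring(stream)
    if not name:
        # An empty name or end of file both mean there are no more sections.
        return None
    flags = []
    while True:
        b = stream.read(1)
        if not b:
            raise ValueError("truncated flag list")
        if b == b"\x00":
            break  # NUL terminates the flag list
        flags.append(b[0])
    return name.decode("ascii"), 1 in flags  # flag value 1 marks a BSS section


# A made-up object file fragment: header, then a BSS section named "bss".
sample = io.BytesIO(MAGIC + b"bss\x00\x01\x00")
print(read_header_and_first_section(sample))
```

The sample bytes here are invented for the illustration; a real object file continues with the symbol tables, incomplete references and object code for each section.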
+</para> + +<para> +Each entry in the exported and local symbols table consists of the symbol +(NUL terminated) followed by two bytes which contain the value in big endian +order. The end of a symbol table is indicated by a NULL symbol name. +</para> + +<para> +Each entry in the incomplete references table consists of an expression +followed by a 16 bit offset where the reference goes. Expressions are +defined as a series of terms up to an "end of expression" term. Each term +consists of a single byte which identifies the type of term (see below) +followed by any data required by the term. The end of the list is flagged +by a NULL expression (only an end of expression term). +</para> + +<table frame="all"><title>Object File Term Types</title> +<tgroup cols="2"> +<thead> +<row> +<entry>TERMTYPE</entry> +<entry>Meaning</entry> +</row> +</thead> +<tbody> +<row> +<entry>00</entry> +<entry>end of expression</entry> +</row> + +<row> +<entry>01</entry> +<entry>integer (16 bit in big endian order follows)</entry> +</row> +<row> +<entry>02</entry> +<entry> external symbol reference (NUL terminated symbol name follows)</entry> +</row> + +<row> +<entry>03</entry> +<entry>local symbol reference (NUL terminated symbol name follows)</entry> +</row> + +<row> +<entry>04</entry> +<entry>operator (1 byte operator number)</entry> +</row> +<row> +<entry>05</entry> +<entry>section base address reference</entry> +</row> + +<row> +<entry>FF</entry> +<entry>This term will set flags for the expression. Each one of these terms will set a single flag. All of them should be specified first in an expression. If they are not, the behaviour is undefined. The byte following is the flag. Flag 01 indicates an 8 bit relocation. 
Flag 02 indicates a zero-width relocation (see the EXTDEP pseudo op in LWASM).</entry> +</row> +</tbody> +</tgroup> +</table> + + +<para> +External references are resolved using other object files while local +references are resolved using the local symbol table(s) from this file. This +allows local symbols that are not exported to have the same names as +exported symbols or external references. +</para> + +<table frame="all"><title>Object File Operator Numbers</title> +<tgroup cols="2"> +<thead> +<row> +<entry>Number</entry> +<entry>Operator</entry> +</row> +</thead> +<tbody> +<row> +<entry>01</entry> +<entry>addition (+)</entry> +</row> +<row> +<entry>02</entry> +<entry>subtraction (-)</entry> +</row> +<row> +<entry>03</entry> +<entry>multiplication (*)</entry> +</row> +<row> +<entry>04</entry> +<entry>division (/)</entry> +</row> +<row> +<entry>05</entry> +<entry>modulus (%)</entry> +</row> +<row> +<entry>06</entry> +<entry>integer division (\) (same as division)</entry> +</row> + +<row> +<entry>07</entry> +<entry>bitwise and</entry> +</row> + +<row> +<entry>08</entry> +<entry>bitwise or</entry> +</row> + +<row> +<entry>09</entry> +<entry>bitwise xor</entry> +</row> + +<row> +<entry>0A</entry> +<entry>boolean and</entry> +</row> + +<row> +<entry>0B</entry> +<entry>boolean or</entry> +</row> + +<row> +<entry>0C</entry> +<entry>unary negation, 2's complement (-)</entry> +</row> + +<row> +<entry>0D</entry> +<entry>unary 1's complement (^)</entry> +</row> +</tbody> +</tgroup> +</table> + +<para> +An expression is represented in a postfix manner with both operands for +binary operators preceding the operator and the single operand for unary +operators preceding the operator. +</para> + +</chapter> +</book> +