diff doc/internals.txt @ 333:ebff3a3e8fa6

Updated internals to describe the multi-pass architecture
author lost
date Tue, 02 Mar 2010 00:44:18 +0000
parents ed3553296580
children e7885b3ee266
--- a/doc/internals.txt	Tue Mar 02 00:10:32 2010 +0000
+++ b/doc/internals.txt	Tue Mar 02 00:44:18 2010 +0000
@@ -4,46 +4,55 @@
 LWASM is a table-driven assembler that notionally uses two passes. However,
 it implements its assembly in several passes as follows.
 
-Pass 1 - Preprocessing & Parsing
---------------------------------
+Pass 1
+------
 
-This pass reads the source file and all included source files. It handles
-macro definition and expansion.
+This pass reads the entire source code and parses each line into an internal
+representation. Macros, file inclusions, and conditional assembly
+instructions are resolved at this point as well.
+
+Pass 2
+------
 
-As it reads the various lines, it also identifies any symbol associated with
-the line, the operation code, and, based on the operation code, the operand,
-if any. Upon examination of the operand, any expressions are stored in an
-internal postfix notation for later evaluation. During this pass,
-preliminary values are assigned to all symbols using the largest possible
-instruction size. A table of lines that reference every symbol is generated
-to be used in the following pass. Note that any symbols for which the value
-is known with no uncertainty factor will be generated with the smallest
-possible instruction.
+This pass assigns instruction sizes to all invariate instructions. Invariate
+instructions are any instructions with a fixed size, including those with
+forced addressing modes.
 
-At this stage, simple optimizations are performed on expressions. This
-includes coalescing constants (1+2+x => 3+x). It also includes some basic
-algebra (x+x => 2*x, 2*x+4*x => 6*x, x-x => 0).
+Pass 3
+------
+
+This pass resolves all instruction sizes that can be resolved without
+setting addresses for instructions. This process is repeated until no
+further instruction sizes are resolved.
 
-Pass 2 - Optimization
----------------------
+Pass 4
+------
 
-This pass sweeps the code looking for operations which could use a shorter
-instruction. If it finds one, it must then re-define all symbols defined
-subsequently and all symbols defined in terms of one of those symbols in a
-cascade. This process is repeated until no more possible reductions are
-discovered.
+This pass assigns addresses to all symbols whose values are known. It does
+the same for instructions. Algorithms similar to those in the previous pass
+are then used to resolve as many operands as possible.
+
+This pass is repeated multiple times until no further instructions or
+symbols are resolved.
 
-If, in the process of implementing an instruction reduction, a phasing error
-or other conflict is encountered, the reduction is backed out and marked as
-forced. 
+Pass 5
+------
 
-The following may be candidates for reduction, depending on assembler
-options:
+Finalization of all instruction sizes by forcing them to the maximum
+addressing mode. Then all remaining instruction addresses and symbol values
+are resolved.
 
-- extended addressing -> direct addressing (not in obj target)
-- 16 bit offset -> 8 bit offset (indirect indexed)
-- 16 bit offset -> 8 bit or 5 bit offset (direct indexed)
-- 16 bit offset -> no offset (indexed)
-- 16 bit relative -> 8 bit relative (depending on configuration)
+Pass 6
+------
+
+This pass does actual code generation.
 
 
+Expression Evaluation
+=====================
+
+Each expression carries a certainty flag. Any expression in which any term
+is flagged as uncertain is, itself, uncertain. There are a few specific
+cases where such uncertainty can cancel out. For instance, X-X where X is
+uncertain is guaranteed to be 0 and so there is no uncertainty.
+
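
Note on the "Expression Evaluation" section added by this change: the
certainty rule it describes can be sketched as below. This is a minimal
Python illustration only, not LWASM's actual code. The names Term, BinOp,
and evaluate are hypothetical, and the sketch assumes expressions are binary
trees over "+" and "-" and that the X-X case is detected by comparing the two
operands for identity of symbol.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Term:
    """A leaf value (hypothetical type). `name` is set for symbols only."""
    value: int
    certain: bool = True
    name: Optional[str] = None


@dataclass(frozen=True)
class BinOp:
    """A binary node (hypothetical type); only "+" and "-" are handled."""
    op: str
    left: Union["Term", "BinOp"]
    right: Union["Term", "BinOp"]


def evaluate(expr):
    """Return (value, certain) for an expression tree."""
    if isinstance(expr, Term):
        return expr.value, expr.certain

    # Assumed special case: X - X is 0 whatever X turns out to be,
    # so the uncertainty of X cancels out.
    if (expr.op == "-"
            and isinstance(expr.left, Term)
            and expr.left.name is not None
            and expr.left == expr.right):
        return 0, True

    left_value, left_certain = evaluate(expr.left)
    right_value, right_certain = evaluate(expr.right)
    if expr.op == "+":
        value = left_value + right_value
    elif expr.op == "-":
        value = left_value - right_value
    else:
        raise ValueError("unsupported operator: " + expr.op)

    # Any uncertain term makes the whole expression uncertain.
    return value, left_certain and right_certain


# X has only a preliminary value, so it is flagged as uncertain.
x = Term(0x1000, certain=False, name="X")
x_plus_2 = evaluate(BinOp("+", x, Term(2)))
x_minus_x = evaluate(BinOp("-", x, x))
```

Here x_plus_2 inherits the uncertainty of X, while x_minus_x comes back as
a certain 0. A real implementation would likely need to recognise the
cancellation in more shapes than a direct X-X node.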