view old-trunk/doc/internals.txt @ 348:11a95c6414b4

Added third func to instab to split resolve and emit logic
author lost@starbug
date Sat, 27 Mar 2010 22:15:07 -0600
parents eb230fa7d28e

LWASM Internals
===============

LWASM is a table-driven assembler. Notionally it is a two-pass assembler,
but the implementation divides the work into several passes, as follows.

Pass 1 - Preprocessing & Parsing
--------------------------------

This pass reads the source file and all included source files. It handles
macro definition and expansion.

As it reads each line, it also identifies any symbol associated with the
line, the operation code, and, based on the operation code, the operand, if
any. Any expressions found in the operand are stored in an internal postfix
notation for later evaluation. During this pass, preliminary values are
assigned to all symbols on the assumption that every instruction takes its
largest possible size. For every symbol, a table of the lines that reference
it is generated for use in the following pass. Note that any reference to a
symbol whose value is known with certainty is assembled using the smallest
possible instruction.
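
As an illustration of storing an operand expression in postfix form and
evaluating it later, here is a minimal sketch in Python. It is not LWASM's
own code: the token format and the names ``to_postfix`` and ``eval_postfix``
are assumptions made for this example, and only the four basic binary
operators are handled.

```python
# Hypothetical sketch: operands arrive as a token list of ints, symbol
# names (str), operators and parentheses.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(tokens):
    """Shunting-yard conversion for left-associative binary operators."""
    output, stack = [], []
    for tok in tokens:
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        elif tok in PRECEDENCE:
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)  # integer constant or symbol name
    while stack:
        output.append(stack.pop())
    return output

def eval_postfix(postfix, symbols):
    """Evaluate a postfix list; return None if a symbol is still undefined."""
    stack = []
    for tok in postfix:
        if tok in PRECEDENCE:
            b, a = stack.pop(), stack.pop()
            if tok == "+":
                stack.append(a + b)
            elif tok == "-":
                stack.append(a - b)
            elif tok == "*":
                stack.append(a * b)
            else:
                stack.append(a // b)
        elif isinstance(tok, int):
            stack.append(tok)
        elif tok in symbols:
            stack.append(symbols[tok])
        else:
            return None  # value not known yet; retry in a later pass
    return stack[0]
```

Keeping the expression in postfix form means it can be re-evaluated cheaply
each time a symbol value changes, without re-parsing the source line.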

At this stage, simple optimizations are performed on expressions. These
include coalescing constants (1+2+x => 3+x) and some basic algebra
(x+x => 2*x, 2*x+4*x => 6*x, x-x => 0).
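
The simplifications listed above can be sketched by collecting like terms.
The following Python sketch is an assumption about one way to do it, not a
description of LWASM's simplifier: it handles sums only, and the
``(coefficient, symbol)`` term format and the name ``simplify_sum`` are
invented for this example.

```python
from collections import defaultdict

def simplify_sum(terms):
    """Combine like terms in a sum.

    Each term is (coefficient, symbol); a symbol of None marks a constant.
    """
    constant = 0
    coeffs = defaultdict(int)
    for coeff, sym in terms:
        if sym is None:
            constant += coeff        # coalesce constants
        else:
            coeffs[sym] += coeff     # x + x => 2*x, x - x => 0*x
    # Drop symbols whose coefficients cancelled out.
    result = [(c, s) for s, c in sorted(coeffs.items()) if c != 0]
    # Keep the constant if it is non-zero or if nothing else remains.
    if constant != 0 or not result:
        result.insert(0, (constant, None))
    return result
```

For example, ``simplify_sum([(1, None), (2, None), (1, "x")])`` yields the
terms for 3+x, and ``simplify_sum([(1, "x"), (-1, "x")])`` yields the single
constant term 0.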

Pass 2 - Optimization
---------------------

This pass sweeps the code looking for operations which could use a shorter
instruction. If it finds one, it must then re-define all symbols defined
subsequently and all symbols defined in terms of one of those symbols in a
cascade. This process is repeated until no more possible reductions are
discovered.

If, in the process of implementing an instruction reduction, a phasing error
or other conflict is encountered, the reduction is backed out and the
instruction is marked as forced, that is, pinned to its longer form.
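
The sweep, cascade and back-out described above can be sketched as a
fixed-point loop. The Python below is a simplified model, not LWASM's
implementation: it considers only relative branches, each line is a dict
with hypothetical keys (``label``, ``long``, ``short``, ``target``), and the
only conflict it detects is a previously reduced branch falling out of 8-bit
range.

```python
def assign_addresses(lines, origin=0):
    """Recompute every line address and label value from current sizes."""
    symbols, addr = {}, origin
    for line in lines:
        if line["label"]:
            symbols[line["label"]] = addr
        line["addr"] = addr
        addr += line["size"]
    return symbols

def fits_short(line, symbols):
    """True if the short form reaches its target with an 8-bit offset."""
    offset = symbols[line["target"]] - (line["addr"] + line["short"])
    return -128 <= offset <= 127

def relax(lines):
    """Shrink branches until a full sweep finds nothing left to reduce."""
    for line in lines:
        line["size"] = line["long"]      # start from the largest size
        line["forced"] = False
    changed = True
    while changed:
        changed = False
        for line in lines:
            if (line["target"] is None or line["forced"]
                    or line["size"] == line["short"]):
                continue
            line["size"] = line["short"]             # try the reduction
            trial = assign_addresses(lines)          # cascade to later symbols
            if not fits_short(line, trial):
                line["size"] = line["long"]          # not yet; retry next sweep
            elif all(fits_short(o, trial) for o in lines
                     if o["target"] is not None and o["size"] == o["short"]):
                changed = True                       # reduction accepted
            else:
                line["size"] = line["long"]          # conflict: back out
                line["forced"] = True
    return assign_addresses(lines)                   # final symbol values
```

Each accepted reduction permanently shrinks one line, so the loop is
guaranteed to terminate. A reduction that is out of range on one sweep may
succeed on a later sweep, once other lines have shrunk and the target has
moved closer.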

The following may be candidates for reduction, depending on assembler
options:

- extended addressing -> direct addressing (not in obj target)
- 16 bit offset -> 8 bit offset (indirect indexed)
- 16 bit offset -> 8 bit or 5 bit offset (direct indexed)
- 16 bit offset -> no offset (indexed)
- 16 bit relative -> 8 bit relative (depending on configuration)
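
As a sketch of how the indexed offset reductions in the list above might be
chosen once an offset value is known, the Python function below picks the
narrowest width using plain two's-complement ranges. The function name
``offset_bits`` and its parameters are assumptions for this example; the
restriction of the 5 bit form to non-indirect operands follows the list
above.

```python
def offset_bits(offset, indirect=False):
    """Return the narrowest offset width (0, 5, 8 or 16 bits) that fits."""
    if offset == 0:
        return 0                      # no offset needed at all
    if not indirect and -16 <= offset <= 15:
        return 5                      # 5 bit form, non-indirect only
    if -128 <= offset <= 127:
        return 8
    if -32768 <= offset <= 32767:
        return 16
    raise ValueError("offset does not fit in 16 bits")
```

An assembler would apply such a choice only when the offset is known with
certainty or when the optimization pass has established that the shorter form
does not cause a conflict.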