Matej's V810 assembler
Version 1
Readme file

*** Introduction ***

MV810ASM is an assembler that produces machine code for the NEC V810
architecture. It is easy to use and does not make any assumptions about the
machine or environment the assembled code will run on. It also optionally
supports Nintendo Virtual Boy-specific instructions and has a ROM-hacking
mode that makes it easy to develop and test modifications to existing machine
code files.

*** Usage ***

The assembler is a command line program. It is run with a command line of the
form

    mv810asm inFile outFile [options]

where "mv810asm" is the filename of the assembler, "inFile" is the source
code file to assemble, "outFile" is the filename of the machine code file to
produce, and "options", if present, are one or more of the following:

E name
    Creates an identifier with the value of -1 (see "Identifiers" later in
    this document).

H rom
    Activates ROM-hacking mode. "rom" is the file used as the template for
    the output file.

I name value
    Creates an identifier with the specified value (see "Identifiers" later
    in this document). The value must be a number or an already defined
    identifier.

L
    Lists identifiers to standard output after the assembly is completed.
    The order in which identifiers are listed is undefined.

V
    Allows usage of Nintendo Virtual Boy-specific instructions.

The letter of each option must be preceded by the platform-specific option
character. On DOS, this is user-configurable; a slash is used if the DOS
version in use does not support this feature. On Win32, this is a slash. On
OS X and Haiku, this is a dash.

All options are case insensitive.

If the assembly completes successfully, the assembler will silently quit with
an error code of 0. Otherwise, error messages will be output to the standard
error stream and the assembler will quit with a nonzero error code. In this
case, the contents of the output file are invalid and should not be used.

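For example, on a platform where the option character is a dash (OS X or
Haiku), typical invocations might look like the sketch below. The file names
and the identifier name "Debug" are made up for illustration and are not part
of the assembler.

    mv810asm game.asm game.bin -V -L

assembles "game.asm" into "game.bin", allows Virtual Boy-specific
instructions, and lists all identifiers to standard output afterwards.

    mv810asm game.asm game.bin -E Debug
    mv810asm game.asm game.bin -I Debug 1

both create an identifier named "Debug", with the value -1 in the first case
and 1 in the second.

    mv810asm patch.asm patched.bin -H original.bin

assembles "patch.asm" in ROM-hacking mode, using "original.bin" as the
template for "patched.bin".

On Win32, the same options would be written with a slash instead, for example
"/V /L".
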
*** Source code syntax ***

The assembler reads source code line by line. Ignoring comments, which begin
with a semicolon and last until the end of the line, a non-empty line may
specify a directive, instruction, or pseudoinstruction. All processing is
case insensitive. Whitespace is ignored except where necessary to divide
directives, instructions, and pseudoinstructions from their operands. Source
code files do not need to end with an empty line.

If the line ends with a colon, it is a label (see "Identifiers" later in this
document). Otherwise, a word (terminated by whitespace) is read from the line
and, depending on it, zero or more comma-separated operands may be read.

Register operands may be referred to as "$" or "r" followed by the number of
the register. Registers 3, 4, and 31 may be referred to as "SP", "GP", and
"LP" respectively, optionally preceded with a dollar sign. System registers
(used in the LDSR and STSR instructions) are referred to exclusively by their
names and may be preceded with a dollar sign.

Numbers may be decimal or hexadecimal, the latter in "C" (prefixed by "0x")
or "Intel" (suffixed by "h") format. Hexadecimal numbers may be negative. For
example, "-0x10" is the same as "-16".

Strings begin and end with quotes. They are left as they are. No escaping
mechanism is currently supported.

Instructions use the usual V810 mnemonics. The SETF instructions have the
condition specified as part of the mnemonic, not as an operand. The operand
of the JMP instruction may be specified as "register" or "[register]".

The source code must be encoded in an 8-bit encoding. If you need Unicode,
use UTF-8 without a byte order mark.

*** Identifiers ***

An identifier is a name with an associated 32-bit value. Identifiers may be
created by labels, directives, or command line options, but they are all
treated the same way during assembly.

Identifiers may be global or local. Global identifiers may be created at any
time and referred to anywhere.

Local identifiers begin with a dot and are relative to the global label they
were created after. They may only be referred to within the scope of the same
global label or by explicitly specifying the global label before their name.
Local identifiers may be created even outside their usual scope if their
global label is prefixed to their name.

The value of a label is the value of the program counter at the point where
the label is reached. It is influenced by the !ORG directive (see
"Directives" later in this document). It is not related to the position in
the output file. Label values are automatically aligned if necessary, and
padding bytes are inserted into the output file. The values of the padding
bytes are undefined.

Here is an example of global and local labels:

    !CONST A, 100           ; Global identifier "A", value 100

    Fn1:                    ; Global identifier "Fn1", value of current PC
        ?mov A, $6          ; $6 is set to 100
        jal Fn2
        ?mov .data, $6      ; $6 is set to ".data"/"Fn1.data", value of the
                            ; PC there
        ?br Fn3
        ; (Padding bytes may be inserted here to align the following label)
    .data:                  ; Local identifier ".data"/"Fn1.data", value of
                            ; current PC
        ?dw 0x12345678
        ; ...
        ?db 0xFE            ; PC may be unaligned after this line
        ; (A padding byte may be inserted here to align the following label)

    Fn2:                    ; Global identifier "Fn2", value of current PC
        !CONST .x, 123      ; Local identifier ".x"/"Fn2.x", value 123
        ?mov .x, $10        ; $10 is set to 123
        ?mov A, $11         ; $11 is set to 100
        ; ...
        jmp $LP

    Fn3:                    ; Global identifier "Fn3", value of current PC
        !CONST .x, 456      ; Local identifier ".x"/"Fn3.x", value 456
        ?mov .x, $10        ; $10 is set to 456
        ?mov Fn2.x, $11     ; $11 is set to 123
        ; ...

*** Expressions ***

Wherever the assembler expects a constant value, an expression may be used.
As in most other programming languages, expressions are written using infix
notation.

The following operators are supported, in order of priority (most to least):

    1.  -   Negation
    2.  <<  Left shift
    3.  *   Multiplication
        /   Division
    4.  +   Addition
        -   Subtraction
    5.  =   Equal
        <>  Not equal
        <   Less than
        <=  Less than or equal
        >   Greater than
        >=  Greater than or equal
    6.  &   Bitwise "and"
        |   Bitwise "or"
        ^   Bitwise exclusive "or"

Negation may only be used for literals and identifiers. For example, "-A" and
"-1" are valid, but "-(A + B)" is not. Use "0 - (A + B)" in that case.

The comparison operators return 0 for false and -1 for true. This lets
bitwise operators be used for logical operations.

All calculations are performed on 32-bit signed integers.

*** Directives ***

Directives influence the code assembled after them. With the potential
exception of !INCLUDE, they do not insert any bytes into the output file
themselves.

!CONST name, value
    Creates an identifier with the specified value.

!ENDIF
    Ends the matching !IF directive.

!IF condition
    Conditionally assembles lines until the matching !ENDIF. If the condition
    evaluates to 0, lines will be ignored until the matching !ENDIF is
    encountered. Any undefined identifiers will be treated as 0. !IF blocks
    may be nested.

!INCLUDE filename
    Includes another source file. The file is processed as if its contents
    were present in the file it is being included from. The filename is a
    string in quotes. !INCLUDE directives may be nested.

!ORG address
    Sets the assumed value of the program counter. This is unrelated to the
    position of the code in the output file. It is used to calculate offsets
    for branch and jump instructions. The initial program counter value is
    undefined, so this directive should be used before any labels or
    instructions.

!RBASE address
    Sets the base address for all following !RB, !RH, and !RW directives.

!R{B|H|W} name [, quantity]
    Creates an identifier with the specified name at the current !RBASE base
    address, which is then incremented by the data size (byte, halfword, or
    word) multiplied by the quantity (assumed to be 1, if omitted). This is
    useful for reserving locations in RAM for global variables or defining
    structures.

!SEEK position
    Sets the position of the code in the output file. The position is
    unrelated to the assumed value of the program counter, so it does not
    have to be halfword-aligned. Padding bytes will be inserted until the
    current position matches the one specified. The padding bytes are
    undefined. In ROM-hacking mode, no padding bytes will be inserted. The
    initial position is 0.

*** Pseudoinstructions ***

Pseudoinstructions translate into one or more instructions or insert data
into the output file.

?ADD immediate, register
    Produces an immediate ADD instruction if possible, otherwise an ADDI.

?BR destination
    Produces a BR instruction if possible, otherwise a JR.

?CSTRING string
    Inserts a "C string" (a string followed by a NUL byte) into the output
    file.

?D{B|H|W} value [, value, ...]
    Inserts bytes, halfwords, or words into the output file. Padding bytes
    will be inserted automatically before and after the pseudoinstruction if
    needed. The padding bytes are undefined. Any number of values may be
    specified. For ?DW, the values may also be identifiers that are not yet
    defined. This may be used to create lists of pointers.

?MOV immediate, register
    Produces an immediate MOV, MOVEA, or MOVHI instruction, or a MOVHI/MOVEA
    pair, to set a register to a constant.

?MOVEA immediate, register, register
    Produces a MOVHI or MOVEA instruction or both to set a register to
    another register plus a constant.

?PSTRING string
    Inserts a "Pascal string" (length byte followed by the string) into the
    output file. The string may not be longer than 255 bytes.

?STRING string
    Inserts a string as it is into the output file.

*** ROM-hacking mode ***

The assembler has a mode intended for modifying existing machine code files.
In the ROM-hacking mode, the output file is a copy of the template file with
modifications applied at positions specified by !SEEK directives.
This allows fast development of patches as modifications are automatically
applied each time to a copy of the template file and no additional manual
work has to be done.

The H option activates ROM-hacking mode.

*** Credits ***

MV810ASM was created by Matej Horvat.

Web site: http://matejhorvat.si/en/software/mv810asm/
Electronic mail: matej.horvat@guest.arnes.si