Project 2: MIPS Assembler
Due: Sunday, October 14, 2012 at 11:59pm
An assembler is a program that reads a file of human-readable mnemonics for machine instructions and encodes them as the binary words the CPU executes.
In this project, you will write an assembler for a subset of MIPS, shown below. The input file format and output file format are also shown below.
Opcode | Instruction Type | Opcode (hex) | Funct (hex)
addi $rt, $rs, Imm16 | I-type | 0x8 |
add $rd, $rs, $rt | R-type | 0x0 | 0x20
sub $rd, $rs, $rt | R-type | 0x0 | 0x22
slt $rd, $rs, $rt | R-type | 0x0 | 0x2a
beq $rs, $rt, Imm16 | I-type | 0x4 |
bne $rs, $rt, Imm16 | I-type | 0x5 |
syscall | R-type | 0x0 | 0xc
lw $rt, Imm16($rs) | I-type | 0x23 |
sw $rt, Imm16($rs) | I-type | 0x2b |
j Imm26 | J-type | 0x2 |
sll $rd, $rt, shamt | R-type | 0x0 | 0x0
lui $rt, Imm16 | I-type | 0xf |
and $rd, $rs, $rt | R-type | 0x0 | 0x24
ori $rt, $rs, Imm16 | I-type | 0xd |
nor $rd, $rs, $rt | R-type | 0x0 | 0x27
Instruction Encodings:
Field Size | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits
           | Bit 31 |        |        |        |        | Bit 0
R-type | opcode | rs | rt | rd | shamt | funct
I-type | opcode | rs | rt | Immediate/address (16 bits, spanning the last three fields)
J-type | opcode | Target address (26 bits, spanning the last five fields)
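As a rough illustration of the layouts above, the three formats can be packed with shifts and masks. This is only a sketch: the class name Encoder and its method names are made up for this example and are not something the project requires.

```java
/** Hypothetical helper (not part of the handout): packs fields into a 32-bit word. */
final class Encoder {
    private Encoder() {}

    /** R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6). */
    static int encodeR(int opcode, int rs, int rt, int rd, int shamt, int funct) {
        return (opcode & 0x3f) << 26
             | (rs & 0x1f) << 21
             | (rt & 0x1f) << 16
             | (rd & 0x1f) << 11
             | (shamt & 0x1f) << 6
             | (funct & 0x3f);
    }

    /** I-type: opcode(6) | rs(5) | rt(5) | immediate(16). Negative immediates are truncated to 16 bits. */
    static int encodeI(int opcode, int rs, int rt, int imm16) {
        return (opcode & 0x3f) << 26
             | (rs & 0x1f) << 21
             | (rt & 0x1f) << 16
             | (imm16 & 0xffff);
    }

    /** J-type: opcode(6) | target(26). */
    static int encodeJ(int opcode, int target26) {
        return (opcode & 0x3f) << 26 | (target26 & 0x03ffffff);
    }

    /** Formats a word the way the output file expects it, e.g. 0xafbf0024. */
    static String toHex(int word) {
        return String.format("0x%08x", word);
    }
}
```

For example, sw $r31, 0x24($r29) would be Encoder.encodeI(0x2b, 29, 31, 0x24), which formats as 0xafbf0024, and add $r3, $r1, $r2 would be Encoder.encodeR(0x0, 1, 2, 3, 0, 0x20), which formats as 0x00221820.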
The input file format will have two sections, a .data section that allows for string declarations only and a .text section which contains the assembly code.
The strings and opcodes can be labeled. The labels are valid C identifiers (they must start with a letter or underscore; after that they can also include digits). They are suffixed by a colon.
Strings are ASCII characters enclosed in double quotes, terminated by an implicit NUL, and we support the \n newline escape sequence.
Opcodes and registers are to be specified in lowercase. We will only use the $r0-$r31 syntax, not the mnemonic register aliases (for example, $r0 rather than $zero).
Immediates can be in decimal or hexadecimal (with leading 0x).
Comments are line comments beginning with #.
.data
label: .asciiz "some string\n"
.text
label2: OPCODE $r0, $r1, 0x20
j label2
# this is a comment
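The operand rules above (registers written as $r0-$r31, immediates in decimal or 0x-prefixed hexadecimal) could be turned into numbers roughly as follows. This is a sketch only: the class Operands is hypothetical, and accepting a leading minus sign on immediates is an assumption, since the format description does not mention negative values.

```java
/** Hypothetical helper (not part of the handout): converts operand text to numbers. */
final class Operands {
    private Operands() {}

    /** Parses a decimal or 0x-prefixed hexadecimal immediate. */
    static int parseImmediate(String text) {
        // Assumption: a leading '-' is allowed; the handout does not say either way.
        boolean negative = text.startsWith("-");
        String digits = negative ? text.substring(1) : text;
        int value = digits.startsWith("0x")
                ? Integer.parseInt(digits.substring(2), 16)
                : Integer.parseInt(digits, 10);
        return negative ? -value : value;
    }

    /** Parses "$r0" through "$r31" and returns the register number. */
    static int parseRegister(String text) {
        if (!text.startsWith("$r")) {
            throw new IllegalArgumentException("not a register: " + text);
        }
        int number = Integer.parseInt(text.substring(2));
        if (number < 0 || number > 31) {
            throw new IllegalArgumentException("register out of range: " + text);
        }
        return number;
    }
}
```

In a JFlex/CUP setup, helpers like these would typically be called from the lexer actions so that the parser receives Integer values rather than raw text.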
Your assembler will produce an output file that contains the text representation of the hexadecimal value of the assembled instructions, one per line. The code contained in the text segment will live in memory starting at address 0x00400000.
The strings from the .data section appear after the line DATA SEGMENT. Each line gives the address at which the data is stored, followed by the 4 bytes stored there, starting with the first 4 bytes of the string. Continue at subsequent addresses until the NUL terminator has been encoded. Make sure that every string starts at a word-aligned address. The first address must be 0x10010000.
0x27bdffd8
0xafbf0024
…
DATA SEGMENT
0x10010000 0x00000023
0x10010004 0x00000000
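One possible way to produce the DATA SEGMENT lines for a single string is sketched below. The class name DataWords is made up for this example. The byte order is an assumption: the handout does not state it, and this sketch places the first character in the low-order byte, which is at least consistent with the sample line 0x10010000 0x00000023 if that string begins with '#'. Check the expected order before relying on it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the handout): formats one string as address/word lines. */
final class DataWords {
    private DataWords() {}

    /** Returns one "address word" line per 4 bytes, including the implicit NUL. */
    static List<String> emit(String value, int startAddress) {
        byte[] ascii = value.getBytes(StandardCharsets.US_ASCII);
        int length = ascii.length + 1; // +1 for the implicit NUL terminator
        List<String> lines = new ArrayList<>();
        for (int offset = 0; offset < length; offset += 4) {
            int word = 0;
            for (int i = 0; i < 4; i++) {
                int index = offset + i;
                // Bytes past the end of the string (NUL and padding) are zero.
                int b = index < ascii.length ? (ascii[index] & 0xff) : 0;
                // Assumption: first character goes in the low-order byte.
                word |= b << (8 * i);
            }
            lines.add(String.format("0x%08x 0x%08x", startAddress + offset, word));
        }
        return lines;
    }
}
```

With this packing, a four-character string such as "abcd" needs two lines, because the NUL terminator spills into a second word.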
First, you will need to create a tokenizer in JFlex that returns symbols for the parser to consume. You may use the Symbol class from JavaCUP; read the section of the JFlex manual about %cup and the CUP examples. Discard comments and whitespace.
You have some decisions about how to structure the token types that you return from the lexer. You could return each opcode separately or group them. I’d suggest having at least a way to distinguish between R-type, I-type, and J-type opcodes so that the parser has an easier job of knowing when you’ve put an immediate where you shouldn’t have, etc.
You have similar decisions about dealing with registers and integers. Try to plan this step with your grammar in mind.
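One way to keep the grouping decision in a single place is a table that maps each mnemonic to its format, opcode, and funct, which the lexer can consult when choosing a token type. The enum below is only a sketch of that idea; the names Op, Format, and lookup are invented for this example, and the values are copied from the instruction table above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical opcode table (not part of the handout), built from the instruction list. */
enum Op {
    ADDI("addi", Format.I, 0x08, 0x00),
    ADD("add", Format.R, 0x00, 0x20),
    SUB("sub", Format.R, 0x00, 0x22),
    SLT("slt", Format.R, 0x00, 0x2a),
    BEQ("beq", Format.I, 0x04, 0x00),
    BNE("bne", Format.I, 0x05, 0x00),
    SYSCALL("syscall", Format.R, 0x00, 0x0c),
    LW("lw", Format.I, 0x23, 0x00),
    SW("sw", Format.I, 0x2b, 0x00),
    J("j", Format.J, 0x02, 0x00),
    SLL("sll", Format.R, 0x00, 0x00),
    LUI("lui", Format.I, 0x0f, 0x00),
    AND("and", Format.R, 0x00, 0x24),
    ORI("ori", Format.I, 0x0d, 0x00),
    NOR("nor", Format.R, 0x00, 0x27);

    enum Format { R, I, J }

    final String mnemonic;
    final Format format;
    final int opcode;
    final int funct; // only meaningful for R-type entries

    private static final Map<String, Op> BY_MNEMONIC = new HashMap<>();
    static {
        for (Op op : values()) {
            BY_MNEMONIC.put(op.mnemonic, op);
        }
    }

    Op(String mnemonic, Format format, int opcode, int funct) {
        this.mnemonic = mnemonic;
        this.format = format;
        this.opcode = opcode;
        this.funct = funct;
    }

    /** Returns the entry for a lowercase mnemonic, or null if it is not supported. */
    static Op lookup(String mnemonic) {
        return BY_MNEMONIC.get(mnemonic);
    }
}
```

A lexer action could then return a different terminal depending on Op.lookup(yytext()).format and attach the Op itself as the symbol's value. Note that the format alone does not determine the operand shape: lw and sw use Imm16($rs), and syscall takes no operands, so the grammar may need finer groups.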
For your grammar, make sure you’re only accepting legal operands. The parser should produce a list of Instructions. How you create the object hierarchy for that is up to you, but at least there should be an interface or base class Instruction. There is no tree necessary, so our intermediate representation is more of an “abstract syntax list”.
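A minimal version of such a hierarchy might look like the sketch below. Everything except the name Instruction is invented for illustration (RInstruction, IInstruction, Program, and the encode method), and a J-type class is left out because it needs the label table.

```java
import java.util.ArrayList;
import java.util.List;

/** Base type for the "abstract syntax list"; the encode() method is an assumption. */
interface Instruction {
    int encode();
}

/** Hypothetical R-type node; the opcode is always 0 for the R-type instructions in this subset. */
final class RInstruction implements Instruction {
    private final int rs, rt, rd, shamt, funct;

    RInstruction(int rs, int rt, int rd, int shamt, int funct) {
        this.rs = rs; this.rt = rt; this.rd = rd; this.shamt = shamt; this.funct = funct;
    }

    @Override
    public int encode() {
        return rs << 21 | rt << 16 | rd << 11 | shamt << 6 | funct;
    }
}

/** Hypothetical I-type node with a numeric immediate. */
final class IInstruction implements Instruction {
    private final int opcode, rs, rt, imm;

    IInstruction(int opcode, int rs, int rt, int imm) {
        this.opcode = opcode; this.rs = rs; this.rt = rt; this.imm = imm;
    }

    @Override
    public int encode() {
        return opcode << 26 | rs << 21 | rt << 16 | (imm & 0xffff);
    }
}

/** Hypothetical container the parser appends to as it reduces each instruction. */
final class Program {
    private final List<Instruction> instructions = new ArrayList<>();

    void add(Instruction instruction) {
        instructions.add(instruction);
    }

    /** Returns one hexadecimal string per instruction, in program order. */
    List<String> assemble() {
        List<String> lines = new ArrayList<>();
        for (Instruction instruction : instructions) {
            lines.add(String.format("0x%08x", instruction.encode()));
        }
        return lines;
    }
}
```

In this sketch the parser's semantic actions would construct the node objects and call add, so the parse result is simply the ordered list.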
This project requires two passes: one to parse the input and record the labels, and one to resolve them. Labels are found during parsing and resolved afterwards by walking the list of Instructions. Use a hashtable that maps each label to the number of the instruction it is attached to.
To do code generation, iterate over your list, resolve the labels, and turn them into the hexadecimal strings to output.
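The two passes can be sketched as follows for the j instruction. The class LabelResolver and its method names are hypothetical. The sketch assumes instructions are numbered from 0 and that instruction n lives at 0x00400000 + 4n; the 26-bit target is the word address, that is, the byte address shifted right by two.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical label table (not part of the handout): filled while parsing, read afterwards. */
final class LabelResolver {
    static final int TEXT_BASE = 0x00400000;

    private final Map<String, Integer> labels = new HashMap<>(); // label -> instruction number
    private int instructionCount = 0;

    /** Pass 1: called when the parser sees "label:" in front of the next instruction. */
    void define(String label) {
        if (labels.putIfAbsent(label, instructionCount) != null) {
            throw new IllegalStateException("duplicate label: " + label);
        }
    }

    /** Pass 1: called each time the parser appends an instruction to the list. */
    void instructionAdded() {
        instructionCount++;
    }

    /** Pass 2: encodes "j label" once every label is known. */
    int encodeJump(String label) {
        Integer index = labels.get(label);
        if (index == null) {
            throw new IllegalStateException("undefined label: " + label);
        }
        int address = TEXT_BASE + 4 * index;
        return (0x2 << 26) | ((address >>> 2) & 0x03ffffff);
    }
}
```

Because resolution happens only after parsing has finished, a jump to a label defined later in the file works the same way as a backward jump. How labels are used with other instructions is not specified above, so this sketch covers only j.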
To do data output, save the strings in a string table, and give them appropriate addresses. Resolve the labels during code generation and then output the DATA SEGMENT after the code is done.
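A string table that hands out word-aligned addresses might look like this. It is a sketch under assumptions: the class StringTable is invented for the example, and it expects strings whose escape sequences have already been converted, so that one character is one byte.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical string table (not part of the handout): assigns each string a word-aligned address. */
final class StringTable {
    static final int DATA_BASE = 0x10010000;

    // LinkedHashMap keeps declaration order, which is handy when printing the DATA SEGMENT.
    private final Map<String, Integer> addresses = new LinkedHashMap<>();
    private int nextAddress = DATA_BASE;

    /** Records a labeled string and returns the address assigned to it. */
    int add(String label, String value) {
        int address = nextAddress;
        addresses.put(label, address);
        int bytes = value.length() + 1;   // +1 for the implicit NUL terminator
        nextAddress += (bytes + 3) & ~3;  // round up to the next multiple of 4
        return address;
    }

    /** Returns the address of a previously added label, or null if it is unknown. */
    Integer addressOf(String label) {
        return addresses.get(label);
    }
}
```

For instance, adding "hi" (3 bytes with the NUL) and then "abcd" (5 bytes with the NUL) places them at 0x10010000 and 0x10010004, and the next string would start at 0x1001000c.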
· Use JFlex to implement a MIPS Assembly file lexer that returns symbols to the parser
· Use JavaCUP to implement a parser and construct a list of Instructions
· Support labels via a hashtable
· Support strings
· Consume files written in the input format above
· Output an “executable” in the format specified above
· Support at least the instructions listed in the table above
· Support all 32 general purpose registers by $r[0-31]
By the deadline, you need to submit:
1. Your JFlex file containing your lexer
2. Your JavaCUP file containing your parser
3. Your Java files containing main() and any auxiliary Java files you have used
4. A Makefile to build it all
5. A README text file describing how to run it
6. Any examples that you have tested your program on
Create a zip file of the above files and upload to the submission website linked from the class webpage.