Project 2: MIPS Assembler
Due: Sunday, October 14, 2012 at 11:59pm

Description

An assembler is a program that reads a file of human-readable mnemonics for machine instructions and encodes each instruction as the binary number that represents it to the CPU.

In this project, you will write an assembler for a subset of MIPS, shown below. The input file format and output file format are also shown below.

MIPS Subset

Instruction	Instruction Type	Opcode (hex)	Funct (hex)
addi $rt, $rs, Imm16	I-type	0x8
add $rd, $rs, $rt	R-type	0x0	0x20
sub $rd, $rs, $rt	R-type	0x0	0x22
slt $rd, $rs, $rt	R-type	0x0	0x2a
beq $rs, $rt, Imm16	I-type	0x4
bne $rs, $rt, Imm16	I-type	0x5
syscall	R-type	0x0	0xc
lw $rt, Imm16($rs)	I-type	0x23
sw $rt, Imm16($rs)	I-type	0x2b
j Imm26	J-type	0x2
sll $rd, $rt, shamt	R-type	0x0	0x0
lui $rt, Imm16	I-type	0xf
and $rd, $rs, $rt	R-type	0x0	0x24
ori $rt, $rs, Imm16	I-type	0xd
nor $rd, $rs, $rt	R-type	0x0	0x27

Instruction Encodings:

Field Size	6 bits	5 bits	5 bits	5 bits	5 bits	6 bits
	Bit 31					Bit 0
R-type	opcode	rs	rt	rd	shamt	funct
I-type	opcode	rs	rt	Immediate/address
J-type	opcode	Target address
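
The field layout above translates directly into shifts and masks. The following is a minimal sketch of that packing, not a required design: the class name Encoder and its method names are made up for illustration, and the caller is assumed to have already looked up the opcode, funct, and register numbers.

```java
// Sketch only: Encoder and its methods are illustrative names, not part of the assignment.
final class Encoder {
    // R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)
    static int encodeR(int opcode, int rs, int rt, int rd, int shamt, int funct) {
        return (opcode & 0x3f) << 26
             | (rs & 0x1f) << 21
             | (rt & 0x1f) << 16
             | (rd & 0x1f) << 11
             | (shamt & 0x1f) << 6
             | (funct & 0x3f);
    }

    // I-type: opcode(6) | rs(5) | rt(5) | immediate(16)
    static int encodeI(int opcode, int rs, int rt, int immediate) {
        return (opcode & 0x3f) << 26
             | (rs & 0x1f) << 21
             | (rt & 0x1f) << 16
             | (immediate & 0xffff); // the mask keeps the low 16 bits of a negative value
    }

    // J-type: opcode(6) | target(26)
    static int encodeJ(int opcode, int target) {
        return (opcode & 0x3f) << 26 | (target & 0x03ffffff);
    }

    public static void main(String[] args) {
        // add $r3, $r1, $r2
        System.out.println(String.format("0x%08x", encodeR(0x0, 1, 2, 3, 0, 0x20))); // 0x00221820
        // addi $r29, $r29, -40
        System.out.println(String.format("0x%08x", encodeI(0x8, 29, 29, -40)));      // 0x23bdffd8
        // lw $r4, 8($r5)
        System.out.println(String.format("0x%08x", encodeI(0x23, 5, 4, 8)));         // 0x8ca40008
    }
}
```

Masking every field before shifting means an out-of-range value cannot spill into a neighboring field, although you will probably want to report such values as errors rather than truncate them silently.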

Input Format

The input file has two sections: a .data section, which allows string declarations only, and a .text section, which contains the assembly code.

The strings and opcodes can be labeled. Labels are valid C identifiers: they must start with a letter or underscore, and the remaining characters may also include digits. A label definition is suffixed by a colon.

Strings are ASCII characters enclosed in double quotes, terminated by an implicit NUL, and we support the \n newline escape sequence.

Opcodes and registers are to be specified in lowercase. We will only use the $r0-$r31 syntax, not the mnemonic register aliases (i.e., only $r0 not $zero).

Immediates can be in decimal or hexadecimal (with leading 0x).
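
One way to convert an immediate token to a number is sketched below. The class name Immediates is hypothetical, and the sketch assumes that a decimal immediate may carry a leading minus sign, which the format description above does not state explicitly.

```java
// Sketch only: Immediates is an illustrative name, not part of the assignment.
final class Immediates {
    // Parses a decimal ("42", "-40") or hexadecimal ("0x20") immediate.
    // Assumption: a leading '-' is allowed.
    static int parse(String text) {
        boolean negative = text.startsWith("-");
        String digits = negative ? text.substring(1) : text;
        long value = digits.startsWith("0x")
                ? Long.parseLong(digits.substring(2), 16)
                : Long.parseLong(digits);
        return (int) (negative ? -value : value);
    }

    // True if the value fits a 16-bit field when read as either signed or unsigned.
    static boolean fitsIn16Bits(int value) {
        return value >= -32768 && value <= 0xffff;
    }

    public static void main(String[] args) {
        System.out.println(parse("0x20"));        // 32
        System.out.println(parse("-40"));         // -40
        System.out.println(fitsIn16Bits(70000));  // false
    }
}
```

Parsing by hand instead of calling Integer.decode avoids one surprise: Integer.decode reads a leading 0 as octal, which is not one of the two bases allowed here.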

Comments are line comments beginning with #.

.data

label: .asciiz "some string\n"

.text

label2: OPCODE $r0, $r1, 0x20

j label2

# this is a comment

Output Format

Your assembler will produce an output file that contains the text representation of the hexadecimal value of the assembled instructions, one per line. The code contained in the text segment will live in memory starting at address 0x00400000.

The strings from the .data section appear after the line DATA SEGMENT. Each line gives the address at which you are storing the string, followed by 4 bytes of the string, beginning with the first 4. Repeat at subsequent addresses until you have encoded the NUL terminator. Make sure that every string starts at a word-aligned address. The first address must be 0x10010000.

0x27bdffd8
0xafbf0024

…

DATA SEGMENT
0x10010000 0x00000023
0x10010004 0x00000000
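
The data segment lines can be produced by packing a string four bytes at a time. The sketch below is one possible approach under stated assumptions: the class name DataSegment and its methods are hypothetical, the string has already had its \n escape converted to a newline character by the lexer, and the first byte of each group goes in the low-order byte of the word. That byte order is inferred from the sample line 0x10010000 0x00000023 and is not stated anywhere above, so check it against your own expectations.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only: DataSegment and its methods are illustrative names.
final class DataSegment {
    static final int DATA_BASE = 0x10010000; // first data address, from the handout

    // Number of 32-bit words needed for the string plus its implicit NUL.
    static int wordCount(String text) {
        int length = text.getBytes(StandardCharsets.US_ASCII).length + 1;
        return (length + 3) / 4; // round up to a whole word
    }

    // Returns one "address word" line per word of the string stored at a word-aligned address.
    static List<String> layout(int address, String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        List<String> lines = new ArrayList<>();
        for (int w = 0; w < wordCount(text); w++) {
            int word = 0;
            for (int b = 0; b < 4; b++) {
                int i = w * 4 + b;
                int octet = i < ascii.length ? ascii[i] & 0xff : 0; // NUL and padding are 0
                word |= octet << (8 * b); // assumption: first byte in the low-order position
            }
            lines.add(String.format("0x%08x 0x%08x", address + 4 * w, word));
        }
        return lines;
    }

    // Next word-aligned address available after this string.
    static int nextFree(int address, String text) {
        return address + 4 * wordCount(text);
    }

    public static void main(String[] args) {
        System.out.println("DATA SEGMENT");
        for (String line : layout(DATA_BASE, "#")) {
            System.out.println(line); // 0x10010000 0x00000023
        }
    }
}
```

Because every string occupies a whole number of words, starting the next string at nextFree keeps it word-aligned without any extra bookkeeping.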

Approach

First, you will need to create a tokenizer in JFlex that returns symbols for the parser to consume. You may use the Symbol class from JavaCUP; read the section of the JFlex manual on %cup and look at the CUP examples. Discard the comments and whitespace.

You have some decisions about how to structure the token types that you return from the lexer. You could return each opcode separately or group them. I’d suggest having at least a way to distinguish between R-type, I-type, and J-type opcodes so that the parser has an easier job of knowing when you’ve put an immediate where you shouldn’t have, etc.
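
If you decide to group opcodes by type, a lookup table on the Java side can carry the per-instruction details. The sketch below is one possible arrangement, not the required one: the names Format, Mnemonic, and lookup are made up for illustration, and the opcode and funct values are copied from the MIPS subset table.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: Format, Mnemonic, and lookup are illustrative names.
enum Format { R, I, J }

enum Mnemonic {
    ADDI(Format.I, 0x8, 0x0), ADD(Format.R, 0x0, 0x20), SUB(Format.R, 0x0, 0x22),
    SLT(Format.R, 0x0, 0x2a), BEQ(Format.I, 0x4, 0x0), BNE(Format.I, 0x5, 0x0),
    SYSCALL(Format.R, 0x0, 0xc), LW(Format.I, 0x23, 0x0), SW(Format.I, 0x2b, 0x0),
    J(Format.J, 0x2, 0x0), SLL(Format.R, 0x0, 0x0), LUI(Format.I, 0xf, 0x0),
    AND(Format.R, 0x0, 0x24), ORI(Format.I, 0xd, 0x0), NOR(Format.R, 0x0, 0x27);

    final Format format;
    final int opcode;
    final int funct; // meaningful for R-type only

    Mnemonic(Format format, int opcode, int funct) {
        this.format = format;
        this.opcode = opcode;
        this.funct = funct;
    }

    private static final Map<String, Mnemonic> BY_NAME = new HashMap<>();
    static {
        for (Mnemonic m : values()) {
            BY_NAME.put(m.name().toLowerCase(Locale.ROOT), m);
        }
    }

    // Returns null for anything that is not a lowercase mnemonic from the table.
    static Mnemonic lookup(String text) {
        return BY_NAME.get(text);
    }
}
```

With a table like this, the lexer can return one token type per Format and attach the Mnemonic as the symbol's value, so the grammar needs only a handful of instruction productions.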

You have similar decisions about dealing with registers and integers. Try to plan this step with your grammar in mind.

For your grammar, make sure you’re only accepting legal operands. The parser should produce a list of Instructions. How you create the object hierarchy for that is up to you, but at least there should be an interface or base class Instruction. There is no tree necessary, so our intermediate representation is more of an “abstract syntax list”.

This project requires two passes: one to parse and find the labels and one to resolve the labels. We find the labels in the parsing phase and resolve them after parsing is done by walking the Instructions. Use a hashtable that maps each label to the number of the instruction on which it is defined.
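
The two passes might be organized around a small table like the one sketched here. LabelTable, define, and addressOf are hypothetical names. The address computation relies on two facts from this handout: the text segment starts at 0x00400000, and every instruction is 32 bits (4 bytes) wide. Labels on strings would need their data addresses recorded in a similar way.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: LabelTable and its methods are illustrative names.
final class LabelTable {
    static final int TEXT_BASE = 0x00400000; // start of the text segment, from the handout

    private final Map<String, Integer> indexByLabel = new HashMap<>();

    // Pass 1 (during parsing): remember which instruction number a label is attached to.
    void define(String label, int instructionIndex) {
        if (indexByLabel.putIfAbsent(label, instructionIndex) != null) {
            throw new IllegalStateException("duplicate label: " + label);
        }
    }

    // Pass 2 (after parsing): turn a label into the address of its instruction.
    int addressOf(String label) {
        Integer index = indexByLabel.get(label);
        if (index == null) {
            throw new IllegalStateException("undefined label: " + label);
        }
        return TEXT_BASE + 4 * index; // each instruction occupies 4 bytes
    }

    public static void main(String[] args) {
        LabelTable labels = new LabelTable();
        labels.define("loop", 2); // label seen on the third instruction
        System.out.println(String.format("0x%08x", labels.addressOf("loop"))); // 0x00400008
    }
}
```

The second pass is needed because an instruction may refer to a label that is defined later in the file, so the table is only complete once parsing has finished.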

To do code generation, iterate over your list of Instructions, resolve the labels each one refers to, and turn each Instruction into the hexadecimal string to output.
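
A rough sketch of that loop follows. The encode method on Instruction, the CodeGen class, and the map of label addresses are all assumptions made for the example; your own hierarchy may look quite different. The jump in main stores the label's address divided by 4 in the 26-bit field, which is the usual MIPS convention and is not spelled out in this handout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: the encode method and CodeGen class are illustrative, not required names.
interface Instruction {
    // Returns the 32-bit machine word, looking up any label it needs in the map.
    int encode(Map<String, Integer> labelAddresses);
}

final class CodeGen {
    // Produces one "0x%08x" line per instruction, in program order.
    static List<String> generate(List<Instruction> program, Map<String, Integer> labelAddresses) {
        List<String> lines = new ArrayList<>();
        for (Instruction instruction : program) {
            lines.add(String.format("0x%08x", instruction.encode(labelAddresses)));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> labels = new HashMap<>();
        labels.put("start", 0x00400000);

        List<Instruction> program = new ArrayList<>();
        // add $r3, $r1, $r2 needs no label, so it ignores the map.
        program.add(ignored -> 0x00221820);
        // j start: assumption, the field holds the word address (address >>> 2).
        program.add(map -> (0x2 << 26) | ((map.get("start") >>> 2) & 0x03ffffff));

        for (String line : generate(program, labels)) {
            System.out.println(line);
        }
        // Prints 0x00221820 then 0x08100000
    }
}
```

The lambdas stand in for real Instruction classes only to keep the example short; in your assembler each class would build its word from its own fields.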

To do data output, save the strings in a string table, and give them appropriate addresses. Resolve the labels during code generation and then output the DATA SEGMENT after the code is done.

Requirements

· Use JFlex to implement a MIPS Assembly file lexer that returns symbols to the parser

· Use JavaCUP to implement a parser and construct a list of Instructions

· Support labels via a hashtable

· Support strings

· Consume files written in the input format above

· Output an “executable” in the format specified above

· Support at least the instructions listed in the table above

· Support all 32 general purpose registers, written as $r0 through $r31

Submission

By the deadline, you need to submit:

1. Your JFlex file containing your lexer

2. Your JavaCUP file containing your parser

3. Your Java files containing main() and any auxiliary Java files you have used

4. A Makefile to build it all

5. A README text file describing how to run it

6. Any examples that you have tested your program on

Create a zip file of the above files and upload it to the submission website linked from the class webpage.