Project 2: MIPS Assembler
Due: Sunday, October 14, 2012 at 11:59pm

Description

An assembler is a program that reads a file of human-readable mnemonics for machine instructions and encodes them as the binary words that the CPU executes.

In this project, you will write an assembler for a subset of MIPS, shown below. The input file format and output file format are also shown below.

MIPS Subset

Instruction             Instruction Type    Opcode (hex)    Funct (hex)
addi $rt, $rs, Imm16    I-type              0x8
add $rd, $rs, $rt       R-type              0x0             0x20
sub $rd, $rs, $rt       R-type              0x0             0x22
slt $rd, $rs, $rt       R-type              0x0             0x2a
beq $rs, $rt, Imm16     I-type              0x4
bne $rs, $rt, Imm16     I-type              0x5
syscall                 R-type              0x0             0xc
lw $rt, Imm16($rs)      I-type              0x23
sw $rt, Imm16($rs)      I-type              0x2b
j Imm26                 J-type              0x2
sll $rd, $rt, shamt     R-type              0x0             0x0
lui $rt, Imm16          I-type              0xf
and $rd, $rs, $rt       R-type              0x0             0x24
ori $rt, $rs, Imm16     I-type              0xd
nor $rd, $rs, $rt       R-type              0x0             0x27

 

Instruction Encodings:

Field size    6 bits    5 bits    5 bits    5 bits    5 bits    6 bits
              Bit 31                                            Bit 0
R-type        opcode    rs        rt        rd        shamt     funct
I-type        opcode    rs        rt        Immediate/address (16 bits)
J-type        opcode    Target address (26 bits)
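
The field layout above can be sketched as plain shifts and masks in Java. This is a minimal illustration, not a required design: the class name MipsEncoder and its method names are hypothetical. Each field is masked to its width and shifted into position, with bit 31 at the left of the opcode field.

```java
/** Hypothetical helper: packs the three instruction formats into 32-bit words. */
final class MipsEncoder {
    // R-type: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)
    static int encodeR(int opcode, int rs, int rt, int rd, int shamt, int funct) {
        return (opcode & 0x3f) << 26
             | (rs & 0x1f) << 21
             | (rt & 0x1f) << 16
             | (rd & 0x1f) << 11
             | (shamt & 0x1f) << 6
             | (funct & 0x3f);
    }

    // I-type: opcode(6) rs(5) rt(5) immediate(16); the immediate is truncated to 16 bits
    static int encodeI(int opcode, int rs, int rt, int imm16) {
        return (opcode & 0x3f) << 26
             | (rs & 0x1f) << 21
             | (rt & 0x1f) << 16
             | (imm16 & 0xffff);
    }

    // J-type: opcode(6) target(26)
    static int encodeJ(int opcode, int target26) {
        return (opcode & 0x3f) << 26 | (target26 & 0x03ffffff);
    }

    // Formats a word the way the output file shows it: 0x followed by 8 hex digits
    static String toHex(int word) {
        return String.format("0x%08x", word);
    }

    public static void main(String[] args) {
        // add $r8, $r9, $r10
        System.out.println(toHex(encodeR(0x0, 9, 10, 8, 0, 0x20)));
        // sw $r31, 36($r29)
        System.out.println(toHex(encodeI(0x2b, 29, 31, 36)));
    }
}
```

With this layout, sw $r31, 36($r29) encodes to 0xafbf0024, which is the second line of the sample output in the Output Format section.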

 

Input Format

The input file format will have two sections, a .data section that allows for string declarations only and a .text section which contains the assembly code.

The strings and opcodes can be labeled. Labels are valid C identifiers: they must start with a letter or underscore, and the remaining characters may also include digits. A label definition is suffixed by a colon.

Strings are ASCII characters enclosed in double quotes and terminated by an implicit NUL. The \n newline escape sequence is supported.

Opcodes and registers are to be specified in lowercase. We will only use the $r0-$r31 syntax, not the mnemonic register aliases (i.e., only $r0 not $zero).

Immediates can be in decimal or hexadecimal (with leading 0x).

Comments are line comments beginning with #.

.data
label: .asciiz "some string\n"
.text
label2: OPCODE $r0, $r1, 0x20
     j label2
# this is a comment
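
The operand rules above can be sketched in Java as follows. This is only an illustration under some assumptions: the class OperandParser and its methods are hypothetical names, register numbers with leading zeros (such as $r01) are rejected, and negative decimal immediates are accepted even though the format description does not mention them. In the actual project, most of this checking would probably live in the JFlex patterns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for validating labels, registers, and immediates. */
final class OperandParser {
    // $r followed by a number without leading zeros; the range is checked separately
    private static final Pattern REGISTER = Pattern.compile("\\$r(0|[1-9][0-9]?)");
    // C identifier: a letter or underscore, then letters, digits, or underscores
    private static final Pattern LABEL = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidLabel(String text) {
        return LABEL.matcher(text).matches();
    }

    // Returns the register number 0..31 for text such as "$r17"
    static int parseRegister(String text) {
        Matcher m = REGISTER.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a register: " + text);
        }
        int number = Integer.parseInt(m.group(1));
        if (number > 31) {
            throw new IllegalArgumentException("register out of range: " + text);
        }
        return number;
    }

    // Accepts decimal ("32", and by assumption "-8") or hexadecimal with a leading 0x ("0x20")
    static int parseImmediate(String text) {
        if (text.startsWith("0x")) {
            return Integer.parseUnsignedInt(text.substring(2), 16);
        }
        return Integer.parseInt(text);
    }
}
```

For example, OperandParser.parseImmediate("0x20") and OperandParser.parseImmediate("32") both return 32.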

Output Format

Your assembler will produce an output file that contains the assembled instructions as hexadecimal text, one instruction per line. The code contained in the text segment will live in memory starting at address 0x00400000.

The strings from the .data section appear after the line DATA SEGMENT. Each line gives the address at which the data is stored, followed by 4 bytes of the string written as one word: the first line holds the first 4 bytes, and the following lines continue at subsequent addresses until the NUL terminator has been encoded. Make sure that strings always start at a word-aligned address. The first address must be 0x10010000.

0x27bdffd8
0xafbf0024

DATA SEGMENT
0x10010000 0x00000023
0x10010004 0x00000000
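
One way to lay out the data segment is sketched below in Java. The class DataSegment is a hypothetical name, and two points are assumptions rather than part of the specification: the byte order within each word (the sketch puts the first character in the least-significant byte) and that the \n escape has already been converted to a real newline character by the lexer. Because every string is padded out to whole words, the next string always starts at a word-aligned address.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical string table that assigns addresses and renders output lines. */
final class DataSegment {
    static final int DATA_BASE = 0x10010000;

    private final List<String> lines = new ArrayList<>();
    private int nextAddress = DATA_BASE;

    // Stores one string plus its implicit NUL and returns the string's start address
    int add(String s) {
        int start = nextAddress;
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        int lengthWithNul = ascii.length + 1;
        int wordCount = (lengthWithNul + 3) / 4; // round up to whole words
        for (int w = 0; w < wordCount; w++) {
            int word = 0;
            for (int b = 0; b < 4; b++) {
                int i = w * 4 + b;
                int value = i < ascii.length ? (ascii[i] & 0xff) : 0; // NUL and padding
                word |= value << (8 * b); // ASSUMPTION: first byte in the low-order position
            }
            lines.add(String.format("0x%08x 0x%08x", nextAddress, word));
            nextAddress += 4;
        }
        return start;
    }

    // Lines to print after the DATA SEGMENT marker
    List<String> lines() {
        return lines;
    }
}
```

The address returned by add is what a label on that string would resolve to. If the course expects the opposite byte order, only the shift expression needs to change.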

Approach

First, you will need to create a tokenizer in JFlex that returns symbols for the parser to consume. You may use the Symbol class from JavaCUP; read what the JFlex manual says about %cup, and look at the CUP examples. The tokenizer should discard comments and whitespace.

You have some decisions about how to structure the token types that you return from the lexer. You could return each opcode separately or group them. I’d suggest having at least a way to distinguish between R-type, I-type, and J-type opcodes so that the parser has an easier job of knowing when you’ve put an immediate where you shouldn’t have, etc.

You have similar decisions about dealing with registers and integers. Try to plan this step with your grammar in mind.

For your grammar, make sure you’re only accepting legal operands. The parser should produce a list of Instructions. How you create the object hierarchy for that is up to you, but there should at least be an interface or base class Instruction. No tree is necessary, so our intermediate representation is more of an “abstract syntax list”.

This project requires two passes: one to parse and find the labels and one to resolve the labels. We find the labels in the parsing phase and resolve them after parsing is done by walking the Instructions. Use a hashtable that maps each label to the number of the instruction it was defined on.

To do code generation, iterate over your list of Instructions, resolve the labels they reference, and turn each Instruction into the hexadecimal string to output.
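
The label hashtable described above might look like the following Java sketch. The class LabelTable and its methods are hypothetical names. The sketch assumes each instruction occupies 4 bytes starting at 0x00400000, and it assumes the standard MIPS conventions that a branch immediate counts words relative to the instruction after the branch and that a jump target is the word address truncated to 26 bits; the handout does not spell these conventions out, so check them against your course materials.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical symbol table: label name to instruction number. */
final class LabelTable {
    static final int TEXT_BASE = 0x00400000;

    private final Map<String, Integer> indexByLabel = new HashMap<>();

    // Pass 1: called by the parser when it sees "label:" before instruction number index
    void define(String label, int instructionIndex) {
        if (indexByLabel.putIfAbsent(label, instructionIndex) != null) {
            throw new IllegalStateException("duplicate label: " + label);
        }
    }

    // Pass 2: byte address of the labeled instruction
    int addressOf(String label) {
        Integer index = indexByLabel.get(label);
        if (index == null) {
            throw new IllegalStateException("undefined label: " + label);
        }
        return TEXT_BASE + 4 * index;
    }

    // ASSUMPTION: offset in words, relative to the instruction after the branch
    int branchOffset(String label, int branchIndex) {
        int branchAddress = TEXT_BASE + 4 * branchIndex;
        return (addressOf(label) - (branchAddress + 4)) / 4;
    }

    // ASSUMPTION: 26-bit field holds the target's word address
    int jumpTarget(String label) {
        return (addressOf(label) >>> 2) & 0x03ffffff;
    }
}
```

Because labels can be used before they are defined, lookups such as addressOf should only happen after the whole file has been parsed.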

To do data output, save the strings in a string table, and give them appropriate addresses. Resolve the labels during code generation and then output the DATA SEGMENT after the code is done.

Requirements

· Use JFlex to implement a MIPS Assembly file lexer that returns symbols to the parser

· Use JavaCUP to implement a parser and construct a list of Instructions

· Support labels via a hashtable

· Support strings

· Consume files written in the input format above

· Output an “executable” in the format specified above

· Support at least the instructions listed in the table above

· Support all 32 general purpose registers, $r0 through $r31

Submission

By the deadline, you need to submit:

1. Your JFlex file containing your lexer

2. Your JavaCUP file containing your parser

3. Your Java files containing main() and any auxiliary Java files you have used

4. A Makefile to build it all

5. A README text file describing how to run it

6. Any examples that you have tested your program on

Create a zip file of the above files and upload it to the submission website linked from the class webpage.