ALU stands for *Arithmetic Logic Unit*. It is a device that takes one or two numerical values, performs an arithmetic or logic operation, and then outputs the result. Operations can range from addition to bitwise functions such as **OR, XOR,** or **AND**. It’s the heart of the processor architecture. Modern processor architectures may include multiple ALUs, but the 6502 keeps it simple with just one.

The implementation of the ALU in the 6502 is simple and only contains what’s absolutely necessary, which is great for our efforts in modeling its behavior. It takes in two input values stored in registers, **A & B**, and performs one of the five operations (**addition, bitwise OR, XOR, AND, and right shift**) and then stores the result in the output register.
The processor decides which two registers are fed to the ALU. All five operations are then executed in parallel, and the processor sets the control bits to choose which buffer gets written to the output register. In essence, the control bits decide which operation’s result reaches the output.

There’s also an **overflow** status bit which is output when adding two values. Since the 6502 only supports 8-bit signed values, the result of the addition must range from **-128 to 127**. If the result is **more than 127**, we have what’s called an **overflow**. If the result comes out to be **less than -128**, then that is an **underflow**. Whenever either one happens, the 6502 will set the **overflow** bit and the processor will know that the result was out of bounds.
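To make the wraparound concrete, here is a minimal simulation-only sketch, assuming a Verilog-2001 simulator. The module name `overflow_demo` and its signals are illustrative and not part of the project. It uses the sign rule for detection: the operands share a sign and the result's sign differs.

```verilog
// Simulation-only sketch; module and signal names are illustrative.
module overflow_demo;
    reg signed [7:0] a, b;
    reg signed [7:0] sum;      // truncated 8-bit result
    reg              overflow;

    initial begin
        // Two positive operands whose true sum does not fit in 8 signed bits
        a = 8'sd70;
        b = 8'sd60;
        sum = a + b;
        // Out of range when both operands share a sign and the result does not
        overflow = (a[7] == b[7]) && (sum[7] != a[7]);
        $display("%0d + %0d -> %0d (overflow=%b)", a, b, sum, overflow);

        // Two negative operands whose true sum is below the signed range
        a = -8'sd60;
        b = -8'sd70;
        sum = a + b;
        overflow = (a[7] == b[7]) && (sum[7] != a[7]);
        $display("%0d + %0d -> %0d (overflow=%b)", a, b, sum, overflow);
    end
endmodule
```

The two cases reuse the operands from the addition tests later in this post, so the wrapped results should line up with the values noted in those test comments.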

In the original implementation, the values from the five operations are buffered, but in my implementation they are not and are simply selected using a multiplexer.

Implementing this ALU as a Verilog module is very simple. First off we have our inputs and outputs:

```
module alu_6502(
input wire [7:0] regA,
input wire [7:0] regB,
input wire [4:0] control,
output reg [7:0] regOut,
output reg overflow
);
```

The registers should be self-explanatory. We also have a **5-bit control** input which controls the operation being outputted through a multiplexer. Lastly, there’s our **overflow** status bit which will be elaborated upon soon.

The bitwise functions are implemented through simple combinatorial functions as shown below:

```
// Combinatorial functions
assign orOut = regA | regB;
assign xorOut = regA ^ regB;
assign andOut = regA & regB;
assign shiftOut = regA >> regB;
```

Note that in this implementation, we can shift right multiple times as defined by `regB`. This may or may not result in an additional opcode in the future.

Really the only challenge in the Verilog ALU is figuring out how to calculate the **overflow** bit. There is a combinatorial formula I learned back in Digital Systems, which is `V = carryBit[6] XOR carryBit[7]`: the carry into the most significant bit XORed with the carry out of it. However, since we are not implementing any carry bits to calculate the addition, using it would complicate the code. That led to the development of my current algorithm, which simply adds an additional bit to both inputs and to the output and uses it to determine whether an overflow or underflow occurred.

```
// Detect overflow/underflow and perform addition
always @(*) begin
    // Perform addition with an extra bit (a copy of each input's MSB,
    // i.e. sign extension to 9 bits)
    // Store the result in sumOut and store the additional bit in extraBit
    {extraBit, sumOut} = {regA[7], regA} + {regB[7], regB};
    // If {extraBit, sumOut[7]} is 2'b01, overflow
    // If {extraBit, sumOut[7]} is 2'b10, underflow
    // Otherwise the overflow bit is set low
    overflow = ({extraBit, sumOut[7]} == 2'b01) || ({extraBit, sumOut[7]} == 2'b10);
end
```

The curly braces are Verilog’s concatenation operator. They combine bits from different variables into a single vector. Remember that we are describing hardware in an **HDL**, so these are actually wires. To a wire it makes no difference where it is being fed from or to, as long as there are two endpoints.
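For comparison, the carry-based formula mentioned earlier can be checked against the extra-bit method in simulation. The sketch below is an illustration under my own naming (`overflow_compare`, `vExtend`, `vCarry` are hypothetical), not code from the project. It sweeps every pair of 8-bit inputs and counts disagreements between the two methods.

```verilog
// Simulation-only sketch; names are illustrative, not from the project source.
module overflow_compare;
    reg  [7:0] a, b;
    reg  [7:0] sum;
    reg        extraBit;
    reg  [7:0] low;        // 7-bit partial sum; bit 7 holds the carry into the MSB
    reg        carryIn7;   // carry into bit 7
    reg        carryOut7;  // carry out of bit 7
    reg        vExtend, vCarry;
    integer    i, j, mismatches;

    initial begin
        mismatches = 0;
        for (i = 0; i < 256; i = i + 1) begin
            for (j = 0; j < 256; j = j + 1) begin
                a = i;  // truncated to 8 bits
                b = j;

                // Extra-bit (sign-extension) method
                {extraBit, sum} = {a[7], a} + {b[7], b};
                vExtend = extraBit ^ sum[7];

                // Carry-based formula: carry into the MSB XOR carry out of it
                low = {1'b0, a[6:0]} + {1'b0, b[6:0]};
                carryIn7 = low[7];
                {carryOut7, sum} = {1'b0, a} + {1'b0, b};
                vCarry = carryIn7 ^ carryOut7;

                if (vExtend !== vCarry)
                    mismatches = mismatches + 1;
            end
        end
        $display("mismatches: %0d", mismatches);
    end
endmodule
```

The two methods agree because the extra bit equals `a[7] ^ b[7] ^ carryOut7` while the sum's MSB equals `a[7] ^ b[7] ^ carryIn7`, so XORing them leaves only the two carries. Note that the `2'b01`/`2'b10` comparison is the same test as `extraBit ^ sumOut[7]`.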

Finally, the results are multiplexed using a case structure.

```
// Multiplex the result according to the control
always @(*) begin
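    case(control)
        `SUMS: regOut = sumOut;
        `ORS: regOut = orOut;
        `XORS: regOut = xorOut;
        `ANDS: regOut = andOut;
        `SRS: regOut = shiftOut;
        // Drive a known value for any other control code so the
        // combinational block does not infer a latch
        default: regOut = 8'h00;
    endcase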
end
```
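The snippets so far leave out the intermediate declarations and the values of the control macros. Here is one way the pieces might fit together, as a sketch only. The one-hot encodings for `SUMS` through `SRS` are my assumption (the article does not show the real `define values), and the module is named `alu_6502_sketch` to keep it distinct from the actual source.

```verilog
// Hypothetical one-hot encodings; the project's real values are not shown here.
`define SUMS 5'b00001
`define ORS  5'b00010
`define XORS 5'b00100
`define ANDS 5'b01000
`define SRS  5'b10000

module alu_6502_sketch(
    input  wire [7:0] regA,
    input  wire [7:0] regB,
    input  wire [4:0] control,
    output reg  [7:0] regOut,
    output reg        overflow
);
    // Combinatorial functions, declared with explicit 8-bit widths
    wire [7:0] orOut    = regA | regB;
    wire [7:0] xorOut   = regA ^ regB;
    wire [7:0] andOut   = regA & regB;
    wire [7:0] shiftOut = regA >> regB;

    reg [7:0] sumOut;
    reg       extraBit;

    // Addition with a sign-extension bit for overflow detection
    always @(*) begin
        {extraBit, sumOut} = {regA[7], regA} + {regB[7], regB};
        overflow = extraBit ^ sumOut[7];
    end

    // Select the result according to the control bits
    always @(*) begin
        case (control)
            `SUMS:   regOut = sumOut;
            `ORS:    regOut = orOut;
            `XORS:   regOut = xorOut;
            `ANDS:   regOut = andOut;
            `SRS:    regOut = shiftOut;
            default: regOut = 8'h00; // known value for unlisted codes
        endcase
    end
endmodule

// Tiny demonstration bench for the sketch above
module alu_6502_sketch_tb;
    reg  [7:0] a, b;
    reg  [4:0] ctrl;
    wire [7:0] out;
    wire       v;

    alu_6502_sketch dut(.regA(a), .regB(b), .control(ctrl),
                        .regOut(out), .overflow(v));

    initial begin
        a = 8'd70;  b = 8'd60;  ctrl = `SUMS;
        #10 $display("SUMS: out=%h v=%b", out, v);
        a = 8'hAA;  b = 8'h55;  ctrl = `ORS;
        #10 $display("ORS:  out=%h v=%b", out, v);
    end
endmodule
```

Declaring the intermediate wires with explicit widths matters: an undeclared name on the left of an `assign` becomes an implicit 1-bit net in Verilog-2001, which would silently truncate the results.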

Once I was happy with the ALU module, it was time to develop a test bench. The test bench feeds input values to the ALU module so we can see the outputs on the simulator. The file name is `alu_6502_tb.v`. Let’s begin with the tests!

The addition tests are the most important of all. Here we’ll be testing addition, subtraction, addition with overflow, and subtraction with underflow. Since numbers are represented in **two’s complement**, subtraction is simply addition with a negative number. Taking the **two’s complement** of a positive number yields its negative, and vice versa. Below are the test benches for addition:

```
// Regular addition: 100 + 11 = 111
a <= 8'b01100100; // 100
b <= 8'b00001011; // 11
ctrl <= `SUMS;
#10
// Regular subtraction: 99 - 88 = 11
a <= 8'b01100011; // 99
b <= 8'b10101000; // -88
ctrl <= `SUMS;
#10
// Overflow: 70 + 60 = -126
a <= 8'b01000110; // 70
b <= 8'b00111100; // 60
ctrl <= `SUMS;
#10
// Underflow: -60 - 70 = 126
a <= 8'b11000100; // -60
b <= 8'b10111010; // -70
ctrl <= `SUMS;
#10
```

Note that the results of the overflows that are written in the comments are the results we would expect to get – along with the overflow flag getting set – not the correct mathematical result of the equation.
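The negative operands in the stimulus above are two’s-complement encodings, which can be derived by inverting every bit and adding one. As a small sketch (the `negate` function and the module name are illustrative helpers, not part of the project), a simulator can print the bit patterns for the magnitudes used in the tests:

```verilog
// Simulation-only sketch; the function and module names are illustrative.
module twos_complement_demo;
    // Two's complement negation: invert every bit, then add one
    function [7:0] negate;
        input [7:0] value;
        begin
            negate = ~value + 8'd1;
        end
    endfunction

    initial begin
        $display("88 -> %b", negate(8'd88));
        $display("60 -> %b", negate(8'd60));
        $display("70 -> %b", negate(8'd70));
    end
endmodule
```

The printed patterns should match the `b` operand in the subtraction test and both operands in the underflow test.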

We run this through our simulator and we find our answers below:

The image you are looking at is a screenshot of the output of the simulator. It is divided into five different signals which correspond to the inputs and outputs of the ALU module. The green signals are the inputs and the cyan signals are the outputs. Since there are several wires being monitored, they are combined into **buses** and represented as **hexadecimal numbers**.

Each of the four groups of signals corresponds to one of the four tests done on the test bench. All the inputs and outputs match and so the test is passed. Note that the overflow signal `v` is HIGH on the last two tests where an overflow/underflow is triggered. Looks good!

The remaining tests are very straightforward. This one is on the bitwise OR function. Below is our testbench:

```
// LOGICAL OR
// 0xAA | 0x55 = 0xFF
a <= 8'b10101010; // 0xAA
b <= 8'b01010101; // 0x55
ctrl <= `ORS;
#10
// 0x75 | 0xC2 = 0xF7
a <= 8'b01110101; // 0x75
b <= 8'b11000010; // 0xC2
ctrl <= `ORS;
#10
```

The OR function is very simple: if there is a ONE bit at a given position in either register, the result will be ONE at that position. Otherwise, it’s zero. Here are the results from the simulation:

One can visually check and realize that the OR function is working as expected in both tests.

Ah the XOR operator! This one acts just like an OR, except if both inputs are set, then the output goes back to zero at that bit. We have two tests for this function:

```
// LOGICAL XOR
// 0xAA ^ 0x55 = 0xFF
a <= 8'b10101010; // 0xAA
b <= 8'b01010101; // 0x55
ctrl <= `XORS;
#10
// 0x75 ^ 0xC2 = 0xB7
a <= 8'b01110101; // 0x75
b <= 8'b11000010; // 0xC2
ctrl <= `XORS;
#10
```

As expected, the tests match the output perfectly!

The last of our bitwise functions is the AND. If both inputs are HIGH at a certain bit, the output is also HIGH, otherwise it’s zero. Below are the tests:

```
// LOGICAL AND
// 0xAA & 0x55 = 0x00
a <= 8'b10101010; // 0xAA
b <= 8'b01010101; // 0x55
ctrl <= `ANDS;
#10
// 0x75 & 0xE4 = 0x64
a <= 8'b01110101; // 0x75
b <= 8'b11100100; // 0xE4
ctrl <= `ANDS;
#10
```

And they match quite well!

This final operation simply shifts the value in `regA` a number of times defined by the value in `regB`. This behavior differs from the actual opcode on the 6502. The original instruction can only shift `regA` once. I decided to extend that so we may create an extra opcode later. Anyway, the test bench is below:

```
// SHIFT RIGHT
// 0x3A >> 0x01 = 0x1D
a <= 8'b00111010; // 0x3A
b <= 8'b00000001; // 0x01
ctrl <= `SRS;
#10
// 0x75 >> 0x04 = 0x07
a <= 8'b01110101; // 0x75
b <= 8'b00000100; // 0x04
ctrl <= `SRS;
#10
```

Shifting by zero would be pointless as the result would be the same as the input and shifting by eight or more would also be pointless as the result would always just be zero. Our two tests show some basic shifting and the results from the simulation validate those tests.
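The edge cases described above are easy to confirm in simulation. This is a minimal sketch with illustrative names (`shift_demo` is not part of the project) that applies the same `>>` expression to shift amounts of zero, one, eight, and a value far beyond the register width:

```verilog
// Simulation-only sketch; module and signal names are illustrative.
module shift_demo;
    reg  [7:0] a, b;
    wire [7:0] shiftOut = a >> b;  // same expression as the ALU's shifter

    initial begin
        a = 8'h3A;
        b = 8'd0;    // no shift: output equals input
        #1 $display("0x%h >> %0d = 0x%h", a, b, shiftOut);
        b = 8'd1;    // single shift, as on the original 6502
        #1 $display("0x%h >> %0d = 0x%h", a, b, shiftOut);
        b = 8'd8;    // every bit shifted out
        #1 $display("0x%h >> %0d = 0x%h", a, b, shiftOut);
        b = 8'd200;  // amounts past the width also give zero
        #1 $display("0x%h >> %0d = 0x%h", a, b, shiftOut);
    end
endmodule
```

Since only shift amounts from one to seven produce a distinct, non-trivial result, a later revision could feed just the low three bits of `regB` to the shifter, though that is a design choice this post does not make.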

The ALU was pretty straightforward to implement but this will serve as the core for the rest of the project. It was great starting out with this as it served as a warmup exercise in Verilog as well as creating test benches in the simulator. Really the time spent documenting this was much more than the time it took to make it. If you want the source code and project files, check out the link below. Feel free to ask any questions or leave a comment if you found this helpful. In the meantime, stay tuned for the next part as we get our hands dirty with the processor architecture!

Check it out on GitHub (rangeli/alu_6502)
