Abstractions
This lesson is a synopsis of Chapter 4 from the book Building a Modern Computer from First Principles. Abstraction is a technique for hiding complexity so that a person interacting with a system does not need to know all of the details of that system. Visual Studio is a familiar example: you usually did not need to know how button clicks or other events worked internally. You simply wrote your code in the button click event handler and it worked. A machine language is a low-level programming language that serves as the bridge between hardware and software. It is the lowest level of programming language there is: its instructions correspond directly to the binary 1s and 0s that the hardware executes. Our machine language will be used to manipulate three abstract objects: a memory, a processor, and registers.
A memory is the collection of hardware devices that store data and instructions in a computer. It can be viewed as a continuous array of fixed-width cells, each of which has its own address. A processor, or central processing unit (CPU), is a device capable of performing a fixed set of elementary operations. Many processors are equipped with several registers, each capable of holding a single value. These registers are located close to the processor and serve as high-speed memory.
Machine Language
A machine language program is a series of coded instructions similar to a program written in C# or Java, but at a much lower level. This lower level means it is harder for humans to read, but easier for the computer to interpret. Although every computer architecture has its own set of commands, a few basic commands are present in nearly every machine language.
Basic arithmetic and logic operations are essential to a machine language. Below are a few examples:
    ADD R2,R1,R3    // R2<---R1+R3 where R1,R2,R3 are registers
    ADD R2,R1,foo   // R2<---R1+foo where foo stands for the value of the memory location pointed at by the user-defined label foo.
    AND R1,R1,R2    // R1<---bit wise And of R1 and R2
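The effect of these commands can be mimicked in a few lines of Python. This is only a minimal sketch: the register file, the 16-bit word size, the sample values, and the helper names (`operand_value`, `add`, `bitwise_and`) are assumptions made for illustration and are not part of any real instruction set.

```python
# Hypothetical register file and memory; names mirror the examples above.
WORD_MASK = 0xFFFF                      # assumed 16-bit word size
regs = {"R1": 6, "R2": 0, "R3": 10}     # arbitrary sample values
memory = {"foo": 5}                     # simplified: label -> value stored at that label


def operand_value(name):
    """Return a register's value, or the value stored at a memory label."""
    return regs[name] if name in regs else memory[name]


def add(dest, src1, src2):
    """ADD dest,src1,src2: dest <- src1 + src2, wrapped to the word size."""
    regs[dest] = (operand_value(src1) + operand_value(src2)) & WORD_MASK


def bitwise_and(dest, src1, src2):
    """AND dest,src1,src2: dest <- bitwise And of src1 and src2."""
    regs[dest] = operand_value(src1) & operand_value(src2)


add("R2", "R1", "R3")            # R2 <- R1 + R3
r2_after_register_add = regs["R2"]
add("R2", "R1", "foo")           # R2 <- R1 + value stored at foo
bitwise_and("R1", "R1", "R2")    # R1 <- R1 & R2
```

With the sample values, the first ADD leaves 16 in R2, the second leaves 11, and the AND leaves 2 in R1 (0110 And 1011).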
Memory access commands fall into two categories. First, arithmetic and logic commands like those above can operate on memory locations as well as on registers. Second, explicit LOAD and STORE commands move data between registers and memory. Both kinds rely on addressing modes, which are different ways of specifying the address of the required memory word.
Direct addressing is the most common addressing mode and uses a specific address or a symbol that refers to that address.
    LOAD R1,67    // R1<---Memory[67]
    // Or, assuming that bar refers to memory address 67:
    LOAD R1,bar   // R1<---Memory[67]
Immediate addressing is used to load constants, or values that already appear in the instruction code.
LOADI R1,67 // R1<---67
Indirect addressing specifies a location that holds the required address. This is how pointers are handled in machine language.
    // Translation of x=foo[j] or x=*(foo+j):
    ADD R1,foo,j   // R1<---foo+j
    LOAD* R2,R1    // R2<---Memory[R1]
    STR R2,x       // x<---R2
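The three addressing modes differ only in how the operand is turned into a value. The Python sketch below tries to make that difference concrete. The memory size, the sample values, the base address 100 for foo, and the helper names are all assumptions chosen for illustration.

```python
memory = [0] * 256          # hypothetical memory: a flat array of cells
memory[67] = 42             # arbitrary sample value stored at address 67
symbols = {"bar": 67}       # hypothetical symbol table: label -> address
regs = {}


def resolve(operand):
    """Turn a symbol into its address; numeric addresses pass through."""
    return symbols[operand] if isinstance(operand, str) else operand


def load(reg, operand):
    """Direct addressing: reg <- Memory[address]."""
    regs[reg] = memory[resolve(operand)]


def loadi(reg, constant):
    """Immediate addressing: reg <- the constant itself."""
    regs[reg] = constant


def load_indirect(reg, address_reg):
    """Indirect addressing (LOAD*): reg <- Memory[value held in address_reg]."""
    regs[reg] = memory[regs[address_reg]]


load("R1", 67)
direct_numeric = regs["R1"]      # Memory[67]
load("R1", "bar")
direct_symbolic = regs["R1"]     # Memory[67] again, via the symbol bar
loadi("R1", 67)
immediate = regs["R1"]           # the constant 67

# x = foo[j], assuming foo starts at address 100 and j holds 3
foo_base, j = 100, 3
memory[foo_base + j] = 7         # sample array element foo[3]
regs["R1"] = foo_base + j        # ADD R1,foo,j
load_indirect("R2", "R1")        # LOAD* R2,R1
x = regs["R2"]                   # STR R2,x
```

Direct addressing returns the stored value 42 in both forms, immediate addressing returns the constant 67, and the indirect load follows the computed address to fetch 7.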
Machine languages also include a way to control the flow of execution using repetition or conditional statements. Below is an example of a while loop in C translated into generic machine code.
    // A while loop:
    while (R1>=0) {
        code segment 1
    }
    code segment 2

    beginWhile:
        JNG R1,endWhile   // If R1<0 goto endWhile
        // Translation of code segment 1 comes here
        JMP beginWhile    // Goto beginWhile
    endWhile:
        // Translation of code segment 2 comes here
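To see how the two jumps reproduce the while loop, it can help to step through the translated code with a program counter. The following is a rough Python sketch of such an interpreter. The tuple-based program format and the ADDI command (add a constant to a register) are hypothetical and exist only so the two code segments have something to do.

```python
def run(program, regs, max_steps=1000):
    """Execute a list of (opcode, *args) tuples using a program counter."""
    # Map each label name to the index of the command it marks.
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "LABEL"}
    pc = 0
    for _ in range(max_steps):          # step budget guards against endless loops
        if pc >= len(program):
            break
        op, *args = program[pc]
        pc += 1                         # default: fall through to the next command
        if op == "JNG":                 # conditional jump: taken if register < 0
            reg, target = args
            if regs[reg] < 0:
                pc = labels[target]
        elif op == "JMP":               # unconditional jump
            pc = labels[args[0]]
        elif op == "ADDI":              # hypothetical: reg <- reg + constant
            reg, constant = args
            regs[reg] += constant
        # LABEL is a marker only and does nothing when executed
    return regs


program = [
    ("LABEL", "beginWhile"),
    ("JNG", "R1", "endWhile"),   # if R1 < 0 goto endWhile
    ("ADDI", "R1", -1),          # code segment 1: count R1 down ...
    ("ADDI", "R2", 1),           # ... and count the iterations in R2
    ("JMP", "beginWhile"),
    ("LABEL", "endWhile"),
    ("ADDI", "R3", 1),           # code segment 2: runs once after the loop
]
result = run(program, {"R1": 2, "R2": 0, "R3": 0})
```

Starting with R1 = 2, the loop body runs three times (for R1 = 2, 1, 0), after which R1 is -1 and control falls through to code segment 2.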
The Hack Machine Language
The Hack computer used for this class follows the von Neumann architecture: it has an input, an output, a memory, and a central processing unit made up of a control unit and an ALU. Hack is a 16-bit machine that consists of a CPU, two memory modules that serve as instruction memory and data memory, and two memory-mapped I/O devices: a screen and a keyboard.

Both the instruction memory and the data memory are 16 bits wide and have a 15-bit address space. The CPU can only execute programs that reside in the instruction memory, which is read-only. The data memory is used for storing everything else on the computer.
The Hack computer has two registers called D and A. D is used solely for data values, while A can be used for data or addresses. The symbol M refers to the memory word whose address is currently held in A. The @value command stores the specified value in the A register. For example, @17 and @sum would both store the value 17 in the A register, assuming the symbol sum refers to memory location 17.
Every operation in the Hack language involving memory requires two commands. The first command, the address instruction (A-instruction), selects the address on which you will operate. The second command, the compute instruction (C-instruction), does the actual computation. The A-instruction has three different purposes: it is the only way to enter a constant, it sets the stage for a subsequent C-instruction designed to manipulate memory, and it sets the stage for a subsequent C-instruction that specifies a jump. Below is an example of a C program that adds the numbers 1 through 100, followed by the same program written in Hack machine language.
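In binary, an A-instruction is a 16-bit word whose leading bit is 0 and whose remaining 15 bits hold the value. The short Python sketch below encodes a numeric @value command under that layout. The function name `encode_a_instruction` is made up for this example, and symbols such as sum are assumed to have been resolved to numbers already.

```python
def encode_a_instruction(value):
    """Encode @value as a 16-bit binary string: a 0 bit followed by 15 value bits."""
    if not 0 <= value < 2 ** 15:
        raise ValueError("an A-instruction can only hold a 15-bit non-negative value")
    return format(value, "016b")


encoded = encode_a_instruction(17)   # the binary form of @17
```

Here `encoded` is the string `0000000000010001`, which is the word the instruction memory would hold for @17.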
    // Adds 1+...+100.
    int i = 1;
    int sum = 0;
    while (i <= 100) {
        sum += i;
        i++;
    }
    // Adds 1+...+100.
        @i      // i refers to some mem. location.
        M=1     // i=1
        @sum    // sum refers to some mem. location.
        M=0     // sum=0
    (LOOP)
        @i
        D=M     // D=i
        @100
        D=D-A   // D=i-100
        @END
        D;JGT   // If (i-100)>0 goto END
        @i
        D=M     // D=i
        @sum
        M=D+M   // sum=sum+i
        @i
        M=M+1   // i=i+1
        @LOOP
        0;JMP   // Goto LOOP
    (END)
        @END
        0;JMP   // Infinite loop
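One way to convince yourself that the Hack program computes the same sum is to trace it with a small simulator. The Python sketch below is not the official Hack CPU emulator. It is a deliberately limited interpreter that understands only the comp and jump mnemonics this one program uses, ignores 16-bit overflow, and assigns variable addresses the first time a symbol is executed rather than in a separate assembler pass. Because the program ends in an infinite loop, the sketch stops after an assumed step budget.

```python
PROGRAM = """
    @i
    M=1      // i=1
    @sum
    M=0      // sum=0
(LOOP)
    @i
    D=M      // D=i
    @100
    D=D-A    // D=i-100
    @END
    D;JGT    // if (i-100)>0 goto END
    @i
    D=M      // D=i
    @sum
    M=D+M    // sum=sum+i
    @i
    M=M+1    // i=i+1
    @LOOP
    0;JMP    // goto LOOP
(END)
    @END
    0;JMP    // infinite loop
"""

# Only the mnemonics used above (a subset of the real Hack tables).
COMP = {
    "0": lambda a, d, m: 0,
    "1": lambda a, d, m: 1,
    "D": lambda a, d, m: d,
    "M": lambda a, d, m: m,
    "D-A": lambda a, d, m: d - a,
    "D+M": lambda a, d, m: d + m,
    "M+1": lambda a, d, m: m + 1,
}
JUMP = {"": lambda v: False, "JGT": lambda v: v > 0, "JMP": lambda v: True}


def run(source, max_steps=5000):
    """Interpret the program; return (data memory, symbol table)."""
    symbols, code = {}, []
    for raw in source.splitlines():
        line = raw.split("//")[0].strip()      # drop comments and whitespace
        if not line:
            continue
        if line.startswith("("):               # label: address of the next command
            symbols[line[1:-1]] = len(code)
        else:
            code.append(line)
    ram, next_var = {}, 16                     # variables start at RAM address 16
    a = d = pc = 0
    for _step in range(max_steps):
        if pc >= len(code):
            break
        ins = code[pc]
        pc += 1
        if ins.startswith("@"):                # A-instruction
            name = ins[1:]
            if name.isdigit():
                a = int(name)
            else:
                if name not in symbols:        # first use of a variable symbol
                    symbols[name] = next_var
                    next_var += 1
                a = symbols[name]
            continue
        dest, _, rest = ins.rpartition("=")    # C-instruction: dest=comp;jump
        comp, _, jump = rest.partition(";")
        address = a                            # M and jumps use A's current value
        value = COMP[comp](a, d, ram.get(address, 0))
        if "M" in dest:
            ram[address] = value
        if "D" in dest:
            d = value
        if "A" in dest:
            a = value
        if JUMP[jump](value):
            pc = address
    return ram, symbols


ram, symbols = run(PROGRAM)
total = ram[symbols["sum"]]
```

After the run, `total` is 5050, the same result the C version leaves in sum, and i holds 101, the first value that fails the loop test.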

The C-instruction does just about everything in the Hack computer. The instruction code answers three questions: what to compute, where to store the computed value, and what to do next. The first bit of the instruction is always 1, while the next two bits are not used. The comp field tells the ALU what to compute, the dest field specifies where to store the value computed by the ALU, and the jump field specifies a jump condition, that is, which command to fetch and execute next.
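The field boundaries are easier to see when a concrete instruction word is pulled apart. The Python sketch below assumes the standard Hack layout `111a cccc ccdd djjj`, where the a bit and the six c bits together form the comp field. The function name `decode_c_instruction` is hypothetical.

```python
def decode_c_instruction(word):
    """Split a 16-bit C-instruction into its a, comp, dest, and jump bit fields."""
    bits = format(word, "016b")
    if bits[0] != "1":
        raise ValueError("leading bit is 0, so this is an A-instruction")
    return {
        "a": bits[3],          # selects A or M as the ALU's second input
        "comp": bits[4:10],    # six control bits telling the ALU what to compute
        "dest": bits[10:13],   # which of A, D, M receive the result
        "jump": bits[13:16],   # jump condition on the computed value
    }


# 1110001100000001 is the binary form of D;JGT from the program above.
fields = decode_c_instruction(0b1110001100000001)
```

For D;JGT the dest field is `000` (store nowhere) and the jump field is `001` (jump if the computed value is greater than zero).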
Symbols
Symbols are used to refer to memory addresses. The Hack assembly language recognizes three types of symbols:
- Predefined Symbols: A special subset of RAM addresses can be referred to by any assembly program using the following predefined symbols:
  - Virtual Registers: To simplify assembly programming, the symbols R0 to R15 are predefined to refer to RAM addresses 0 to 15, respectively.
  - Predefined Pointers: The symbols SP, LCL, ARG, THIS, and THAT are predefined to refer to RAM addresses 0 to 4, respectively.
  - I/O Pointers: The symbols SCREEN and KBD are predefined to refer to RAM addresses 16384 (0x4000) and 24576 (0x6000), respectively, which are the base addresses of the screen and keyboard memory maps.
- Label Symbols: These user-defined symbols, which serve to label destinations of goto commands, are declared by the pseudo-command "(Xxx)".
- Variable Symbols: Any user-defined symbol Xxx appearing in an assembly program that is not predefined and is not defined elsewhere using label symbols is treated as a variable, and is assigned a unique memory address by the assembler, starting at RAM address 16 (0x0010).
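The three kinds of symbols suggest a simple two-pass scheme for resolving them, sketched below in Python. This is an illustration under stated assumptions, not the course's assembler: the function name `build_symbol_table` is invented, the input is assumed to be a list of already-cleaned lines (no comments or blank lines), and a label is assumed to stand for the address of the command that follows it.

```python
def build_symbol_table(lines):
    """Resolve predefined, label, and variable symbols for a cleaned Hack program."""
    # Predefined symbols.
    table = {f"R{n}": n for n in range(16)}
    table.update({"SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4,
                  "SCREEN": 16384, "KBD": 24576})

    # Pass 1: a label (Xxx) takes the address of the next real command.
    address = 0
    for line in lines:
        if line.startswith("(") and line.endswith(")"):
            table[line[1:-1]] = address
        else:
            address += 1

    # Pass 2: any remaining symbol is a variable, allocated from RAM address 16 up.
    next_variable = 16
    for line in lines:
        if line.startswith("@"):
            name = line[1:]
            if not name.isdigit() and name not in table:
                table[name] = next_variable
                next_variable += 1
    return table


table = build_symbol_table(
    ["@i", "M=1", "(LOOP)", "@i", "D=M", "@LOOP", "0;JMP", "@sum", "@R3", "@SCREEN"]
)
```

In this small example i and sum become variables at addresses 16 and 17, LOOP resolves to command address 2, and R3 and SCREEN keep their predefined values.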
Input and Output
The Hack computer can be connected to a screen and a keyboard. The screen is black-and-white, with 256 rows of 512 pixels each. The contents of the screen are stored in memory starting at RAM address 16384 (0x4000). The example below blackens the top-left pixel of the screen.
    // Draw a single black dot at the screen's top left corner:
    @SCREEN   // Set the A register to point to the memory word that is mapped to the 16 left-most pixels of the top row of the screen.
    M=1       // Blacken the left-most pixel.
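Since each 16-bit word covers 16 pixels, a row of 512 pixels occupies 32 words, and the left-most pixel of a word corresponds to its least significant bit (which is why M=1 blackens it). Under those assumptions, the Python sketch below locates the word and bit for any pixel. The helper names and the dictionary standing in for RAM are hypothetical.

```python
SCREEN = 16384                 # base address of the screen memory map
ROWS, COLS = 256, 512
WORDS_PER_ROW = COLS // 16     # 32 words of 16 pixels each


def pixel_location(row, col):
    """Return (RAM address, bit index) of the pixel at the given row and column."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("pixel is outside the 256 x 512 screen")
    return SCREEN + row * WORDS_PER_ROW + col // 16, col % 16


def set_pixel(ram, row, col):
    """Blacken one pixel by setting its bit in the simulated RAM (a dict)."""
    address, bit = pixel_location(row, col)
    ram[address] = ram.get(address, 0) | (1 << bit)


ram = {}
set_pixel(ram, 0, 0)                       # same effect as @SCREEN / M=1
top_left = pixel_location(0, 0)
bottom_right = pixel_location(255, 511)
```

The top-left pixel lands at address 16384, bit 0, and the bottom-right pixel at address 24575, bit 15, which is the last word before the keyboard's memory map.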
Whenever a key is pressed on the keyboard, its ASCII code appears in RAM address 24576 (0x6000). When no key is pressed, the code 0 appears there. In addition to the standard ASCII codes, the following codes are used on the Hack platform:
Key Pressed | Code |
---|---|
newline | 128 |
backspace | 129 |
left arrow | 130 |
up arrow | 131 |
right arrow | 132 |
down arrow | 133 |
home | 134 |
end | 135 |
page up | 136 |
page down | 137 |
insert | 138 |
delete | 139 |
esc | 140 |
f1-f12 | 141-152 |
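
The table above can be turned into a small lookup, for example when writing a keyboard simulator or a test harness. The Python sketch below is one possible encoding. The key names are taken from the table, while the function name `key_code` and the choice to accept only printable ASCII characters for ordinary keys are assumptions.

```python
# Special key codes from the table above.
SPECIAL_KEYS = {
    "newline": 128, "backspace": 129, "left arrow": 130, "up arrow": 131,
    "right arrow": 132, "down arrow": 133, "home": 134, "end": 135,
    "page up": 136, "page down": 137, "insert": 138, "delete": 139, "esc": 140,
}
# f1-f12 map to 141-152.
SPECIAL_KEYS.update({f"f{n}": 140 + n for n in range(1, 13)})


def key_code(key):
    """Return the code that would appear in RAM[24576] while the key is held down."""
    if key in SPECIAL_KEYS:
        return SPECIAL_KEYS[key]
    if len(key) == 1 and 32 <= ord(key) <= 126:   # printable ASCII character
        return ord(key)
    raise ValueError(f"no Hack key code known for {key!r}")


code_for_a = key_code("A")
code_for_f12 = key_code("f12")
```

An ordinary key such as A yields its ASCII code 65, while f12 yields 152 from the extended range.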