Tic Tac Toe Game

Embedded SoC Project



In this project, we brought a historic paper-and-pencil game to the digital world by implementing Tic Tac Toe on our FPGA board. Using ARM's IP cores, we designed an SoC (System on Chip) that includes the Cortex-M0 processor, the AHB-Lite system bus, and peripherals such as BRAM, UART, VGA, and a timer.

For the game design, we used the keyboard keys "WASD" and the spacebar to control the cursor, a white box that indicates which box on the grid the player would like to place their symbol in. Each player has their own unique symbol, X or O. At the end of each game, the winner or a draw is displayed in the text region.

This game was successfully developed and implemented using Verilog, Assembly and C. See the demonstration video above.


Tic Tac Toe is a paper-and-pencil game played on a 3x3 grid. There are two players, each with a unique symbol. The first player to get three in a row along a column, row, or diagonal wins the game. There is also the possibility of a draw. Each player has a time limit of 10 seconds to make a move or they lose. They choose the grid placement using the "WASD" keys and confirm with the spacebar. The winning player is congratulated at the end of the game.


  1. Implement an embedded SoC that includes the ARM Cortex-M0 IP cores, AHB-Lite bus, UART, VGA, timer and BRAM.
  2. Create a testbench for the project to test for errors and functionality
  3. Develop APIs and Drivers to design Tic Tac Toe Game
  4. Program and debug game on the Nexys A7 FPGA Board

Parts List

  • Nexys A7 FPGA Board
  • VGA Monitor
  • TeraTerm (to use the keyboard over UART)
  • 7-Segment Display

Hardware Design

    There are three main components that need to be implemented to create the SoC (System on Chip). The first component is the processor; for this project, ARM provided the Cortex-M0 IP cores. The second is the interconnect, the AHB-Lite bus (AHB is short for Advanced High-performance Bus), which allows the processor to connect to the peripherals. The last component is the set of peripherals, which includes VGA, GPIO, timer, memory, and so on. The block diagram of the SoC in Figure 1 shows all the components needed to complete the SoC.

    Figure 1: SoC Block Diagram

    AHB-Lite Bus

    Figure 2: AHB-Lite Block Diagram

    There are two main components needed to implement the AHB-Lite bus: the address decoder and the subordinate multiplexor. As shown in Figure 2, the manager, which in this project is the processor, has three outputs and two inputs. The ARM Cortex-M0 is a 32-bit processor, which is why the data and address wires are 32 bits wide. The subordinates are the peripherals that will be implemented later on. The AHB-Lite bus allows the processor to communicate with the peripherals.

    To implement the AHB-Lite bus, these two modules need to be created. The address decoder is a one-to-many design: it takes the address from the processor as its input and drives one select output per subordinate, only one of which is active at a time. This is how the processor selects which peripheral it wants to read from or write to.

    To implement the address decoder, a case statement takes the 8 MSBs of the address signal (HADDR) as its input and asserts the select signal of the matching peripheral. This select signal lets the peripheral know that it has been selected, and the processor will then either write or read data based on the control signal. A reg, dec, is assigned to the output select signals, HSEL_X: the case statement sets one bit of dec, and each HSEL_X is connected to its own dec bit. For example, HSEL_0 is assigned to dec[0]. The decoder also drives an output signal for the subordinate multiplexor, generated with the same logic as the select signals.

    The memory map for the Cortex-M0 is necessary to implement the address decoder properly, since the case statement depends on the HADDR signal. The peripheral region spans 0x4000_0000 to 0x5FFF_FFFF; for simplicity, we have only used 0x5000_0000 to 0x5FFF_FFFF. The region from 0x0000_0000 to 0x1FFF_FFFF is for the BRAM, which stores the program code discussed later. Each subordinate, or peripheral, has 16 MB of address space, so only the 8 MSBs of the address differ from one subordinate to the next.
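The decoding step can be modeled in C to make the case statement concrete. This is only a behavioral sketch, not the project's Verilog: the select indices, the `decode_result` type, and the choice of which 16 MB block belongs to which peripheral (0x50 for VGA, 0x51 for UART, 0x52 for the timer, 0x00 for BRAM) are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical select indices; the real project may order these differently. */
enum { SEL_BRAM = 0, SEL_VGA = 1, SEL_UART = 2, SEL_TIMER = 3, SEL_NONE = 7 };

typedef struct {
    uint8_t hsel;     /* one-hot select lines: bit n models HSEL_n (dec[n]) */
    uint8_t mux_sel;  /* index forwarded to the subordinate multiplexor */
} decode_result;

/* Behavioral model of the decoder's case statement on HADDR[31:24]. */
decode_result ahb_decode(uint32_t haddr)
{
    decode_result r = { 0u, SEL_NONE };      /* default: nothing selected */

    switch (haddr >> 24) {                   /* the 8 most significant bits */
    case 0x00: r.mux_sel = SEL_BRAM;  break; /* assumed: first 16 MB of Code */
    case 0x50: r.mux_sel = SEL_VGA;   break; /* assumed peripheral blocks */
    case 0x51: r.mux_sel = SEL_UART;  break;
    case 0x52: r.mux_sel = SEL_TIMER; break;
    default:   return r;                     /* unmapped address */
    }

    r.hsel = (uint8_t)(1u << r.mux_sel);     /* exactly one select bit set */
    return r;
}
```

Because each subordinate owns one 16 MB block, the lower 24 bits of the address never affect the result; they are left for the selected peripheral to decode its own registers.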

    The subordinate multiplexor takes the mux select output from the address decoder as an input, along with the read data and response signals from each subordinate. The multiplexor chooses which subordinate's data to send to the processor. The response signal indicates whether the subordinate has finished transferring its data or has no data available yet. The implementation of the multiplexor is similar to the decoder, except that the mux select signal is the input to the case statement. The multiplexor is a many-to-one design: it has many inputs, the data from the subordinates (the peripherals in this project), but only one output, the data sent to the processor.
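A behavioral C model of the multiplexor might look like the sketch below. The field names follow the AHB-Lite signal names HRDATA and HREADYOUT, but the number of subordinates, the struct types, and the default response for an unmapped select value are assumptions rather than details taken from the project source.

```c
#include <stdint.h>

#define NUM_SUBORDINATES 4   /* hypothetical count for this sketch */

typedef struct {
    uint32_t hrdata;     /* read data driven by one subordinate */
    uint8_t  hreadyout;  /* 1 when that subordinate has finished its transfer */
} sub_out;

typedef struct {
    uint32_t hrdata;     /* data returned to the processor */
    uint8_t  hready;     /* ready response returned to the processor */
} mux_out;

/* Many-to-one selection: mux_sel picks which subordinate reaches the manager. */
mux_out ahb_mux(uint8_t mux_sel, const sub_out subs[NUM_SUBORDINATES])
{
    mux_out m = { 0u, 1u };  /* assumed default: zero data, bus ready */

    if (mux_sel < NUM_SUBORDINATES) {
        m.hrdata = subs[mux_sel].hrdata;
        m.hready = subs[mux_sel].hreadyout;
    }
    return m;
}
```

Returning "ready" for an unmapped select keeps the processor from stalling forever on a bad address; a real design might route such accesses to a default subordinate instead.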

    The two modules, the address decoder and the subordinate multiplexor, are now implemented. They should be instantiated in the top module for the AHB-Lite bus, which also instantiates the Cortex-M0 IP cores and connects all the design sources.

    *See the GitHub page (click the project name) for the design source reference.


    The memory map in the Cortex-M0 technical reference manual shows how memory is organized for the processor. Figure 3 below shows that memory map. Each memory region has a specific purpose. For this project, we focus on the Code, SRAM, and peripheral regions.


    The SRAM for this project is implemented as BRAM (block RAM). This is also where the program code is stored. Using the $readmemh function, the memory can be initialized with a code.hex file generated in Keil, discussed later. The memory is byte addressable, which means each memory address points to a byte of data. Its input is the write data and its output is the read data, which the processor can then access. The design source can be referenced on GitHub.


    Each peripheral has its own memory space for its registers, such as the configuration and data registers.


    VGA (Video Graphics Array) is a display standard that is still widely used today but is slowly being replaced by HDMI and other display interfaces. This is how the game is displayed on the monitor. However, there are a few concepts that need to be understood before the VGA peripheral can be implemented.

    First, the VGA connector carries five signals: red, green, and blue, plus horizontal and vertical synchronization. The red, green, and blue signals are analog and set the color of each pixel. The horizontal and vertical synchronization signals ensure that the output is displayed properly, meaning no stutter or frame skipping.

    The horizontal synchronization pulse marks the start of the next line, and the vertical synchronization pulse marks the start of the next frame. This is how the VGA signal works, a scheme also called raster scanning. The monitor adjusts its scanning frequency and screen resolution depending on the synchronization signals from the SoC.

    Timing is very important for the VGA peripheral, since the lines and frames need to be redrawn at a fixed rate, known as the refresh rate, that is fast enough for the human eye to see a steady image. There is no benefit in refreshing much faster than that, since the human eye can only detect changes up to a certain frequency. The values used in this lab are shown in the figure below, which also shows the waveform of the synchronization signals. They are specific to a 25 MHz pixel clock and a 60 Hz refresh rate.

    Figure X: VGA Synchronization Waveform and Values

    The display region is 640x480 pixels. Around it there are horizontal and vertical porches, front and back, plus the synchronization pulses themselves. These blanking intervals give the monitor time to start the next line or a new frame. With the standard 640x480 at 60 Hz timing, each line is therefore 800 pixel clocks long and each frame is 525 lines, which at 25 MHz gives 25,000,000 / (800 x 525), or about 59.5 frames per second. The synchronization signals are active low. A visual of the display is shown in the figure below.

    Figure X: VGA Display Synchronization

    With an understanding of how the VGA signals work, the VGA peripheral can now be implemented. The display region is split into two regions, text and image. The text region displays text, and the image region can display images, figures, symbols, and so on. The main components of the VGA peripheral are the image buffer and the text console. There is also a mux that chooses whether the text or the image data is displayed for the current pixel, and the VGA interface, which generates the synchronization signals for the VGA port that the connector plugs into. The VGA interface also outputs the address of the current pixel. The block diagram for the VGA peripheral is shown below.

    Figure X: VGA Block Diagram

    The image buffer stores the color data for all the pixels in the image region and is built on dual-port memory. Since the on-chip SRAM is limited, the resolution needs to be reduced by mapping multiple pixels to a single entry. For example, a 4x4 pixel region can be stored as a single entry in the image buffer.
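The 4x4 reduction amounts to dropping the two low bits of each pixel coordinate before forming the buffer address. The sketch below assumes a row-major buffer and takes the image region width as a parameter, since the actual region size is not stated here; the function name is hypothetical.

```c
#include <stdint.h>

#define BLOCK_SHIFT 2u  /* 4x4 pixels per entry: divide x and y by 4 */

/* Map a pixel coordinate inside the image region to its image buffer address.
 * region_width is the width of the image region in pixels (an assumption of
 * this sketch; it should be a multiple of 4). */
uint32_t image_buffer_addr(uint32_t x, uint32_t y, uint32_t region_width)
{
    uint32_t cells_per_row = region_width >> BLOCK_SHIFT;
    return (y >> BLOCK_SHIFT) * cells_per_row + (x >> BLOCK_SHIFT);
}
```

With this mapping, a region that is 400 pixels wide needs only 100 entries per row of cells, so the buffer is 16 times smaller than one entry per pixel.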

    The text console displays text in the text region. It also includes a scroll signal that allows the text characters to be displayed one after the other. The console is implemented in hardware logic, since the on-chip SRAM is limited. The logic used is a ROM (read-only memory), which stores, for each ASCII value, the pixel pattern that lights the right pixels in the text region to draw the corresponding character.

    The VGA interface divides the 100 MHz processor clock down to 25 MHz. It also generates the horizontal and vertical synchronization signals from the divided clock using counters, which count up to the values required to meet the synchronization timing. Shown in figure ****
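The counter-based sync generation can be sketched in C as a pair of counters advanced once per 25 MHz pixel clock. The porch and pulse widths below are the widely used 640x480 at 60 Hz values (800 pixel clocks per line, 525 lines per frame); they are an assumption here and may differ from the values in the project's figure. All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed standard 640x480 at 60 Hz timing, in pixel clocks and lines. */
enum {
    H_VISIBLE = 640, H_FRONT = 16, H_SYNC = 96, H_BACK = 48,
    V_VISIBLE = 480, V_FRONT = 10, V_SYNC = 2,  V_BACK = 33,
    H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK,  /* 800 */
    V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK   /* 525 */
};

typedef struct {
    uint16_t x;  /* pixel clock index within the current line */
    uint16_t y;  /* line index within the current frame */
} vga_counters;

/* Advance the counters by one pixel clock. */
void vga_tick(vga_counters *c)
{
    if (++c->x == H_TOTAL) {
        c->x = 0;
        if (++c->y == V_TOTAL) {
            c->y = 0;
        }
    }
}

/* Pin level of HSYNC: low (false) only during the sync pulse. */
bool vga_hsync(const vga_counters *c)
{
    bool in_pulse = c->x >= H_VISIBLE + H_FRONT &&
                    c->x <  H_VISIBLE + H_FRONT + H_SYNC;
    return !in_pulse;
}

/* Pin level of VSYNC: low (false) only during the sync pulse. */
bool vga_vsync(const vga_counters *c)
{
    bool in_pulse = c->y >= V_VISIBLE + V_FRONT &&
                    c->y <  V_VISIBLE + V_FRONT + V_SYNC;
    return !in_pulse;
}

/* True while the current position is inside the 640x480 display region. */
bool vga_visible(const vga_counters *c)
{
    return c->x < H_VISIBLE && c->y < V_VISIBLE;
}
```

The same x and y counters double as the current pixel address whenever `vga_visible` is true, which is how the interface can feed the image buffer and text console.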


    UART (Universal Asynchronous Receiver/Transmitter) is a serial communication protocol that allows us to receive and transmit data. Serial communication means the bits of data are sent one after the other, sequentially. There are two main signals, transmit (TX) and receive (RX). To transmit, the data is converted from parallel to serial and then sent; to receive, the sequential bits are collected and reassembled into parallel data. Since there is no shared clock signal, the sender and receiver need to agree on the timing in advance so that the data stays synchronized. This agreed timing is called the baud rate, measured in bits per second (bps).

    A UART frame begins with a start bit, followed by the data bits and then a stop bit. In this project, 8 bits of data are sent sequentially in each frame. The start bit is a logic low and the stop bit is a logic high. In this project, UART is used to receive characters from our keyboard through a terminal such as TeraTerm. This is how the processor gets data from our keyboard.

    Figure X: UART Block Diagram

    To implement the UART peripheral, a few blocks need to be implemented: the transmitter, the receiver, the baud rate generator, and a FIFO (First In First Out) buffer.

    The baud rate generator outputs ticks to the transmitter and receiver. It can be implemented with a clock divider and counter that outputs a logic-high tick to indicate that it is time to transfer or sample data, at a rate set by the baud rate. The baud rate for this project is 19200 bps.
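As a rough C model, the generator is a counter that wraps after a fixed number of system clock cycles. The sketch assumes a 100 MHz system clock and one tick per bit period; many UART designs instead tick at 16 times the baud rate so the receiver can sample mid-bit, and the project source may do the same. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CLK_HZ  100000000u          /* assumed system clock */
#define BAUD    19200u              /* baud rate stated for this project */
#define DIVISOR (CLK_HZ / BAUD)     /* 5208 clock cycles per bit, rounded down */

typedef struct {
    uint32_t count;  /* system clock cycles since the last tick */
} baud_gen;

/* Call once per system clock cycle; returns true once per bit period. */
bool baud_tick(baud_gen *g)
{
    if (++g->count == DIVISOR) {
        g->count = 0;
        return true;
    }
    return false;
}
```

The exact ratio is about 5208.33, so rounding down to 5208 leaves a timing error of well under 0.01 percent per bit, far inside what a UART tolerates.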

    The UART transmitter reads data from the transmitter FIFO and transmits it on the ticks from the baud rate generator, converting each byte into sequential bits. This module is implemented using a state machine with four states: idle, start, data, and stop. The idle state waits until there is a byte to send. The start state sends the start bit, a logic low, and then moves to the data state, which serializes the byte by logically shifting it right by 1 bit at every tick. Once all 8 bits have been sent, the state machine goes to the stop state, sends the stop bit, and returns to idle. The sequential bits are driven onto the TX pin of the FPGA board.
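The four-state transmitter can be sketched in C with one function call per baud tick. This is a behavioral model under assumptions, not the project's Verilog: the struct layout and function names are hypothetical, and the byte is loaded directly rather than read from a FIFO.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { TX_IDLE, TX_START, TX_DATA, TX_STOP } tx_state;

typedef struct {
    tx_state state;
    uint8_t  shift;  /* byte being shifted out, least significant bit first */
    uint8_t  nbits;  /* number of data bits already sent */
    bool     tx;     /* level on the TX line; the line idles high */
} uart_tx;

void uart_tx_init(uart_tx *u)
{
    u->state = TX_IDLE;
    u->shift = 0;
    u->nbits = 0;
    u->tx = true;
}

/* Hand a byte to the transmitter; returns false if a frame is in progress. */
bool uart_tx_load(uart_tx *u, uint8_t byte)
{
    if (u->state != TX_IDLE) {
        return false;
    }
    u->shift = byte;
    u->nbits = 0;
    u->state = TX_START;
    return true;
}

/* Advance by one baud tick and return the TX level for that bit period. */
bool uart_tx_tick(uart_tx *u)
{
    switch (u->state) {
    case TX_START:
        u->tx = false;                  /* start bit is a logic low */
        u->state = TX_DATA;
        break;
    case TX_DATA:
        u->tx = (u->shift & 1u) != 0;   /* send the current lowest bit */
        u->shift >>= 1;                 /* logical shift right by one */
        if (++u->nbits == 8) {
            u->state = TX_STOP;
        }
        break;
    case TX_STOP:
        u->tx = true;                   /* stop bit is a logic high */
        u->state = TX_IDLE;
        break;
    case TX_IDLE:
    default:
        u->tx = true;                   /* line idles high */
        break;
    }
    return u->tx;
}
```

For example, loading 0x41 (the character 'A') produces a low start bit, then the data bits 1, 0, 0, 0, 0, 0, 1, 0, then a high stop bit.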

    The UART receiver receives the 8 sequential data bits and reassembles them into a single byte using the baud rate generator ticks. It then writes the byte to the receiver FIFO. The implementation is similar to the transmitter, with the same states, except that the data state reassembles the bits into a byte in a reg. This can be done by storing each bit one after the other using the concatenation operator. Refer to the design source on GitHub.
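The reassembly step can be illustrated in C on an already sampled frame of ten bits. The sketch assumes least-significant-bit-first order, which matches a transmitter that shifts right, and a hypothetical function name; it leaves out the tick-driven state machine and only shows how the bits are concatenated back into a byte.

```c
#include <stdint.h>

/* Rebuild one byte from a sampled frame: bits[0] is the start bit,
 * bits[1..8] are the data bits (least significant first), bits[9] is the
 * stop bit. Returns the byte (0..255), or -1 on a framing error. */
int uart_rx_frame(const uint8_t bits[10])
{
    uint8_t data = 0;

    if (bits[0] != 0 || bits[9] != 1) {
        return -1;  /* start must be low and stop must be high */
    }

    for (int i = 0; i < 8; i++) {
        /* Shift the new bit in from the top, like {rx, data[7:1]} in
         * Verilog, so the first bit received ends up in bit 0. */
        data = (uint8_t)((bits[1 + i] << 7) | (data >> 1));
    }
    return data;
}
```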

    Lastly, there is the FIFO for both the transmitter and the receiver. It only needs to be created once but is instantiated twice. A FIFO is a first-in, first-out buffer. Since the processor and the UART operate at different frequencies, the transmitter FIFO gives the UART more time to process the data from the processor. This improves system efficiency, since the processor doesn't need to wait for the UART. For the receiver, the FIFO gives the processor more time to handle interrupts while data is still arriving at a much slower rate than the processor runs. The FIFO implementation involves write and read data, plus empty and full signals. The state machine tracks two conditions, not empty and not full. When the FIFO is not empty, there is data available to read; for the transmitter FIFO, this tells the UART transmitter that there is data to send to the TX pin. Likewise, when the FIFO is not full, the writer, whether the processor or the UART receiver, knows that there is room to store more data. The FIFO uses dual-port memory, which has one port for reading and another for writing.
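A software analogue of the FIFO is a ring buffer with separate read and write counters, which mirrors the two ports of the dual-port memory. The depth of 16 and all names below are assumptions for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 16u  /* hypothetical depth; must be a power of two */

typedef struct {
    uint8_t  mem[FIFO_DEPTH];
    uint32_t wr;  /* free-running write counter (write port) */
    uint32_t rd;  /* free-running read counter (read port) */
} fifo;

bool fifo_empty(const fifo *f) { return f->wr == f->rd; }
bool fifo_full(const fifo *f)  { return (f->wr - f->rd) == FIFO_DEPTH; }

/* Write one byte; returns false (and stores nothing) when the FIFO is full. */
bool fifo_push(fifo *f, uint8_t byte)
{
    if (fifo_full(f)) {
        return false;
    }
    f->mem[f->wr++ % FIFO_DEPTH] = byte;
    return true;
}

/* Read the oldest byte; returns false when the FIFO is empty. */
bool fifo_pop(fifo *f, uint8_t *out)
{
    if (fifo_empty(f)) {
        return false;
    }
    *out = f->mem[f->rd++ % FIFO_DEPTH];
    return true;
}
```

Because the same type serves both directions, two instances (one for transmit, one for receive) reuse all of this code, just as the hardware module is instantiated twice.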

    The same FIFO design source should be used for both the transmit and the receive FIFO, since it can be instantiated twice with the inputs and outputs specific to the transmitter and the receiver.


    A hardware timer is implemented in this project to provide the countdown for each player. The hardware timer is a digital counter that counts regular events and references a clock source with a high, fixed frequency. The clock source used for this timer is the master clock, which is 100 MHz. The timer decrements or increments at a fixed frequency derived from that clock. The timer resets once it has reached zero or a pre-defined value, known as the load value. The block diagram in the figure below shows the blocks needed for the implementation.

    The timer is used to track the amount of time each player has, by incrementing the corresponding player's timer whenever the timer peripheral interrupt occurs.

    Figure X: Timer Block Diagram

    The blocks that need to be implemented are the prescaler and the 32-bit counter. There are also registers for the load value, the current value, and control. These registers store the information needed to configure and start the timer.

    The prescaler is a frequency divider that takes the master clock and outputs the frequency the timer uses. This is how the speed of the timer is changed.

    The 32-bit counter is a counter that counts at the frequency from the prescaler and uses the value from the load register to count up or down from that value.

    The control register configures the timer, including the prescaler, and is also where we start and stop the timer and where the interrupt enable bit is stored. The load register stores the load value, which is the value that the counter counts up to or down from. The current register stores the current value of the counter. The interrupt is automatically cleared once it has been triggered.
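How the prescaler, counter, and registers interact can be sketched as a C model that is stepped once per master clock cycle. The control bit positions, the separate `prescale` field, and the choice of a down-counter that reloads at zero are assumptions made for illustration; the project's register layout may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define TIMER_CTRL_ENABLE (1u << 0)  /* hypothetical bit positions */
#define TIMER_CTRL_IRQ_EN (1u << 1)

typedef struct {
    uint32_t load;       /* reload value (load register) */
    uint32_t current;    /* current count (current register) */
    uint32_t control;    /* enable and interrupt enable bits */
    uint32_t prescale;   /* master clock cycles per count, at least 1 */
    uint32_t pre_count;  /* internal prescaler counter */
} hw_timer;

/* Advance by one master clock cycle.
 * Returns true on the cycle where the timer raises its interrupt. */
bool timer_clock(hw_timer *t)
{
    if (!(t->control & TIMER_CTRL_ENABLE)) {
        return false;                 /* timer stopped */
    }
    if (++t->pre_count < t->prescale) {
        return false;                 /* prescaler has not rolled over yet */
    }
    t->pre_count = 0;

    if (t->current == 0) {
        t->current = t->load;         /* reload and signal the interrupt */
        return (t->control & TIMER_CTRL_IRQ_EN) != 0;
    }
    t->current--;
    return false;
}
```

In this model the interrupt is a single-cycle pulse, which loosely corresponds to the interrupt clearing itself automatically after it fires.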


    The 7-segment display is a very popular peripheral and simple to implement. Each digit has 7 LED segments that are active low, and a specific combination of segments is turned on to display a number, character, or symbol. The 7-segment display used in this project has 4 digits. Within each digit, all the LED segments share a common anode. The digits are turned on one at a time at a frequency high enough that the human eye sees them all lit at once. Therefore, a frequency divider is implemented.

    Figure X: 7-segment Block Diagram

    There are two decoders, one for the 7 segments and one for the anodes. The segment decoder determines which LEDs are turned on for each symbol. The anode decoder enables each digit in turn so that its symbol can be displayed. The anodes are cycled fast enough that, to the human eye, all the digits appear to be on at the same time.

    There will be 4 registers, one for each digit, which is how the user can configure what number or symbol to display at which digit.
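The two decoders can be sketched as small C lookup functions. The segment bit order (bit 0 for segment a through bit 6 for segment g) and the active-low anode enables are assumptions of this sketch, and the function names are hypothetical.

```c
#include <stdint.h>

/* Segment patterns for 0..9 with a '1' meaning "segment lit".
 * Bit 0 = segment a, bit 1 = b, ... bit 6 = g (assumed ordering). */
static const uint8_t SEG_ACTIVE_HIGH[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F
};

/* Segment decoder: returns the active-low pattern for a decimal digit.
 * Values above 9 blank the digit (all segments off). */
uint8_t seg_decode(uint8_t value)
{
    if (value > 9) {
        return 0x7F;
    }
    return (uint8_t)(~SEG_ACTIVE_HIGH[value] & 0x7F);
}

/* Anode decoder: returns a 4-bit pattern enabling exactly one digit,
 * assuming the enables are active low. */
uint8_t anode_decode(uint8_t digit)
{
    return (uint8_t)(~(1u << (digit & 3u)) & 0x0F);
}
```

A refresh loop would step `digit` from 0 to 3 on each divided-clock tick, output `anode_decode(digit)`, and output `seg_decode` of the value held in that digit's register.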

    Hardware Interface


    Now that all the hardware has been implemented, we need a way for the programmer to communicate with it. This is why we create drivers. They allow us to control the hardware from software, which is essential for programming the Tic Tac Toe game on the hardware we just implemented.

    Since we implemented the hardware ourselves, we have the advantage of understanding the memory map and the registers that we created to configure the peripherals. Instead of writing directly to the registers in the game code, we created a header and a C file containing functions that write to the registers for us. These are known as device drivers.
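A driver in this style usually wraps register access in a struct of volatile fields plus a few small functions. The sketch below covers only the timer and uses a hypothetical register layout and bit positions; the functions take the register block as a pointer so the same code works with the real base address on the target or with a plain variable when testing on a PC.

```c
#include <stdint.h>

/* Hypothetical register layout of the timer peripheral. */
typedef struct {
    volatile uint32_t load;     /* value the counter reloads from */
    volatile uint32_t value;    /* current count */
    volatile uint32_t control;  /* enable and interrupt enable bits */
} timer_regs;

#define TIMER_ENABLE (1u << 0)  /* assumed bit positions */
#define TIMER_IRQ_EN (1u << 1)

/* On the target, the block would sit at the peripheral's base address, for
 * example ((timer_regs *)0x52000000u); that address is only an assumption. */

/* Program the load value and interrupt setting; leaves the timer stopped. */
void timer_init(timer_regs *t, uint32_t load, int irq_enable)
{
    t->control = 0u;
    t->load = load;
    t->value = load;
    if (irq_enable) {
        t->control = TIMER_IRQ_EN;
    }
}

void timer_enable(timer_regs *t)  { t->control |= TIMER_ENABLE; }
void timer_disable(timer_regs *t) { t->control &= ~TIMER_ENABLE; }
```

Game code can then call `timer_init` and `timer_enable` without knowing any addresses or bit positions, which is what makes the drivers reusable across applications.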

    There are also drivers provided by CMSIS, the Cortex Microcontroller Software Interface Standard. These drivers provide functions that give access to the NVIC, the system control block, and the system tick timer. Drivers allow us to simplify our code and reuse it for different applications. For example, the same drivers were also used to create a Snake game.

    Table X: Memory Space for Peripherals

    The functions in the driver include plotting pixels with the VGA peripheral, writing to the 7-segment display, initializing, enabling, and disabling the timer peripheral, and reading and writing the GPIOs. The function declarations and their parameters are shown in the screenshot below in Figure X. This simplifies our code and also increases its portability, since it can be applied to different applications. The driver functions will help us create the API for our Tic Tac Toe game as well.

    Table X: Device Driver Functions


    Interrupts are very important for this application, since they increase efficiency and decrease power consumption. In short, an interrupt is triggered by an event outside the running program, and its handler code runs only when it has been triggered. The code that runs when an interrupt occurs is called the interrupt service routine (ISR). Each interrupt has a priority, and the NVIC is responsible for deciding which pending interrupt has the highest priority.

    The peripherals that use an interrupt in this application are the UART and the timer. Since we want the fastest response from our keyboard, the UART has an interrupt whose service routine changes the cursor position and also allows the player to place their symbol. For the timer, we want each player to have a time limit, so the timer ISR declares a winner and stops play when a player reaches the time limit.

    We set the timer interrupt as the highest priority, since a player who runs out of time shouldn't be allowed to make a last-second key hit. This avoids the case where a key hit is treated as valid even though the timer has already reached zero.

    With the use of interrupts, we are able to conserve more power with the SoC sleep mode enabled. Whenever the SoC is not handling an interrupt, it reduces power by lowering the clock frequency, and the wakeup interrupt controller notifies the SoC when an interrupt has occurred.

    Game Design

    Although the rules of the game are simple, creating an API (Application Program Interface) allows us to simplify and accelerate the development of our game. The API we created includes functions and constants that are used repeatedly or consist of many lines of code. As a result, the readability of our main code has improved as well.


    Table X: API Functions for Tic Tac Toe Game

    The API functions that we created and used are shown in Table X above. The drawX and drawCircle functions draw the player's symbol on the screen whenever they hit the enter key.

    The controls that my partner and I decided to use are the "WASD" keys, to move the cursor from one grid position to another, and the spacebar as the enter key, which displays the player's symbol where the cursor is.
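The cursor handling can be sketched as a small C function that updates a row and column from the received key. The `cursor` type and function name are hypothetical, and the sketch assumes the cursor stops at the edge of the grid rather than wrapping around.

```c
/* Cursor position on the 3x3 grid; row and col are each 0, 1 or 2. */
typedef struct {
    int row;
    int col;
} cursor;

/* Move the cursor according to a WASD key. Any other key leaves it unchanged,
 * and moves that would leave the grid are ignored. */
void cursor_move(cursor *c, char key)
{
    switch (key) {
    case 'w': if (c->row > 0) c->row--; break;  /* up */
    case 's': if (c->row < 2) c->row++; break;  /* down */
    case 'a': if (c->col > 0) c->col--; break;  /* left */
    case 'd': if (c->col < 2) c->col++; break;  /* right */
    default:  break;
    }
}
```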

    There are two key parameters that need to be saved and checked whenever the UART interrupt occurs, which is when a key has been hit. The first is the position of the cursor, and the second is the number and position of the players' symbols. The winner is checked whenever the enter key has been pressed. Once there is a winner, the game ends with a message stating who won, and no key presses are accepted except reset, which is the r key, or quit, which is the q key.

    For the game interface, we need a 3x3 grid and two symbols, X and O, to be displayed in the grid boxes. Therefore, it makes sense to create one function for each that can be reused in different applications. This is why we have the grid and draw functions shown in the table above. The cursor is drawn by the drawHighlight function, which displays a box around the currently selected grid box.

    The UartGetC and VGAPutC functions are how the characters are received and displayed in the text region. We communicate with TeraTerm over UART, and the getC function is what lets us play the game using the keyboard.

    The winning combinations can be checked with nested if-else statements that inspect the gameboard matrix after a player presses enter to see whether a winning combination has been made.
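As an alternative to nested if-else statements, the same check can be written as a loop over a table of the eight winning lines. This is a swapped-in technique for illustration, not the project's code; the cell encoding and function name are hypothetical.

```c
/* Cell and result encoding (hypothetical). */
enum { EMPTY = 0, PLAYER_X = 1, PLAYER_O = 2, DRAW = 3 };

/* Returns PLAYER_X or PLAYER_O if that player has three in a row,
 * DRAW if the board is full with no winner, or EMPTY if play continues. */
int check_result(int board[3][3])
{
    /* Each line is three (row, col) pairs: 3 rows, 3 columns, 2 diagonals. */
    static const int lines[8][3][2] = {
        {{0,0},{0,1},{0,2}}, {{1,0},{1,1},{1,2}}, {{2,0},{2,1},{2,2}},
        {{0,0},{1,0},{2,0}}, {{0,1},{1,1},{2,1}}, {{0,2},{1,2},{2,2}},
        {{0,0},{1,1},{2,2}}, {{0,2},{1,1},{2,0}}
    };

    for (int i = 0; i < 8; i++) {
        int a = board[lines[i][0][0]][lines[i][0][1]];
        int b = board[lines[i][1][0]][lines[i][1][1]];
        int c = board[lines[i][2][0]][lines[i][2][1]];
        if (a != EMPTY && a == b && b == c) {
            return a;  /* all three cells hold the same player's symbol */
        }
    }

    for (int r = 0; r < 3; r++) {
        for (int col = 0; col < 3; col++) {
            if (board[r][col] == EMPTY) {
                return EMPTY;  /* at least one move is still possible */
            }
        }
    }
    return DRAW;
}
```

Calling this after every confirmed move gives the three outcomes the game needs to report: a winner, a draw, or keep playing.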

    At the end of the game, the text region will display the result of the game, tie or whichever player won. It will also allow you to reset and quit the game. Please see the video at the top of the page for demonstration.

    Testing & Technical Challenges

    The most difficult part was implementing the hardware, as it involves many design modules that consist of complex digital logic. However, the block diagrams and an understanding of the purpose of each block help lead you in the right direction. Understanding the implemented hardware is key, especially for firmware such as the drivers, since they are the interface between the hardware and the software.

    Debugging our game design was another challenge. However, working with a partner sped up the process, since we were able to collaborate and peer-review each other's code and logic. Our process was to run the program and fix the first bug we encountered before moving on. This way, if one bug is the cause of another, fixing one can get rid of several. Reading the code out to each other and explaining our logic was also part of the process and accelerated the debugging.