SYNAPTICAD: TECHNICAL PAPERS

Coding Techniques for Bus Functional Models In Verilog, VHDL, and C++

By Ben Rhodes and Dan Notestein, SynaptiCAD

Bus functional models are simplified simulation models that accurately reflect the I/O-level behavior of a device without modeling its internal computational abilities. For example, a bus functional model of a microprocessor would be able to generate PCI read and write transactions to a PCI device model to initialize and test the PCI device's functionality, but the microprocessor BFM would not be capable of reading CPU instructions from a memory and executing them (that would require a complete behavioral-level model of the processor). Bus functional models are commonly used in testbenches to stimulate design models and verify their functionality. For the purposes of this paper, the design models being tested are either RTL or gate-level models of the system.

Using transactors to model transaction signaling protocols

Bus functional models typically serve as an abstraction layer between the transaction level of system functionality, which describes what data is exchanged between two devices, and the signaling level, which dictates how that data is exchanged during the transaction. At the transaction level, a transaction can be viewed as a simple function call with parameters for the data being exchanged. At the signaling level, this call is converted into signal transitions on the appropriate clock cycles, along with handshaking logic to ensure the data exchange is properly synchronized. The part of the BFM that performs the signaling when a transaction function is called is known as a transactor. Transactors are the only parts of a BFM that interact directly with the signals of a design model; the remaining code in a BFM manipulates only transaction-level data. The figure below shows how transactors serve as an interface between the transaction-level code in the testbench and the signals of the design model being tested.

Figure 1: Transactors are driven by the Transaction manager and stimulate the MUT
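To make the split concrete, the following C++ sketch models one transactor for a hypothetical valid/ready write protocol. The names WriteTxn, BusCycle, ToySlave, and DriveWrite are illustrative assumptions, not part of any library, and a real transactor would drive HDL signals rather than build a trace. The caller sees only a transaction (an address and a data value); the transactor expands it into per-cycle signal values and holds them until the slave's handshake completes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Transaction-level view: only *what* is exchanged.
struct WriteTxn {
  std::uint32_t addr;
  std::uint32_t data;
};

// Signal-level view: the value of each bus signal during one clock cycle
// of a hypothetical valid/ready write protocol.
struct BusCycle {
  bool valid;
  std::uint32_t addr;
  std::uint32_t data;
  bool ready;
};

// Toy slave model: asserts ready after a fixed number of wait states.
struct ToySlave {
  int wait_states;
  bool Ready(int cycles_since_valid) const { return cycles_since_valid >= wait_states; }
};

// The transactor: turns one transaction into clock-by-clock signal values,
// holding valid/addr/data until the slave answers with ready.
std::vector<BusCycle> DriveWrite(const WriteTxn& t, const ToySlave& slave) {
  assert(slave.wait_states >= 0);
  std::vector<BusCycle> trace;
  for (int n = 0;; ++n) {
    bool ready = slave.Ready(n);
    trace.push_back({true, t.addr, t.data, ready});
    if (ready) break;  // handshake completes on this cycle
  }
  trace.push_back({false, 0, 0, false});  // bus returns to idle
  return trace;
}
```

For example, DriveWrite({0x10, 0xAB}, ToySlave{2}) yields three cycles with valid held high (ready asserting on the third) followed by one idle cycle.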

For best simulation performance, transactors should generally be modeled in the language of the design under test, since simulation activity that crosses a language boundary typically carries a performance penalty. The transaction-level part of the BFM, by contrast, is often more convenient to model in a language that directly supports data structures and dynamic memory allocation. Writing the transaction-level code in a higher-level language usually costs little, if anything, because that code works with data in larger chunks that do not need to interact as often with the simulation kernel.

Master and slave transactors

Transactors can be divided into two broad categories: master transactors that initiate a transaction and slave transactors that respond to a master transactor. Master transactors are generally modeled as procedures that are called whenever a transaction should be started, whereas slave transactors tend to be modeled as a group of related parallel processes that run for the entire simulation, responding whenever they recognize that a transaction is addressed to them.

Although it is convenient, from the point of view of the code that initiates master transactions, to model a master transactor as a procedure, the underlying implementation of a master transaction may also require multiple parallel processes, which neither VHDL nor Verilog allows in functions. This problem can be overcome by modeling the master transactor as a state machine that responds to handshaking signals triggered by an "ApplyTransaction" procedure, so that the master transactor looks like a procedure call to the transaction-level code of the BFM. By default, this creates a transactor that does not block the calling process, but blocking transactions can be achieved by calling a version of the "ApplyTransaction" procedure that waits for a completion signal from the transactor.
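A rough C++ sketch of this arrangement follows, assuming a hypothetical two-phase (address, then data) protocol. Apply() plays the role of the non-blocking "ApplyTransaction" procedure: it only raises a start flag that the clocked state machine picks up. ApplyBlocking() is the blocking variant that keeps clocking until the completion flag is set; in an HDL testbench it would instead wait on the transactor's completion signal. All names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical master transactor modeled as a clocked state machine.
class MasterTransactor {
 public:
  enum class State { Idle, Address, Data };

  // Non-blocking "ApplyTransaction": record the request and return at once.
  void Apply(std::uint32_t addr, std::uint32_t data) {
    assert(state_ == State::Idle && !start_);  // one transaction at a time
    req_addr_ = addr;
    req_data_ = data;
    start_ = true;
    done_ = false;
  }

  // One rising clock edge: advance the protocol by one step.
  void Clock() {
    switch (state_) {
      case State::Idle:
        if (start_) { start_ = false; state_ = State::Address; }
        break;
      case State::Address:
        driven_.push_back(req_addr_);  // address phase
        state_ = State::Data;
        break;
      case State::Data:
        driven_.push_back(req_data_);  // data phase
        state_ = State::Idle;
        done_ = true;                  // completion handshake
        break;
    }
  }

  bool Done() const { return done_; }
  const std::vector<std::uint32_t>& Driven() const { return driven_; }

 private:
  State state_ = State::Idle;
  bool start_ = false;
  bool done_ = false;
  std::uint32_t req_addr_ = 0, req_data_ = 0;
  std::vector<std::uint32_t> driven_;  // values put on the bus, in order
};

// Blocking variant: issue the request, then clock until completion.
// Returns the number of clock edges the transaction took.
int ApplyBlocking(MasterTransactor& m, std::uint32_t addr, std::uint32_t data) {
  m.Apply(addr, data);
  int cycles = 0;
  do { m.Clock(); ++cycles; } while (!m.Done());
  return cycles;
}
```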

It is frequently necessary to model a transaction as a set of cooperating processes, but this leads to two problems: (1) the processes must be synchronized so that they start and stop together, and (2) it is easy to introduce races between when signals are sampled and when they are driven. In Verilog, the processes can be synchronized with a fork-join. In VHDL, which has no fork-join construct, a pseudo fork-join can be used to simulate this effect. This technique uses a resolved handshaking signal that is monitored and driven by all the processes to be forked (see Writing Testbenches, Janick Bergeron, pp. 135-137, for a detailed explanation of this technique).

It is often desirable to be able to restart these processes in the middle of a transaction, effectively resetting the transaction. In Verilog, this can be done using disable statements. In VHDL it is more awkward, as it requires an abort status signal to be checked every time a wait statement is encountered in the transaction processes. By adding an additional state to the handshaking signal that implements the pseudo fork-join, we can reuse this signal as the abort status signal. This technique allows any of the processes in the pseudo fork-join to abort the transaction.
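The pseudo fork-join and its extra abort state can be sketched in C++ as a resolution function over the values each process drives. This is an analogy to the VHDL resolved signal, not Bergeron's code: the Sync values, Resolve(), and ForkJoin() are hypothetical names, and each "process" is a function stepped once per clock rather than a real concurrent process.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Values each forked process can drive onto the shared handshake signal.
// Abort is the extra state that lets the join signal double as an abort flag.
enum class Sync { Busy, Done, Abort };

// Resolution function, analogous to a VHDL resolved type: Abort from any
// driver wins; the join completes only when every driver reports Done.
Sync Resolve(const std::vector<Sync>& drivers) {
  bool all_done = true;
  for (Sync d : drivers) {
    if (d == Sync::Abort) return Sync::Abort;
    if (d != Sync::Done) all_done = false;
  }
  return all_done ? Sync::Done : Sync::Busy;
}

// A "process" is stepped once per clock and returns what it drives.
using Process = std::function<Sync(int cycle)>;

// Pseudo fork-join: step every still-busy process each cycle until the
// resolved handshake leaves Busy. Returns Busy if max_cycles runs out.
Sync ForkJoin(const std::vector<Process>& procs, int max_cycles) {
  assert(!procs.empty());
  std::vector<Sync> drivers(procs.size(), Sync::Busy);
  for (int cycle = 0; cycle < max_cycles; ++cycle) {
    for (std::size_t i = 0; i < procs.size(); ++i)
      if (drivers[i] == Sync::Busy) drivers[i] = procs[i](cycle);
    Sync s = Resolve(drivers);
    if (s != Sync::Busy) return s;  // every process sees this at its next wait
  }
  return Sync::Busy;
}
```

Because Resolve() lets a single Abort override every other driver, any forked process can end the whole transaction, which is the behavior the extra handshake state provides.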

Avoiding race conditions in transactor sampling code

Race conditions can arise in a transactor when you need to sample the value of a signal and also drive other signals that could affect the value of the sampled signal. Generally this can be avoided by sampling the value before driving the other signals, but when multiple processes are involved, the order in which these statements execute is no longer known. In simple cases, this can be avoided by using non-blocking assignments in Verilog (in VHDL, signal assignments already behave this way as long as you are not using shared variables).
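Why deferred assignment removes the ordering dependence can be shown with a small C++ model of a two-phase signal. Signal, Bus, ProcA, and ProcB are illustrative names: reads return the value from the current delta cycle, and writes take effect only at Commit(), much like Verilog non-blocking assignments or VHDL signal assignments.

```cpp
#include <cassert>

// Minimal two-phase signal: Read() returns the current value, Write()
// schedules a new value, and Commit() applies it at the end of the delta.
template <typename T>
class Signal {
 public:
  explicit Signal(T init) : cur_(init), next_(init) {}
  T Read() const { return cur_; }
  void Write(T v) { next_ = v; }   // scheduled, not immediate
  void Commit() { cur_ = next_; }  // end of the delta cycle
 private:
  T cur_, next_;
};

// Two processes woken by the same clock edge. ProcA samples `ack` and
// drives `req`; ProcB looks at `req` and drives `ack`.
struct Bus {
  Signal<int> req{0}, ack{0};
  int sampled_ack = -1;
  void ProcA() { sampled_ack = ack.Read(); req.Write(1); }
  void ProcB() { if (req.Read() == 0) ack.Write(1); }
  // Run both processes in either order, then update all signals together.
  void Edge(bool a_first) {
    if (a_first) { ProcA(); ProcB(); } else { ProcB(); ProcA(); }
    req.Commit();
    ack.Commit();
  }
};
```

Whichever process runs first, ProcA samples ack as 0. With immediate (blocking-style) writes, running ProcB first would make ProcA sample 1, which is the race described above.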

However, if one of the processes enables the execution of another process through handshaking signals that take zero simulation time, the extra delta cycles those signals introduce can still lead to race conditions. This kind of code often occurs when a condition in the first process enables the execution of the second process, for example, when a signal's stability needs to be checked after a particular clock edge. Such sampling code can often be in-lined in the enabling process, but this is not possible when the stability-checking code includes wait statements that would block the execution of the enabling process. To solve this problem, the following method can be used:

  1. Place the sampling code in a separate process that waits on a triggering event from the initiating process.
  2. If the sampling process needs to sample at the same clock edge that triggered it, then the initiating process needs to save the value of the signal to be sampled at that edge.
  3. To start the sampling process, use event triggers "->" in Verilog or toggle a std_logic signal in VHDL. Using this technique, you can trigger multiple sampling processes from the initiating process without introducing delta cycles in the initiating process.
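The three steps above can be sketched in C++ as follows. The Event class stands in for a Verilog named event or a toggled VHDL signal, and StabilityChecker is a hypothetical sampling process that checks that a signal stays stable for a number of clocks; both are illustrative, and the explicit Dispatch() call replaces what a simulator's scheduler would do.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for a Verilog named event (`-> ev`) or a toggled VHDL signal.
// Trigger() only marks the event; waiting processes run when the scheduler
// calls Dispatch(), so the initiating process never stalls.
class Event {
 public:
  void OnTrigger(std::function<void()> proc) { waiters_.push_back(std::move(proc)); }
  void Trigger() { pending_ = true; }
  void Dispatch() {
    if (!pending_) return;
    pending_ = false;
    for (auto& w : waiters_) w();
  }
 private:
  std::vector<std::function<void()>> waiters_;
  bool pending_ = false;
};

// Sampling process (step 1): checks that a signal holds its value for
// `window` clocks after the triggering edge. The value at that edge is
// supplied by the initiating process (step 2). `violated` is sticky.
struct StabilityChecker {
  int window = 1;
  int remaining = 0;
  int expected = 0;
  bool active = false;
  bool violated = false;

  void Start(int value_at_edge) {
    assert(window > 0);
    expected = value_at_edge;
    remaining = window;
    active = true;
  }
  void OnClock(int value) {
    if (!active) return;
    if (value != expected) violated = true;
    if (--remaining == 0) active = false;
  }
};
```

The initiating process stores the signal value at the triggering edge, calls Trigger() (step 3), and continues. Registering several checkers with OnTrigger() lets a single edge start all of them without adding delta cycles to the initiator.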

Data structures and data packing for serializing of packet data

Data structures are useful for modeling complex data at a high level of abstraction. This can be very helpful when passing data between modules and tasks since multiple pieces of data can be passed as a single logical unit. Classes are even more useful since tasks and functions can be associated with each data structure for encapsulating algorithms specific to the type of data structure, such as packing and randomization.

Classes form the basis of C++ but are not available in VHDL or Verilog. However, you can create pseudo-classes in these HDLs. In Verilog, you would create a module with regs, tasks, and functions to represent a class. Two routines (here, a function and a task) need to be defined to convert the class to and from an array of bits in order to pass instance information across module and task boundaries (this is very similar to the concept of using $realtobits and $bitstoreal to pass real numbers across module boundaries). In VHDL, you can create a record to represent the data structure, usually placed in a package. For each class method, the first parameter should be an inout of the record type so that the method can operate on the internals of a particular data structure instance. A Verilog example is shown below.

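module packet_type;
  // Data members of the pseudo-class
  reg [7:0]  FIELD0;
  reg [7:0]  FIELD1;
  // Packed image; its width matches the total width of the fields (8 + 8)
  reg [15:0] tb_packed_bits;

  // Convert this instance to a bit array. Verilog-1995 functions need at
  // least one input, so an unused dummy input is declared.
  function [15:0] tobits;
    input dummy;
  begin
    tb_packed_bits = { FIELD1, FIELD0 };
    tobits = tb_packed_bits;
  end
  endfunction

  // Load this instance from a bit array produced by tobits
  task frombits;
    input [15:0] tb_packed_bits_in;
  begin
    tb_packed_bits = tb_packed_bits_in;
    { FIELD1, FIELD0 } = tb_packed_bits;
  end
  endtask
endmodule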
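For comparison, C++ supports this directly. The following sketch is a C++ counterpart of packet_type, with the fields as real members and the conversion to and from bits as methods; the class and method names are illustrative. The bit layout follows the Verilog { FIELD1, FIELD0 } concatenation, with FIELD1 in the upper byte.

```cpp
#include <cassert>
#include <cstdint>

// C++ counterpart of the Verilog packet_type pseudo-class.
class PacketType {
 public:
  std::uint8_t field0 = 0;
  std::uint8_t field1 = 0;

  // Equivalent of tobits: { FIELD1, FIELD0 }
  std::uint16_t ToBits() const {
    return static_cast<std::uint16_t>((field1 << 8) | field0);
  }

  // Equivalent of frombits: split the 16-bit image back into the fields
  void FromBits(std::uint16_t bits) {
    field0 = static_cast<std::uint8_t>(bits & 0xFF);
    field1 = static_cast<std::uint8_t>(bits >> 8);
  }
};
```

In C++ the conversion is only needed at the boundary to the HDL; inside the testbench, PacketType objects can be passed to functions directly.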

Data packing is necessary when you need to translate data structures into a form that the bus protocol can carry. It is very convenient to pass high-level data structures around within a testbench, but at some point these data structures usually need to be transmitted across an actual bus in the hardware models. A clean way to do this is to create a class method that converts the data structure into either an array of bits or an array of bytes (depending on the bus protocol). In Verilog, this could even be the same method that was written to pass the class across module and task boundaries, as described above. Below is an example of how to do this in VHDL:

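library ieee;
use ieee.std_logic_1164.all;  -- std_logic_vector, To_StdLogicVector, To_bitvector

-- Package name is illustrative; the record and its methods live together
-- in a package.
package class0_pkg is
  -- Record holding the pseudo-class data members
  type CLASS0 is record
    FIELD0 : bit_vector(7 downto 0);
    FIELD1 : bit_vector(7 downto 0);
  end record;

  function pack(this : CLASS0) return std_logic_vector;
  function unpack(packed_data : std_logic_vector(15 downto 0)) return CLASS0;
end package;

package body class0_pkg is
  -- Serialize the record: FIELD0 in the low byte, FIELD1 in the high byte
  function pack(this : CLASS0) return std_logic_vector is
    variable packed_data : std_logic_vector(15 downto 0);
  begin
    packed_data(7 downto 0)  := To_StdLogicVector(this.FIELD0);
    packed_data(15 downto 8) := To_StdLogicVector(this.FIELD1);
    return packed_data;
  end function;

  -- Rebuild a record from a vector produced by pack
  function unpack(packed_data : std_logic_vector(15 downto 0)) return CLASS0 is
    variable dataStructure : CLASS0;
  begin
    dataStructure.FIELD0 := To_bitvector(packed_data(7 downto 0));
    dataStructure.FIELD1 := To_bitvector(packed_data(15 downto 8));
    return dataStructure;
  end function;
end package body;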
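For a byte-oriented bus, the packing method would produce bytes rather than bits. The C++ sketch below assumes a hypothetical packet with a 16-bit address, a 32-bit data word, and an 8-bit tag, sent most significant byte first; the field layout and byte order are assumptions chosen for illustration, not requirements of any particular protocol.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical packet for a byte-wide bus.
struct BusPacket {
  std::uint16_t addr = 0;
  std::uint32_t data = 0;
  std::uint8_t tag = 0;

  // Serialize into the byte sequence the bus transactor sends, one byte
  // per bus beat, most significant byte first (an assumed ordering).
  std::vector<std::uint8_t> Pack() const {
    std::vector<std::uint8_t> bytes;
    bytes.push_back(static_cast<std::uint8_t>(addr >> 8));
    bytes.push_back(static_cast<std::uint8_t>(addr));
    for (int shift = 24; shift >= 0; shift -= 8)
      bytes.push_back(static_cast<std::uint8_t>(data >> shift));
    bytes.push_back(tag);
    return bytes;  // 7 bytes
  }

  // Inverse of Pack(), e.g. for a receiving slave transactor.
  static BusPacket Unpack(const std::vector<std::uint8_t>& b) {
    assert(b.size() == 7);
    BusPacket p;
    p.addr = static_cast<std::uint16_t>((b[0] << 8) | b[1]);
    p.data = (std::uint32_t(b[2]) << 24) | (std::uint32_t(b[3]) << 16) |
             (std::uint32_t(b[4]) << 8) | std::uint32_t(b[5]);
    p.tag = b[6];
    return p;
  }
};
```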

VHDL and Verilog do have some limitations when using these pseudo-class techniques. In Verilog, to pass a class instance across a module or task boundary, it must first be converted into a bit array, and inside the task it must then be converted back into a pseudo-class module instance. This means an additional module instance must exist that is visible from the scope of the task and can be used to convert the incoming bit array back into a data structure. Also, the Verilog and VHDL pseudo-class solutions lack more advanced features available in C++ classes, such as data hiding, inheritance, and polymorphism.

Developing transaction generators and managers to stimulate a design

Once transactors have been created for a BFM, a transaction generator must be created that can generate the different types of transaction calls and their inputs. The transactions are typically a mix of directed tests, used to set up and test specific functionality, combined with long runs of randomly generated transactions to catch any problem cases not caught by the directed tests.

Constrained random testing is used when a system has too many potential input sequences to test them all (a typical situation for virtually all system-level designs), because it saves time compared to manually writing the huge number of directed tests that would otherwise be required. The term constrained random refers to randomly generated transactions whose values the generator constrains to meet certain requirements. Typically the constraints ensure that the parameters to the transaction are logically consistent with one another, with the transaction protocol, and with the implementation of the design under test. For example, the address values of read transactions might be constrained so that most of them fall within the address space of the device under test. By constraining the parameters in this fashion, fewer transaction test vectors need to be generated to test the system, reducing the overall run time of the testbench.
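As a sketch of the address constraint described above, the following C++ generator draws word-aligned read addresses, roughly 90% of them inside the device's address window and the rest anywhere in the 32-bit space. The 90% split, the word alignment, and the ConstrainedReadGen name are assumptions for illustration; the fixed seed keeps a failing run reproducible.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

struct ReadTxn { std::uint32_t addr; };

// Hypothetical constrained-random generator for read transactions.
class ConstrainedReadGen {
 public:
  ConstrainedReadGen(std::uint32_t base, std::uint32_t size, unsigned seed)
      : base_(base), size_(size), rng_(seed) {
    assert(size >= 4 && size % 4 == 0);
  }

  ReadTxn Next() {
    std::bernoulli_distribution in_window(0.9);  // constraint: mostly in range
    std::uint32_t addr;
    if (in_window(rng_)) {
      // Word-aligned offset inside [base, base + size)
      std::uniform_int_distribution<std::uint32_t> word(0, size_ / 4 - 1);
      addr = base_ + word(rng_) * 4;
    } else {
      // Occasionally probe the rest of the address map (decode errors)
      std::uniform_int_distribution<std::uint32_t> word(0, 0xFFFFFFFFu / 4);
      addr = word(rng_) * 4;
    }
    return {addr};
  }

 private:
  std::uint32_t base_, size_;
  std::mt19937 rng_;
};
```

A directed test sequence would normally run first to set up the device, with calls to Next() then supplying the long random run.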

Using hierarchical references to transactors

When generating master transactor calls to test your design, it is frequently useful to be able to call transactors that are located in different BFM instances. For example, a higher-level BFM may contain several ATM port BFMs with SendPacket transactors that need to be initiated from the higher-level BFM. This requires that the transactors be hierarchically addressable from the higher-level BFM. Hierarchical referencing of transactors is supported natively in Verilog and easily done in C++, but it is not natively supported in VHDL. Below is a technique that can be used to emulate hierarchical referencing in VHDL. Although this technique is discussed for the purpose of supporting hierarchical calls to transactors, it can also be applied whenever a testbench requires hierarchical access to components of the design.

The basic idea behind hierarchically accessible transactors is to create a global array of control signals, one for each transactor instance. As each transactor initializes itself, it registers itself in a hash table that maps the transactor instance's hierarchical name to the appropriate index in the control signal array. Additional arrays are also needed to store the parameters for each type of transactor. Generics can be used to pass each transactor instance's name string down through the hierarchy. The figures below show the flow of control for the transactor and for the Apply function that initiates a transaction on the transactor.
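In C++ the same bookkeeping is easy, but sketching it there makes the data flow of the VHDL technique easy to follow. In the hypothetical code below, the slot vector plays the role of the global control-signal array, the unordered_map is the name-to-index hash table, and each transactor receives its hierarchical path from its parent, as a string generic would supply it in VHDL. All names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// One element of the "global control-signal array".
struct ControlSlot {
  bool start = false;
  int param = 0;  // in VHDL, parameters live in separate per-type arrays
};

// Registry: hierarchical instance name -> index into the control array.
class TransactorRegistry {
 public:
  std::size_t Register(const std::string& hier_name) {
    assert(index_.count(hier_name) == 0 && "duplicate instance name");
    index_[hier_name] = slots_.size();
    slots_.emplace_back();
    return slots_.size() - 1;
  }
  // Apply by hierarchical name: find the slot and raise its start flag.
  bool Apply(const std::string& hier_name, int param) {
    auto it = index_.find(hier_name);
    if (it == index_.end()) return false;  // unknown instance
    slots_[it->second].param = param;
    slots_[it->second].start = true;
    return true;
  }
  ControlSlot& Slot(std::size_t i) { return slots_[i]; }

 private:
  std::unordered_map<std::string, std::size_t> index_;
  std::vector<ControlSlot> slots_;
};

// A transactor registers itself at construction using the path it was given.
struct SendPacketTransactor {
  SendPacketTransactor(TransactorRegistry& reg, const std::string& path)
      : reg_(reg), slot_(reg.Register(path + ".SendPacket")) {}
  bool Pending() { return reg_.Slot(slot_).start; }
  TransactorRegistry& reg_;
  std::size_t slot_;
};
```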

Using a transaction manager queue to mix transaction streams

For simple test sequences, you can execute a series of transactors from a single process, one after the other. If you want multiple transactors to execute at the same time, then you can use non-blocking calls to the transactors. But, if you want to have multiple sequences of transactions running in parallel, then you must develop a more involved transaction sequencer.

One solution is to create a process for each sequence of transactions that you want to run in parallel. But this is limiting in situations where you need to control all the types of transactions to run from one process. For example, to fully exercise an ATM switch, you need to send ATM cells to each input port simultaneously. Randomly choosing the port number and the ATM cell data to transmit can also strengthen the testbench. So it would be useful to generate N cells and transmit them to the switch through randomly chosen ports, without allowing any one transmitter to block another. A second solution, therefore, is to create a transaction manager that reads transactions from a queue and executes them one after the other. You could have one transaction manager instance per port and place transactor calls randomly into their queues. This is difficult to do in Verilog and beyond the scope of this paper, so we cover only how to implement this solution in VHDL and C++.
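The per-port queue idea can be sketched in C++ as below. The Cell and PortManager types, the one-cell-per-step timing, and the round-robin loop that stands in for parallel processes are all simplifying assumptions; the point is that cells are dealt randomly to ports, and each port's manager drains its own queue independently, so no transmitter waits behind another.

```cpp
#include <cassert>
#include <deque>
#include <random>
#include <vector>

struct Cell { int id; int port; };

// One transaction manager per port, each with its own queue. Step() sends
// at most one cell, standing in for one transactor call per cell time.
struct PortManager {
  std::deque<Cell> queue;
  std::vector<int> sent;  // cell ids in transmit order
  void Step() {
    if (queue.empty()) return;
    sent.push_back(queue.front().id);
    queue.pop_front();
  }
};

// Generate n cells, place each in a randomly chosen port's queue, then run
// all managers side by side, one step per time slot, until every queue drains.
std::vector<PortManager> RunRandomTraffic(int n, int ports, unsigned seed) {
  assert(ports > 0);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> pick(0, ports - 1);
  std::vector<PortManager> mgrs(ports);
  for (int i = 0; i < n; ++i) {
    int p = pick(rng);
    mgrs[p].queue.push_back({i, p});
  }
  bool busy = true;
  while (busy) {
    busy = false;
    for (auto& m : mgrs) {
      m.Step();
      if (!m.queue.empty()) busy = true;
    }
  }
  return mgrs;
}
```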

In VHDL, you can implement a transaction manager by using the hierarchical referencing technique above and creating the following: 1) an additional record type, TApplyCall, that stores a Transactor Node and the transactor's parameters; 2) a queue of TApplyCalls; 3) functions that place TApplyCalls on the queue; and 4) a process that reads TApplyCalls from the queue and uses them to execute a transactor.

The transactor parameters can be represented using a "line" in VHDL so that TApplyCall can be used for all types of transactors. You would then add a data member to the Transactor Node that represents the type of the transactor, which the transaction manager can switch on to determine which method to call to run the transactor. That method is responsible for extracting the appropriate parameters from the parameter "line" and executing the transactor using the control signal index, as described in the hierarchical referencing section above.
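A C++ rendering of this VHDL scheme might look like the sketch below. A std::string stands in for the VHDL "line", an enum stands in for the transactor-type data member of the Transactor Node, and the manager switches on that type to decide how to parse the parameters. The type and field names are hypothetical, and the log entries stand in for driving the control signal at control_index.

```cpp
#include <cassert>
#include <deque>
#include <sstream>
#include <string>
#include <vector>

enum class TransactorKind { SendCell, ReadReg };

// Transactor Node: which kind of transactor, and where its control signal is.
struct TransactorNode {
  TransactorKind kind;
  int control_index;  // index into the global control-signal array
};

// One record type for every kind of call; parameters flattened into text.
struct TApplyCall {
  TransactorNode node;
  std::string params;  // e.g. "5 171" for SendCell (cell id, payload)
};

class TransactionManager {
 public:
  void Push(const TApplyCall& c) { queue_.push_back(c); }

  // Manager process: pop each call, switch on the kind, parse its parameters.
  void RunAll() {
    while (!queue_.empty()) {
      TApplyCall c = queue_.front();
      queue_.pop_front();
      std::istringstream in(c.params);
      const std::string idx = std::to_string(c.node.control_index);
      switch (c.node.kind) {
        case TransactorKind::SendCell: {
          int id = 0, payload = 0;
          in >> id >> payload;
          assert(!in.fail() && "malformed SendCell parameters");
          log.push_back("SendCell[" + idx + "] id=" + std::to_string(id) +
                        " payload=" + std::to_string(payload));
          break;
        }
        case TransactorKind::ReadReg: {
          int addr = 0;
          in >> addr;
          assert(!in.fail() && "malformed ReadReg parameters");
          log.push_back("ReadReg[" + idx + "] addr=" + std::to_string(addr));
          break;
        }
      }
    }
  }

  std::vector<std::string> log;  // stands in for driving the control signals

 private:
  std::deque<TApplyCall> queue_;
};
```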

In C++, a class can be written to represent the transaction manager. This class reads transactor objects from a queue and calls a virtual method, Execute, to run each one. All transactor classes derive from a common base class, and each transactor class has its own data member to represent the parameters for a particular transaction. Each transactor class is responsible for actually performing a particular bus transaction when its Execute method is called (e.g., by using TestBuilder, SCV, or PLI). For each transaction that you want to place in the queue, you would create a new instance of the transactor class, set up its parameters data member, and push it onto the queue.
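A minimal sketch of that class structure follows. The names Transactor, WriteTransactor, ReadTransactor, and TransactionManager are illustrative, and each Execute() appends a line to a log where a real implementation would drive the bus through TestBuilder, SCV, or PLI.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the signal-level activity a real transactor would produce.
using BusLog = std::vector<std::string>;

// Base class every transactor derives from.
class Transactor {
 public:
  virtual ~Transactor() = default;
  virtual void Execute(BusLog& bus) = 0;
};

// Each derived transactor keeps its own parameters as data members.
class WriteTransactor : public Transactor {
 public:
  WriteTransactor(unsigned addr, unsigned data) : addr_(addr), data_(data) {}
  void Execute(BusLog& bus) override {
    bus.push_back("WR " + std::to_string(addr_) + "=" + std::to_string(data_));
  }
 private:
  unsigned addr_, data_;
};

class ReadTransactor : public Transactor {
 public:
  explicit ReadTransactor(unsigned addr) : addr_(addr) {}
  void Execute(BusLog& bus) override { bus.push_back("RD " + std::to_string(addr_)); }
 private:
  unsigned addr_;
};

// The manager knows only the base class; virtual dispatch selects the protocol.
class TransactionManager {
 public:
  void Push(std::unique_ptr<Transactor> t) {
    assert(t != nullptr);
    queue_.push(std::move(t));
  }
  void Run(BusLog& bus) {
    while (!queue_.empty()) {
      queue_.front()->Execute(bus);
      queue_.pop();
    }
  }
 private:
  std::queue<std::unique_ptr<Transactor>> queue_;
};
```

Because the manager holds only base-class pointers, new transaction types can be added by deriving a new class without touching the manager, which relies on the inheritance and polymorphism that the VHDL and Verilog pseudo-classes lack.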

Using a golden reference model to verify design output in the face of randomized input

A golden reference model is an unclocked, behavioral model of the system design that can be used to verify the output of a lower-level model (either RTL or gate level). The golden reference model must model both the design under test and the functionality of the surrounding BFMs. The same transactions are applied to both the lower-level model under test and the golden reference model, and the outputs of the two models are compared to ensure that the lower-level model is functioning properly. By using a golden model, a verification engineer can avoid having to manually determine the expected results of directed tests. Further, a golden reference model is virtually required when performing constrained random tests, as it would take too long to manually determine expected results for a large number of randomly generated transactions. The figure below shows a typical structure for a testbench that uses a golden reference model to verify the output from the design model.

When written in C++, golden reference models usually consist of several classes, one for each type of device in the system. Each class contains functions for each type of transaction that the device participates in. These functions take their inputs and compute the appropriate outputs in zero simulation time since the functions are all untimed behavioral code. The code for the golden reference model is also much simpler than the code for the RTL-level model as it doesn't need to account for low level protocol details such as when data becomes available during a transaction or handshaking requirements of a transaction.
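As an illustration, a golden model of a hypothetical cell switch (in the spirit of the ATM example earlier) might look like the sketch below. Each transaction is a plain member function that returns its result immediately, with no clocks or handshakes; the routing-table behavior and the class and method names are assumptions, not a description of any real device.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <utility>
#include <vector>

struct Cell { std::uint16_t vpi; std::uint32_t payload; };

// Untimed golden model of a hypothetical cell switch.
class SwitchGoldenModel {
 public:
  explicit SwitchGoldenModel(int ports) : out_(ports) { assert(ports > 0); }

  // Configuration transaction: cells with `vpi` arriving on `in_port`
  // are routed to `out_port`.
  void AddRoute(int in_port, std::uint16_t vpi, int out_port) {
    routes_[{in_port, vpi}] = out_port;
  }

  // Cell transaction: returns the output port, or -1 if the cell is dropped.
  int SendCell(int in_port, const Cell& c) {
    auto it = routes_.find({in_port, c.vpi});
    if (it == routes_.end()) return -1;
    out_[it->second].push_back(c);
    return it->second;
  }

  // Next cell the design under test is expected to emit on `out_port`.
  bool PopExpected(int out_port, Cell& c) {
    if (out_[out_port].empty()) return false;
    c = out_[out_port].front();
    out_[out_port].pop_front();
    return true;
  }

 private:
  std::map<std::pair<int, std::uint16_t>, int> routes_;
  std::vector<std::deque<Cell>> out_;
};
```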

The outputs from the golden reference model can be generated before, during, or after the testing of the design under test. Running the golden reference model and the simulation model in parallel has one advantage: the randomization of the transactions and transaction data can be adjusted at run time according to the coverage requirements of the testbench. However, this approach requires that the output values of both models be available at the same time during the simulation so that they can be compared. This can be achieved by calling the appropriate golden reference model function at the end of each transactor, when the results from the lower-level model become available. Since the golden reference model is untimed, its outputs are available immediately after the function call returns, and the results of the two models can be compared.
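The comparison at the end of a transactor can be sketched with a hypothetical scoreboard wrapped around an untimed memory model. OnWriteDone() and OnReadDone() are assumed hook names that a transactor would call once the lower-level model's result is known; the value 0 for never-written locations is also an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal untimed golden model of a memory-mapped device (hypothetical).
class MemoryGolden {
 public:
  void Write(std::uint32_t addr, std::uint32_t data) { mem_[addr] = data; }
  std::uint32_t Read(std::uint32_t addr) const {
    auto it = mem_.find(addr);
    return it == mem_.end() ? 0 : it->second;  // assumed reset value 0
  }
 private:
  std::unordered_map<std::uint32_t, std::uint32_t> mem_;
};

// Hooks called at the end of each transactor. Because the golden model is
// untimed, its expected value is ready as soon as the call returns.
class Scoreboard {
 public:
  void OnWriteDone(std::uint32_t addr, std::uint32_t data) { golden_.Write(addr, data); }
  void OnReadDone(std::uint32_t addr, std::uint32_t dut_data) {
    std::uint32_t expected = golden_.Read(addr);
    if (dut_data != expected)
      errors.push_back("addr " + std::to_string(addr) + ": got " +
                       std::to_string(dut_data) + ", expected " +
                       std::to_string(expected));
  }
  std::vector<std::string> errors;  // mismatches found so far
 private:
  MemoryGolden golden_;
};
```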

Conclusion

Transaction-based BFMs enable very robust, reusable testbenches to be created, but some problems arise when writing this type of testbench because of limitations in VHDL and Verilog. In this paper, we have examined several coding techniques for overcoming these problems, including some that use a combination of C++ and Verilog or VHDL. SynaptiCAD makes a graphical bus-functional model generator called TestBencher Pro that generates the code described in this paper.


Daniel Notestein, co-founder of SynaptiCAD, is the chief architect for SynaptiCAD's WaveFormer Pro and VeriLogger Pro products. Notestein obtained his bachelor's degree in electrical engineering and minors in computer science and math from Virginia Tech and his MSEE from the University of Texas.

Ben Rhodes is the project leader for SynaptiCAD's TestBencher Pro product. His areas of special expertise include VHDL, Verilog, SystemC, OpenVera, and e test bench coding. Rhodes obtained his BSEE from Virginia Tech.
