| Coding Techniques for Bus Functional Models In Verilog, VHDL, and C++
                By Ben Rhodes and Dan Notestein, SynaptiCAD
               
                Bus functional models are simplified simulation models that accurately reflect the I/O level behavior
                of a device without modeling its internal computational abilities. For example, a bus functional model
                of a microprocessor would be able to generate PCI read and write transactions to a PCI device model
                to initialize and test the PCI device's functionality, but the microprocessor BFM would not be capable
                of reading CPU instructions from a memory and properly executing the instructions (this would require
                a complete behavioral level model of the processor). Bus functional models are commonly used in test
                benches to stimulate design models and verify their functionality. For the purposes of this paper, the
                designs models being tested are either RTL or gate-level models of the system.
               
                Using transactors to model transaction signaling protocols
                Bus functional models typically serve as an abstraction layer between the transaction level of system
                functionality which describes what data is being exchanged between two devices and the signaling level
                which dictates how this data is exchanged during the transaction. At the transactional level, a transaction
                can be viewed as a simple function call with parameters for the data being exchanged. At the signaling
                level, this is converted into signal transitions on appropriate clock cycles along with handshaking
                logic to ensure the data exchange is properly synchronized. The part of the BFM that performs the signaling
                when a transaction function is called is known a transactor. Transactors are the only parts of a BFM
                that interact directly with the signals of a design model; the remaining code in a BFM manipulates only
                transaction level data. The figure below demonstrates how transactors serve as an interface between
                the transaction level code in the testbench and the signals of the design model being tested.
               
                 
                Figure 1: Transactors are driven by the Transaction manager and stimulate the MUT 
                For best simulation performance, transactors should generally be modeled in the language of the design
                under test since there is typically a performance penalty for simulation activity that occurs across
                simulation language barriers, whereas it is often convenient to model the transaction level part of
                the BFM in a language that directly supports data structures and dynamic memory allocation. There is
                usually little if any penalty in writing the transaction level code in a higher level language since
                data is being worked with in larger chunks that doesn't need to interact as much with the simulation
                kernel.
               
                Master and slave transactors
                Transactors can be divided into two broad categories: master transactors that initiate a transaction
                and slave transactors that respond to a master transactor. Master transactors are generally modeled
                as procedures that are called whenever a transaction should be started, whereas slave transactors tend
                to be modeled as a group of related parallel processes that run for the entire simulation run, responding
                whenever they recognize a transaction is addressed to them.
               
                Although it is convenient from the point of view of the code that initiates master transactions to model
                master transactors as a procedure, the underlying implementation of a master transaction may also require
                the use of multiple parallel processes, which neither VHDL nor Verilog allow in functions. This problem
                can be overcome by modeling the master transactor as a state machine that responds to handshaking signals
                triggered by an "ApplyTransaction" procedure, making the master transactor look like a procedure call
                to the transaction-level code of the BFM. By default, this creates a transactor that does not block
                the calling process, but blocking transactions can be achieved by calling a version of the "ApplyTransaction"
                procedure call that waits for a completion signal from the transactor.
               
                It is frequently necessary to model a transaction as a set of cooperating processes, but this leads
                to two problems: (1) the processes must be synchronized so that they start and stop together and (2)
                it is easy to introduce races between when signals are sampled and driven. In Verilog, synchronization
                of the processes can be achieved using a fork-join to coordinate the processes. In VHDL, a pseudo fork-join
                can be used to simulate this effect. This technique uses a resolved handshaking signal that is monitored
                and driven by all the processes to be forked (see Writing Testbenches, Janick Bergeron, pp 135-137 for
                a detailed explanation of this technique).
               
                It is often desirable to be able to restart these processes during the middle of a transaction, effectively
                reseting the transaction. In Verilog, this can be done using disable statements, in VHDL it is more
                awkward, as it requires an abort status signal to be checked every time a wait statement is encountered
                in the transaction processes. By adding an additional state to the handshaking signal that handles the
                pseudo fork-join, we can reuse this signal as the abort status signal. This technique allows any of
                the processes in the pseudo fork-join to abort the transaction.
               
                Avoiding race conditions in transactor sampling code
                Race conditions can arise in a transactor when you need to sample the value of a signal and drive other
                signals that could affect the value of the sampled signal. Generally this can be avoid by sampling the
                value prior to driving the other signals, but when multiple processes are involved the order in which
                these statements occur is no longer known. This can be avoided in simple cases by the use of non-blocking
                statements in Verilog (in VHDL, this is the default case as long as you're not using shared variables).
               
                However, if one of the processes enables the execution of another process through zero delta time handshaking
                signals, these extra delta times can still lead to race conditions. This kind of code often occurs when
                a condition in the first process enables the execution of the second process, for example, when a signal's
                stability needs to be checked after a particular clock edge. This kind of state sampling code can often
                be in-lined in the enabling process, but this is not possible in cases where the stability checking
                code includes wait statements that would block the execution of the enabling process. To solve this
                problem, the following method can be used:
               
                Place the sampling code in a separate process that waits on a triggering event from the
                  initiating process.If the sampling process needs to sample at the same clock edge as the triggering clock,
                  then the initiating process needs to store off the initial value of the signal to be sampled.To start the sampling process, use event triggers "->" in Verilog or toggle a std_logic
                  signal in VHDL. Using this technique, you can trigger multiple sampling processes from the initiating
                  process without introducing delta cycles in the initiating process. 
                Data structures and data packing for serializing of packet data
                Data structures are useful for modeling complex data at a high level of abstraction. This can be very
                helpful when passing data between modules and tasks since multiple pieces of data can be passed as a
                single logical unit. Classes are even more useful since tasks and functions can be associated with each
                data structure for encapsulating algorithms specific to the type of data structure, such as packing
                and randomization.
               
                Classes form the base of C++, but aren't available in VHDL and Verilog. However, you can create pseudo-classes
                in these HDL languages. In Verilog, you would create a module with regs, tasks and functions to represent
                a class. Two tasks need to be defined to convert the class to/from an array of bits in order to pass
                instance information across module and task boundaries (this is very similar to the concept of using
                $realtobits and $bitstoreal to pass real numbers across module boundaries). In VHDL, you can create
                a record to represent the data structure, usually placed in a package. For each class method, the first
                parameter should be an inout of the data structure record type to allow the method to operate on the
                internals of a particular data structure instance. A Verilog example is shown below.
               
module packet_type;
  reg [23:0] tb_packed_bits;
  reg [7:0] FIELD0;
  reg [7:0] FIELD1;
  function [23:0] tobits;
     input dummy;
  begin
     tb_packed_bits = { FIELD1, FIELD0 };
     tobits = tb_packed_bits;
  end
  endfunction
  task frombits;
     input [23:0] tb_packed_bits_in;
  begin
    tb_packed_bits = tb_packed_bits_in;
    { FIELD1, FIELD0 } = tb_packed_bits;
  end
  endtask
endmodule
                Data packing is necessary when you need to translate data structures into information that can be understood
                by a bus protocol being used. It is very convenient to pass high level data structures around when working
                with a test bench, but usually at some point these data structures need to be transmitted across an
                actual bus in the hardware models. A nice way to do this is to create a class method that can be used
                to convert the data structure into either an array of bits or bytes (depending on the bus protocol).
                In Verilog, this could even be the same method that was written to pass the class across module and
                task boundaries, as described above. Below is an example of how to do this in VHDL:
               
type CLASS0 is record
  FIELD0 : bit_vector(7 downto 0);
  FIELD1 : bit_vector(7 downto 0);
end record;
function pack(this : CLASS0) return std_logic_vector is
  variable packed_data : std_logic_vector(15 downto 0);
begin
  packed_data(7 downto 0) := To_StdLogicVector(this.FIELD0);
  packed_data(15 downto 8) := To_StdLogicVector(this.FIELD1);
  return packed_data;
end function;
function unpack(packed_data : std_logic_vector(15 downto 0)) 
                     return CLASS0 is
variable dataStructure : CLASS0;
begin
  dataStructure.FIELD0 := To_bitvector(packed_data(7 downto 0));
  dataStructure.FIELD1 := To_bitvector(packed_data(15 downto 8));
  return dataStructure;
end function;
                VHDL and Verilog do have some limitations when using these pseudo-class techniques. In Verilog, to pass
                a class instance into a module, it must first be converted into a bit array. Then, inside the task it
                must be converted back into a module instance. This means an additional module instance must be created
                that is available from the scope of the task that can be used to convert the bit array that was passed
                in to a data structure. Also, Verilog and VHDL pseudo-class solutions lack more advanced features available
                in C++ classes such as data hiding, inheritance, and polymorphism.
               
                Developing transaction generators and managers to stimulate a design
                Once transactors have been created for a BFM, a transaction generator must be created that can generate
                the different types of transaction calls and the inputs for the transaction calls. The transactions
                are typically a mix of directed tests used to setup and test specific functionality combined with long
                runs of randomly generated transactions to catch any problem cases not caught by the directed tests.
               
                Constrained random testing is used when a system has too many potential input sequences to test all
                possible input sequences (a typical situation for virtually all system level designs) because they save
                time compared to manually writing the huge number of directed tests that would otherwise be required.
                The term constrained random is used to refer to randomly generated transactions that are constrained
                by the generator to meet some requirements on the randomly generated values. Typically the constraints
                are that the parameters to the transaction are logically consistent with one another and with respect
                to the transaction protocol and the implementation of the design under test. For example, the address
                values to a read transaction might be constrained so that most of them are within the address space
                of the device under test. By constraining the parameters in this fashion, fewer transaction test vectors
                need to be generated to test the system, reducing the overall run time of the test bench.
               
                Using hierarchical references to transactors
                When generating master transactor calls to test your design, it is frequently useful to be able call
                transactors that are located in different BFM instantiations. For example, a higher level BFM may contain
                several ATM port BFMs with SendPacket transactors that need to be initiated from the higher level BFM.
                This requires that the transactors be hierarchically addressable from the higher level BFM. Hierarchical
                referencing of transactors is supported natively in Verilog and easily done in C++, but it is not natively
                supported in VHDL. Below is a technique that can be used to emulate hierarchical referencing in VHDL.
                Although this technique is discussed for the purpose of supporting hierarchical function calls to transactors,
                it can also be applied whenever a testbench requires hierarchical access to components of the design.
               
                The basic idea behind hierarchically accessible transactors is to create a global array of control signals,
                one for each transactor instance. As each transactor initializes itself, it registers itself with a
                hash table that maps from the transactor instance hierarchical name to the appropriate index into the
                control signal array. Additional arrays are also needed to store the parameters for each type of transactor.
                Generics can be used to pass down through the hierarchy the instance name strings to each transactor
                instance. The figures below show the flow of control for the transactor and the Apply function that
                initiates a transaction on the transactor:
               
                 
                 
                 
                Using a transaction manager queue to mix transaction streams
                For simple test sequences, you can execute a series of transactors from a single process, one after
                the other. If you want multiple transactors to execute at the same time, then you can use non-blocking
                calls to the transactors. But, if you want to have multiple sequences of transactions running in parallel,
                then you must develop a more involved transaction sequencer.
               
                One solution is to create a process for each sequence of transactions that you want to run in parallel.
                But, this is limiting in situations where you need to have control over all the types of transactions
                to run in one process. For example, in order to fully exercise an ATM switch, you need to send ATM cells
                to each input port simultaneously. Also, randomly determining the port number and ATM cell data to transmit
                can enhance the test bench. So, it would be nice to be able to generate X number of cells to send and
                transmit them to the switch through random port numbers. And while doing this, not allowing one particular
                transmitter to block another. So, a second solution is to create a transaction manager that reads transactions
                from a queue and executes them one after the other. You could have one transaction manager instance
                per port and place transactor calls randomly into their queues. In Verilog, this is difficult to do
                and beyond the scope of this paper so we are just going to cover how to implement this solution in VHDL
                and C++.
               
                In VHDL, you can implement a transaction manager by using the "hierarchical referencing" technique above
                and by creating the following: 1) an additional record type, TApplyCall that stores a Transactor Node
                and the transactor's parameters, 2) a queue of TApplyCall's, 3) functions that can be used to place
                TApplyCall's on the queue, and 4) a process that will read TApplyCall's from the queue and use them
                to execute a transactor.
               
                The transactor parameters can be represented using a "line" in VHDL so that TApplyCall can be used for
                all types of transactors. Then, you would add a data member to the Transactor Node that represents the
                type of the transactor that the transactor manager can switch on to determine what method to call to
                run the transactor. That method would be responsible for extracting the appropriate parameters from
                the parameters "line" and executing the transactor using the control signal index as described in the
                " hierarchical referencing" section.
               
                In C++, a class can be written to represent the transaction manager. This class would read transactors
                from a queue and call a virtual method, Execute, to run the transactor. So, there would be a base class
                that all transactor classes derive from and each transactor class would have it's own data member to
                represent the parameters to use for a particular transaction. Each transactor class would be responsible
                for actually performing a particular bus transaction when the Execute method is called (i.e. by using
                TestBuilder, SCV, or PLI). For each transactor that you want to place in the queue, you would create
                a new instance of the transactor class, set up its parameters data member and push it onto the queue.
               
                Using a golden reference model to verify design output in the face of randomized input
                A golden reference model is an unclocked, behavioral model of the system design that can be used to
                verify the output of a low level model (either RTL or gate level). The golden reference model must model
                both the design under test and the functionality of the surrounding BFMs. The same transactions are
                applied to both the lower level model under test and the golden reference model and the outputs of the
                two models are compared to ensure that the lower level model is functioning properly. By using a golden
                model, a verification engineer can avoid having to manually determine the expected results of his directed
                tests. Further, the use of a golden reference model is virtually required when performing constrained
                random tests as it would take too long to manually determine expected results for a large number of
                randomly generated transactions. The figure below shows a typical structure for a testbench that uses
                a golden reference model to verify the output from the design model.
               
                 
                When written in C++, golden reference models usually consist of several classes, one for each type of
                device in the system. Each class contains functions for each type of transaction that the device participates
                in. These functions take their inputs and compute the appropriate outputs in zero simulation time since
                the functions are all untimed behavioral code. The code for the golden reference model is also much
                simpler than the code for the RTL-level model as it doesn't need to account for low level protocol details
                such as when data becomes available during a transaction or handshaking requirements of a transaction.
               
                The outputs from the golden reference model can be generated before, during, or after the testing of
                the design under test. There is one advantage to running the golden reference model and the simulation
                model in parallel: the randomization of the transactions and transaction data can be modified at runtime
                according to coverage requirements of the test bench. However, this approach does require that the output
                values for both models be available at the same time during the test bench so that the values can be
                compared. This can be achieved by calling the appropriate golden reference model function at the end
                of the execution of a transactor when the results from the lower level model become available. Since
                the golden reference model is an untimed model, its outputs are available immediately after the function
                call is made and the results of the two models can be compared.
               
                Conclusion
                Transaction-based BFMs enable very robust, reusable testbenches to be created, but some problems occur
                when writing these type of testbenches due to limitations in VHDL and Verilog. In this paper, we have
                examined several coding techniques for overcoming these problems as well as some ways to overcome them
                using a combination of C++ and Verilog or VHDL. SynaptiCAD makes a graphical bus-functional model generator
                called TestBencher Pro that will generate the code described in this paper.
               
 
                Daniel Notestein, co-founder of SynaptiCAD, is the chief architect for SynaptiCAD's WaveFormer Pro and
                VeriLogger Pro products. Notestein obtained his bachelor's degree in electrical engineering and minors
                in computer science and math from Virginia Tech and his MSEE from the University of Texas.
               
                Ben Rhodes is the project leader for SynaptiCAD's TestBencher Pro product. His areas of special expertise
                include VHDL, Verilog, SystemC, OpenVera, and e test bench coding. Rhodes obtained his BSEE from Virginia
                Tech.
               
                Back to Technical Papers page |