The purpose of this document is to describe how to use the command line implementation of the PSATSim simulator to process a parameter space en masse. This is extremely useful when it is necessary to analyze an architectural design space rapidly, without any need to watch the simulator run.
It is first necessary to understand how PSATSim configuration files work. Each file represents a configuration space and is saved in XML format. The root node of a configuration file is <psatsim> … </psatsim>. Within this node is a set of one or more <config name="…"> … </config> nodes. These form the basis for specifying a configuration space. The name attribute of a config node lets you name that subset of the configuration space. It is user specified and is used only to tag the configurations in the output file.
Within each config node there must be at least one of each of the following nodes: <general … />, <execution … />, and <memory … />.
These nodes specify the parameters for the three major sections. If more than one of each type of node is present, they are multiplexed to form every possible combination. For example:
<psatsim>
<config name="Case1">
<general … />
<general … />
<execution … />
<memory … />
<memory … />
<memory … />
</config>
<config name="Case2">
<general … />
<general … />
<execution … />
<execution … />
<memory … />
<memory … />
</config>
</psatsim>
In the example above, there would be a total of 14 inter-variations. The first config block gives six inter-variations (two general blocks times one execution block times three memory blocks). The second config block gives eight inter-variations (two general blocks times two execution blocks times two memory blocks).
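The counting rule above can be checked with a short script. This is a minimal sketch, not part of PSATSim: the function name `inter_variations` is hypothetical, and the embedded XML reproduces only the shape of the example (attributes are omitted, since only the number of blocks matters here).

```python
import xml.etree.ElementTree as ET

# Same shape as the example above; attributes are omitted because only
# the number of blocks matters when counting inter-variations.
CONFIG_XML = """
<psatsim>
  <config name="Case1">
    <general /><general />
    <execution />
    <memory /><memory /><memory />
  </config>
  <config name="Case2">
    <general /><general />
    <execution /><execution />
    <memory /><memory />
  </config>
</psatsim>
"""

SECTIONS = ("general", "execution", "memory")


def inter_variations(xml_text):
    """Return {config name: number of inter-variations}."""
    root = ET.fromstring(xml_text)
    counts = {}
    for config in root.findall("config"):
        total = 1
        for section in SECTIONS:
            # Multiply the block counts of the three major sections.
            total *= len(config.findall(section))
        counts[config.get("name")] = total
    return counts


counts = inter_variations(CONFIG_XML)
print(counts)                # {'Case1': 6, 'Case2': 8}
print(sum(counts.values()))  # 14
```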
The number of combinations is generally greater than the number of inter-variations because parameters inside any of the three major sections of a config node can also be intra-varied. Though not every parameter can be varied in this manner, intra-variation simplifies the task of simulating across a significant proportion of the domain of one or more parameters. Only parameters whose domain is integral can be intra-varied, and there are further restrictions in some cases, as discussed later in this document. Inter-variation of whole blocks provides the most flexibility but requires great care to ensure that each variation of interest is correctly specified. Intra-variation simplifies this task but requires judicious use, since it increases the total number of variations dramatically.
To specify an intra-variation for a parameter, all that is necessary is to specify a range of values, such as 4-6. Every combination of the intra-varied values within a block is expanded into a set of concrete blocks, which are then multiplexed with the blocks of the other sections to form the overall set of variations. For example:
<psatsim>
<config name="Case3">
<general
superscalar="1-16"
rename="16"
reorder="20"
rsb_architecture="distributed"
rs_per_rsb="1-8"
speculative="true"
speculation_accuracy="0.980"
separate_dispatch="true"
seed="1"
trace="compress.tra"
output="output.xml"
vdd="2.2"
frequency="600"
/>
<general
superscalar="1-16"
rename="16"
reorder="24"
rsb_architecture="distributed"
rs_per_rsb="1-8"
speculative="true"
speculation_accuracy="0.980"
separate_dispatch="true"
seed="1"
trace="compress.tra"
output="output.xml"
vdd="2.2"
frequency="600"
/>
<execution
architecture="standard"
integer="2"
floating="2"
branch="1"
memory="1"
/>
<memory architecture="l2">
<l1_code hitrate="0.990" latency="1" />
<l1_data hitrate="0.970" latency="1" />
<l2 hitrate="0.990" latency="3" />
<system latency="20" />
</memory>
</config>
</psatsim>
In the example above, there would be a total of two inter-variations, due to the two general blocks, and 128 intra-variations for each of the general blocks (16 values of superscalar times 8 values of rs_per_rsb), giving 256 variations in total. The large number of variations shows how easy it is to go overboard with intra-variations. The desire to test all possible combinations of parameters must be tempered with an informed testing process.
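The expansion of intra-variations can be sketched as a Cartesian product over the attribute values of one block. This is an illustration only, not PSATSim's own code: the helper names `expand_value` and `expand_block` are hypothetical, ranges are assumed to be inclusive (which the count of 128 above implies), and only a subset of the general attributes from the example is shown.

```python
import itertools
import re

# A value such as "4-6" is treated as an inclusive integer range.
RANGE = re.compile(r"^(\d+)-(\d+)$")


def expand_value(value):
    """Expand "4-6" into ["4", "5", "6"]; leave any other value alone."""
    match = RANGE.match(value)
    if not match:
        return [value]
    low, high = int(match.group(1)), int(match.group(2))
    return [str(n) for n in range(low, high + 1)]


def expand_block(attrs):
    """Return every concrete attribute dict described by one block."""
    names = list(attrs)
    choices = [expand_value(attrs[name]) for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*choices)]


# Subset of the two general blocks from the example above.
general_a = {"superscalar": "1-16", "rename": "16", "reorder": "20", "rs_per_rsb": "1-8"}
general_b = dict(general_a, reorder="24")

blocks = expand_block(general_a) + expand_block(general_b)
print(len(expand_block(general_a)))  # 128
print(len(blocks))                   # 256
```

Values that are not integer ranges, such as 0.980 or compress.tra, pass through unchanged, so they contribute a factor of one to the product.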
The following sections describe what each parameter is for and whether, and over what range, it can be intra-varied. The first table covers the attributes of the general block.
| Attribute Name | Domain of Values | Comments |
|---|---|---|
| superscalar | Integers 1-16 (intra-variable) | Superscalar factor |
| rename | Integers 1-512 (intra-variable) | Register renaming table size |
| reorder | Integers 1-512 (intra-variable) | Reorder buffer size |
| rsb_architecture | centralized or hybrid or distributed | Reservation station architecture |
| separate_dispatch | true or false | If false, then the decode and dispatch stages are combined, otherwise they are separated. |
| seed | Integer 0-4294967295 | Random number generator seed value. If zero, then the seed is chosen based on the starting time of the simulation. |
| trace | Filename | The filename of the input trace file |
| output | Filename | The filename of the output file |
| vdd | Floating point greater than zero | Processor supply voltage. For the default 350 nm process, it should be kept in the range of 1.8-3.3. |
| freq | Floating point greater than zero | Processor clock frequency, in MHz |
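Because a large sweep can take a long time, it can be worth checking the integer domains before launching it. The following is a minimal sketch under stated assumptions: the function `check_general` is hypothetical, it covers only the three intra-variable attributes listed in the table, and it does not reflect how PSATSim itself validates its input.

```python
import re

# Integer domains copied from the table above (inclusive bounds).
INTEGER_DOMAINS = {
    "superscalar": (1, 16),
    "rename": (1, 512),
    "reorder": (1, 512),
}

# Matches either a single integer ("16") or a range ("1-16").
VALUE = re.compile(r"^(\d+)(?:-(\d+))?$")


def check_general(attrs):
    """Return a list of problems found in a general block's attributes."""
    problems = []
    for name, (low, high) in INTEGER_DOMAINS.items():
        if name not in attrs:
            continue  # whether an attribute is mandatory is not checked here
        match = VALUE.match(attrs[name])
        if not match:
            problems.append(f"{name}: {attrs[name]!r} is not an integer or range")
            continue
        first = int(match.group(1))
        last = int(match.group(2) or match.group(1))
        if first > last or first < low or last > high:
            problems.append(f"{name}: {attrs[name]!r} is outside {low}-{high}")
    return problems


print(check_general({"superscalar": "1-16", "rename": "16", "reorder": "20"}))  # []
print(check_general({"superscalar": "1-32"}))
```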
The attributes for the execution block are dependent on the chosen architecture. The three basic choices are standard, simple, and complex, and the three tables below list their attributes in that order. It is also possible to specify a custom architectural configuration for the functional units and reservation stations, but that is beyond the scope of this document.
| Attribute Name | Domain of Values | Comments |
|---|---|---|
| integer | Integers 1-8 (intra-variable) | The number of integer functional units |
| floating | Integers 1-8 (intra-variable) | The number of floating point functional units |
| branch | Integers 1-8 (intra-variable) | The number of branch functional units |
| memory | Integers 1-8 (intra-variable) | The number of memory functional units |
| Attribute Name | Domain of Values | Comments |
|---|---|---|
| alu | Integers 1-8 (intra-variable) | The number of arithmetic functional units |
| branch | Integers 1-8 (intra-variable) | The number of branch functional units |
| memory | Integers 1-8 (intra-variable) | The number of memory functional units |
| Attribute Name | Domain of Values | Comments |
|---|---|---|
| iadd | Integers 1-8 (intra-variable) | The number of integer addition functional units |
| imult | Integers 1-8 (intra-variable) | The number of integer multiplication functional units |
| idiv | Integers 1-8 (intra-variable) | The number of integer division functional units |
| fpadd | Integers 1-8 (intra-variable) | The number of floating point addition functional units |
| fpmult | Integers 1-8 (intra-variable) | The number of floating point multiplication functional units |
| fpdiv | Integers 1-8 (intra-variable) | The number of floating point division functional units |
| fpsqrt | Integers 1-8 (intra-variable) | The number of floating point square-root functional units |
| branch | Integers 1-8 (intra-variable) | The number of branch functional units |
| load | Integers 1-8 (intra-variable) | The number of memory load functional units |
| store | Integers 1-8 (intra-variable) | The number of memory store functional units |
The memory blocks have up to four sub-nodes, depending on which memory architecture is specified. If the system architecture is chosen, then only the system sub-node is used. If the l2 architecture is chosen, then the l1_code, l1_data, l2, and system sub-nodes are used. The l1 architecture follows the same pattern, using only the sub-nodes for the levels it includes.
Each of the sub-nodes, except for system, has two attributes: hitrate and latency. The hitrate is a floating point value in the range 0-1. The latency is an integer greater than zero. The latency specifies the number of cycles it takes to access that level of the cache, after a miss has occurred in the next higher level.
Note that specifying a latency greater than one for the L1 caches in either the l1 or l2 architectures or for the system memory in the system architecture will make high-throughput execution impossible.
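To get a feel for how hit rates and latencies interact, the expected access latency of a hierarchy can be estimated by hand. The sketch below uses a common textbook model in which each miss adds the latency of the next level; it is an assumption made for illustration and not necessarily how PSATSim computes memory timing. The function name `expected_latency` is hypothetical, and the values are taken from the l1_data, l2, and system sub-nodes of the earlier example.

```python
def expected_latency(levels, system_latency):
    """Estimate the average access latency in cycles.

    levels: list of (hitrate, latency) pairs, nearest cache first.
    system_latency: latency of system memory, which always "hits".
    """
    expected = float(system_latency)
    # Work backwards from system memory towards the L1 cache.
    for hitrate, latency in reversed(levels):
        expected = latency + (1.0 - hitrate) * expected
    return expected


l1_data = (0.970, 1)
l2 = (0.990, 3)
print(expected_latency([l1_data, l2], system_latency=20))  # approximately 1.096
```

Under this model, the L1 latency is paid on every access, which is consistent with the note above about keeping it at one cycle.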
Running psatsim_con without any arguments will produce a screen
such as the one shown below. The two required arguments are the configuration
file and the output file. The options allow you to increase the number of
simultaneous simulations, so as to take advantage of multiple available
processing units, and control what is printed into the output file.
psatsim_con [configuration file] [output file] {options}
Options:
-t [thread count] : Number of simultaneous threads (default=1)
-cgmua : Print Options (default=a)
c Print configuration sections
C Suppress configuration sections
g Print general results sections
G Suppress general results sections
m Print memory results sections
M Suppress memory results sections
u Print utilization results sections
U Suppress utilization results sections
a Print all results sections
A Suppress all results sections
For general optimization, the print option -cg will include all
the necessary information, such as what configuration was used and what the
general results of the simulation are. The details of output files are explained
in the next section. After a valid command is issued, the program prints
output such as the following to the screen:
$ ./psatsim_con case12.xml case12_output.xml -cg
Found 14 variations
Running configuration 'Case1' variation 1...
Running configuration 'Case1' variation 2...
Running configuration 'Case1' variation 3...
Running configuration 'Case1' variation 4...
Running configuration 'Case1' variation 5...
Running configuration 'Case1' variation 6...
Running configuration 'Case2' variation 1...
Running configuration 'Case2' variation 2...
Running configuration 'Case2' variation 3...
Running configuration 'Case2' variation 4...
Running configuration 'Case2' variation 5...
Running configuration 'Case2' variation 6...
Running configuration 'Case2' variation 7...
Running configuration 'Case2' variation 8...
The first line of output states the number of variations which the simulator loaded from the configuration file. If this number is not correct or if it is different than what you expected, Ctrl-C will kill the program so that you can fix the configuration.
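When many configuration files need to be processed, the invocation can be scripted. The sketch below is one possible wrapper, not an official tool: the function names are hypothetical, the executable path ./psatsim_con is taken from the sample session, and it assumes that -t and the thread count are passed as two separate arguments, as the usage text suggests.

```python
import re
import subprocess


def build_command(config_file, output_file, threads=1, print_options="a",
                  executable="./psatsim_con"):
    """Build the argument list following the usage text above."""
    command = [executable, config_file, output_file]
    if threads != 1:
        # Assumption: the thread count is a separate argument after -t.
        command += ["-t", str(threads)]
    command.append("-" + print_options)
    return command


def found_variations(screen_output):
    """Extract N from the "Found N variations" line, or None if absent."""
    match = re.search(r"Found (\d+) variations", screen_output)
    return int(match.group(1)) if match else None


def run(config_file, output_file, **kwargs):
    """Run the simulator to completion and return its screen output."""
    result = subprocess.run(build_command(config_file, output_file, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout


print(build_command("case12.xml", "case12_output.xml", threads=4, print_options="cg"))
# ['./psatsim_con', 'case12.xml', 'case12_output.xml', '-t', '4', '-cg']
```

Comparing the result of `found_variations` against the count you expected is a scripted version of the manual check described above.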
The output files are in XML format. The root node is psatsim_results, which has a variation sub-node for each variation in the configuration space. The variation node has four attributes: configuration name, variation number, start time, and run-time. The configuration name is the one specified in the config node in the configuration file that corresponds to the variation. The variation number is the index of the variation within the set generated by that config node. The start time is the system time at which the simulator was started, and the run-time specifies how much system time the simulator took to execute.
The sub-nodes of the variation node depend on which print options you specified on the command line. The table below lists what enabling each print option provides:
| Option | Elements | Description |
|---|---|---|
| c | psatsim | Prints the exact configuration used to initialize the simulator |
| g | general | Prints the general results of the simulation, including the number of cycles, number of instructions, number of fetches, IPC, global energy, global power, register file, result bus, and global clockbus power statistics. |
| m | memory | Prints the memory access and power statistics |
| u | fetch, decode, dispatch, rename, rsb, exec, reorder, commit, and core | Prints the precise utilization statistics for the architecture, including each stage of the pipeline and the reorder and rename buffers |
The structure of each of the output nodes is self-explanatory. Except for specialized, extremely in-depth experimentation, the two most useful sections are the configuration and general sections. The configuration section describes, in the same format as used in the configuration file, the configuration used to instantiate the simulator. This is useful as a reference so that you know what parameters went into that variation. The general section describes the overall simulation results, such as the number of cycles, the number of instructions, the IPC, and the overall average power consumption.
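Once a run has finished, the output file can be post-processed with any XML library. The following is a minimal sketch under loud assumptions: the element names psatsim_results, variation, and general come from the description above, but every attribute name and value in the sample (config, number, ipc, cycles) is invented for illustration, and the sketch assumes the general results are stored as attributes. Check a real output file and adjust the names before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element names come from the text above, but
# every attribute name and value here is invented for illustration.
SAMPLE = """
<psatsim_results>
  <variation config="Case1" number="1">
    <general ipc="1.25" cycles="8000" />
  </variation>
  <variation config="Case1" number="2">
    <general ipc="1.50" cycles="6700" />
  </variation>
</psatsim_results>
"""


def general_results(xml_text):
    """Return one (variation attributes, general attributes) pair per variation."""
    root = ET.fromstring(xml_text)
    rows = []
    for variation in root.findall("variation"):
        general = variation.find("general")
        rows.append((dict(variation.attrib),
                     dict(general.attrib) if general is not None else {}))
    return rows


rows = general_results(SAMPLE)
# Pick the variation with the highest value of the (hypothetical) ipc attribute.
best = max(rows, key=lambda row: float(row[1]["ipc"]))
print(best[0])  # {'config': 'Case1', 'number': '2'}
```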