Importing GEM5

GEM5 provides a powerful platform to study the architecture and micro-architecture of computer systems. It is widely used in both academic and industrial fields.

Currently, supported ISAs include X86 and ARM. Supported timing models include AtomicSimpleCPU, TimingSimpleCPU, and O3CPU.

APIs

The benchmark APIs are implemented as system calls. The following system call numbers are assigned to them.

SYSCALL_LAUNCH = 501,        // Launch request.
SYSCALL_WAITLAUNCH = 502,    // Wait launch request.
SYSCALL_BARRIER = 503,       // Enter barrier.
SYSCALL_LOCK = 504,          // Lock mutex.
SYSCALL_UNLOCK = 505,        // Unlock mutex.
SYSCALL_REMOTE_READ = 506,   // Read cross chiplet
SYSCALL_REMOTE_WRITE = 507,  // Write cross chiplet

Each benchmark API corresponds to one system call. All arguments of the benchmark APIs are also the arguments for system calls.
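As a rough illustration of this one-to-one mapping, a benchmark-side wrapper could simply forward its arguments to the generic `syscall()` entry point. This is a minimal sketch: the wrapper name `barrier_sketch` and its argument list are hypothetical, and only the syscall numbers are taken from the list above.

```cpp
#include <cassert>
#include <unistd.h>  // syscall()

// Syscall numbers copied from the list above.
enum SysCallIndex : long {
    SYSCALL_LAUNCH = 501,
    SYSCALL_WAITLAUNCH = 502,
    SYSCALL_BARRIER = 503,
    SYSCALL_LOCK = 504,
    SYSCALL_UNLOCK = 505,
    SYSCALL_REMOTE_READ = 506,
    SYSCALL_REMOTE_WRITE = 507,
};

// Hypothetical wrapper: the argument names are illustrative only.
// Every API argument is passed through unchanged as a syscall argument.
inline long barrier_sketch(long uid, long src_x, long src_y, long count) {
    return syscall(SYSCALL_BARRIER, uid, src_x, src_y, count);
}
```

Outside the modified simulator these syscall numbers are not expected to be defined, so a call like this would typically fail on a host kernel.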

Handle Syscalls

Gem5 is an execution-driven simulator: the timing model invokes the functional model instruction by instruction, at the point where each instruction executes. Hence, the functional commands and the timing commands can be handled in the same place.

Gem5 provides unified functional models for all timing models. The functional models for all ISAs are implemented in the folder src/arch. Each ISA has its own syscall list.

  • $SIMULATOR_ROOT/gem5/src/arch/x86/linux/syscall_tbl32.cc and $SIMULATOR_ROOT/gem5/src/arch/x86/linux/syscall_tbl64.cc define the syscall list for x86.
  • $SIMULATOR_ROOT/gem5/src/arch/arm/linux/se_workload.cc defines the syscall list for ARMv8 ISA.

All ISAs share the same syscall emulator. You can find the handlers for all syscalls in $SIMULATOR_ROOT/gem5/src/sim/syscall_emul.hh and $SIMULATOR_ROOT/gem5/src/sim/syscall_emul.cc.
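The split between per-ISA syscall lists and shared handlers can be pictured as several lookup tables that point at the same functions. The sketch below is a simplified stand-in and does not use gem5's actual table or handler types; `Handler`, `dispatch`, and both tables are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Stand-in for an ISA-independent handler; real handlers take more context.
using Handler = std::string (*)(std::int64_t arg0);

std::string barrier_handler(std::int64_t uid) { return "barrier:" + std::to_string(uid); }
std::string lock_handler(std::int64_t uid) { return "lock:" + std::to_string(uid); }

// One table per ISA; both map the same numbers to the same shared handlers.
const std::map<int, Handler> x86_table = {{503, barrier_handler}, {504, lock_handler}};
const std::map<int, Handler> arm_table = {{503, barrier_handler}, {504, lock_handler}};

// Looks up the syscall number in the given ISA table and runs its handler.
std::string dispatch(const std::map<int, Handler>& table, int num, std::int64_t arg0) {
    auto it = table.find(num);
    if (it == table.end()) {
        return "unimplemented";
    }
    return it->second(arg0);
}
```

With this layout, supporting a new syscall on another ISA would only require a new table entry, not a new handler.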

Handle SYSCALL_REMOTE_WRITE/SYSCALL_REMOTE_READ

The flow chart of SYSCALL_REMOTE_WRITE and SYSCALL_REMOTE_READ is as follows:

flowchart TD

subgraph Write Syscall
O1(Start)
A1[Copy data out of simulation space]
B1[Issue SEND command]
C1[Wait for RESULT command]
D1[Open Pipe]
E1[Write data to Pipe]
F1[Get current simulation cycle]
G1[Send WRITE command]
H1[Wait for SYNC command]
I1[Adjust simulation cycle]
Z1(End)
end

O1-->A1-->B1-->C1-->D1-->E1-->F1-->G1-->H1---->I1-->Z1
C1-->C1
H1-->H1

subgraph Read Syscall
O2(Start)
A2[Issue RECEIVE command]
B2[Wait for RESULT command]
C2[Open Pipe]
D2[Read data from Pipe]
E2[Get current simulation cycle]
F2[Send READ command]
G2[Wait for SYNC command]
H2[Copy data into simulation space]
I2[Adjust simulation tick]
Z2(End)
end

O2---->A2-->B2-->C2-->D2-->E2-->F2-->G2-->H2-->I2-->Z2
B2-->B2
G2-->G2
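The self-loops on the "Wait for RESULT command" and "Wait for SYNC command" steps can be read as polling loops that keep reading commands until the expected one arrives. A minimal sketch, assuming commands arrive as text lines whose first token is the command name; the line format and the helper `wait_for_command` are hypothetical and not part of the simulator's API.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Reads lines from `in` until one starts with the token `expected`.
// On success, stores the rest of that line in `payload` and returns true.
// Returns false if the stream ends before the command is seen.
bool wait_for_command(std::istream& in, const std::string& expected, std::string& payload) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name;
        fields >> name;
        if (name == expected) {
            payload.clear();
            std::getline(fields >> std::ws, payload);
            return true;
        }
        // Any other line is skipped in this sketch.
    }
    return false;
}
```

In the write flow, such a helper would be called once for RESULT before opening the pipe and once for SYNC before adjusting the simulation time.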

Other Syscalls

Unlike SYSCALL_REMOTE_WRITE and SYSCALL_REMOTE_READ, the other syscalls only need to issue a functional command and a timing command. No other functionality needs to be handled.

The flow chart is as follows:

flowchart TD

A1[Issue functional command]
B1[Wait for RESULT command]

A2[Issue timing command]
B2[Wait for SYNC command]
C2[Change the simulator tick]

A1-->B1-->A2-->B2-->C2
B1-->B1
B2-->B2

The mapping between APIs and commands is shown below:

System call       Functional command    Timing command
launch            LAUNCH                WRITE
waitlaunch        WAITLAUNCH            READ
barrier           BARRIER               WRITE
lock              LOCK                  WRITE
unlock            UNLOCK                WRITE
receiveMessage    READ                  READ
sendMessage       WRITE                 WRITE
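The mapping above can also be written as a small lookup table, which is one way a handler could pick the two commands to issue. The type `CommandPair` and the table name `kCommandMap` are hypothetical; the entries are copied from the mapping.

```cpp
#include <cassert>
#include <map>
#include <string>

// Functional and timing command issued for one system call.
struct CommandPair {
    std::string functional;
    std::string timing;
};

// Entries copied from the mapping above.
const std::map<std::string, CommandPair> kCommandMap = {
    {"launch", {"LAUNCH", "WRITE"}},
    {"waitlaunch", {"WAITLAUNCH", "READ"}},
    {"barrier", {"BARRIER", "WRITE"}},
    {"lock", {"LOCK", "WRITE"}},
    {"unlock", {"UNLOCK", "WRITE"}},
    {"receiveMessage", {"READ", "READ"}},
    {"sendMessage", {"WRITE", "WRITE"}},
};
```

For example, `kCommandMap.at("waitlaunch").timing` yields `READ`, the only non-message entry whose timing command is a read.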

Adjust Simulator Tick

To deal with multiple clock domains in computer systems, the basic timing unit in Gem5 is the tick rather than the cycle. Since one chiplet may contain multiple clock domains, we prefer to handle the unit conversion in the benchmark configuration files (.yaml).

For example,

  - cmd: "$SIMULATOR_ROOT/gem5/build/X86/gem5.opt"
    args: ["$SIMULATOR_ROOT/gem5/configs/deprecated/example/se.py", "--cmd", "$BENCHMARK_ROOT/bin/test_c", "-o", "0 0"]
    log: "gem5.0.0.log"
    is_to_stdout: false
    clock_rate: 500
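The conversion implied by this configuration can be sketched as follows. This assumes that `clock_rate` expresses how many Gem5 ticks make up one cycle of the common time base; that reading is an assumption, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using Tick = std::uint64_t;

// Assumption: clock_rate is the number of simulator ticks per cycle.
constexpr Tick cycles_to_ticks(std::uint64_t cycles, std::uint64_t clock_rate) {
    return cycles * clock_rate;
}

// Integer division rounds down to the last completed cycle.
constexpr std::uint64_t ticks_to_cycles(Tick ticks, std::uint64_t clock_rate) {
    return ticks / clock_rate;
}
```

Under that assumption, and with Gem5's default resolution of one tick per picosecond, `clock_rate: 500` would correspond to a 500 ps period, that is, a 2 GHz clock.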

The core simulation loop of Gem5 is an event queue. Timing models handle the first event in the queue and inject more events if necessary.

flowchart LR

A1[Event queue]
B1[Timing model]

A1--"The first event in queue"-->B1--"More events"-->A1
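The loop between the event queue and the timing model can be illustrated with a toy event queue. This is a simplified sketch and not Gem5's `EventQueue` implementation; `EventQueue`, `Event`, and `tick_cpu` here are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Tick = std::uint64_t;

struct Event {
    Tick when;
    std::function<void()> action;
};

// Orders the priority queue so that the earliest event is on top.
struct Later {
    bool operator()(const Event& a, const Event& b) const { return a.when > b.when; }
};

class EventQueue {
public:
    void schedule(Tick when, std::function<void()> action) {
        queue_.push({when, std::move(action)});
    }
    // Pops the earliest event, advances time to it, and runs it until empty.
    Tick run() {
        while (!queue_.empty()) {
            Event ev = queue_.top();
            queue_.pop();
            now_ = ev.when;
            ev.action();  // The handler may schedule more events.
        }
        return now_;
    }
    Tick now() const { return now_; }

private:
    std::priority_queue<Event, std::vector<Event>, Later> queue_;
    Tick now_ = 0;
};

// Toy timing model: schedules one tick event per cycle for `remaining` cycles.
void tick_cpu(EventQueue& q, Tick period, int remaining) {
    if (remaining == 0) return;
    q.schedule(q.now() + period, [&q, period, remaining] {
        tick_cpu(q, period, remaining - 1);
    });
}
```

Because simulated time only advances when an event is popped, scheduling the next event further in the future is enough to move the whole simulation forward.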

Two global variables are added in $SIMULATOR_ROOT/gem5/src/sim/eventq.hh and $SIMULATOR_ROOT/gem5/src/sim/eventq.cc so that all components can access them.

  • gem5::interchiplet_end_tick_valid indicates whether the simulator tick needs to be changed.
  • gem5::interchiplet_end_tick holds the target tick that the simulator should move to.

In AtomicSimpleCPU, the timing model injects one tick event into the event queue every cycle. When the simulator tick needs changing, the latency is added to the next tick event so the simulator can move forward to the target time. See $SIMULATOR_ROOT/gem5/src/cpu/simple/atomic.cc for details.

In TimingSimpleCPU, the timing model injects one fetch event into the event queue for each bubble. When the simulator tick needs changing, the next fetch event is injected into the event queue at the target time. See $SIMULATOR_ROOT/gem5/src/cpu/simple/timing.cc for details.

In O3CPU, the timing model also injects one tick event into the event queue every cycle. Hence, the next tick event is injected at the target time when the tick needs changing. See $SIMULATOR_ROOT/gem5/src/cpu/o3/cpu.cc for details.
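The common idea across the three CPU models can be sketched as a helper that decides when the next CPU event is scheduled. The variables below are local stand-ins for gem5::interchiplet_end_tick_valid and gem5::interchiplet_end_tick; the helper `next_event_tick`, the point at which the flag is cleared, and the guard against a target in the past are all assumptions of this sketch rather than the actual code in atomic.cc, timing.cc, or cpu.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using Tick = std::uint64_t;

namespace sketch {

// Stand-ins for the two global variables described above.
bool interchiplet_end_tick_valid = false;
Tick interchiplet_end_tick = 0;

// Returns the tick at which the next CPU event should be scheduled.
// Normally this is one period ahead; if a tick change was requested,
// the event is pushed out to the requested target instead.
Tick next_event_tick(Tick now, Tick period) {
    Tick next = now + period;
    if (interchiplet_end_tick_valid) {
        interchiplet_end_tick_valid = false;            // Consume the request.
        next = std::max(next, interchiplet_end_tick);   // Never move backwards.
    }
    return next;
}

}  // namespace sketch
```

A syscall handler would set the target and raise the flag after receiving the SYNC command, and the CPU model would pick it up the next time it schedules its tick or fetch event.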

Issue CYCLE command

Because the CPU always controls the flow of benchmarks, the CPU's execution cycle plays a vital role in the execution cycle of the entire simulation. The CYCLE command is issued in $SIMULATOR_ROOT/gem5/src/sim/sim_event.cc when the simulator exits the simulation loop.

Cross-compile for ARM ISA

As a universal simulation platform, Gem5 supports multiple ISAs. Syscalls required to handle benchmark APIs have been added to x86 and ARM ISA. Hence, Gem5 in LegoSim can also execute ARM benchmarks.

The following command installs the cross-compiler for ARM.

sudo apt install gcc-aarch64-linux-gnu

Then, you can use aarch64-linux-gnu-gcc and aarch64-linux-gnu-g++ to compile benchmarks just like gcc and g++.

When cross-compiling for ARM, interchiplet_c cannot be used. Instead, apis_c.cpp should be compiled into an object file and linked into the target.

For example,

# C language target
C_target: $(C_OBJS) obj/interchiplet.o
	$(CC) $(C_OBJS) obj/interchiplet.o -o $(C_TARGET)

# Interchiplet library
obj/interchiplet.o: ../../../../interchiplet/srcs/apis_c.cpp
	$(CC) $(CFLAGS) -c $< -o $@

# Rule for C object
obj/%.o: ../../snipersim/barrier/%.cpp
	$(CC) $(CFLAGS) -c $< -o $@