Parallel QEMU

While QEMU has continued to be optimized for KVM to make use of the growing number of cores on modern systems, TCG emulation has been stuck running in a single thread. This year there is another push to get a workable solution merged upstream.


I’ve been doing research on QEMU for more than two years, and single-threaded emulation has always been a pain point for me. QEMU could perform better on today’s multi-threaded programs written for multi-core ARM platforms. Multi-core smartphones are very common. However, even as an emulation tool for Android, QEMU does not use multiple threads for emulation on multi-core hosts.

A few years ago, around 2012, PQEMU[1] used multiple threads to solve this issue, while COREMU[2] tackled it with a multi-process approach. Their performance is comparable. I decided to use PQEMU for my own performance tool, since NTHU is closer to me, but the code has not been maintained since the paper was published. I have since realized that it is no longer a good idea to use it.


Fortunately, in the middle of 2015, KVM held an event and invited two speakers who have been working on two major problems in parallelizing QEMU: TCG (code generation) and memory (including atomic operations).

TCG: The Tiny Code Generator is the core of QEMU and is what enables cross-platform emulation. The concept is similar to LLVM. The TCG front end translates target code (which could be ARM, MIPS, PowerPC, x86, etc.) into an IR (intermediate representation). The TCG back end then translates the IR into host binary code. The host could be any supported platform as well. This is one of the reasons QEMU is so well known.
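To make the two-stage idea concrete, here is a minimal toy sketch of a front end that turns guest instructions into IR and a back end that lowers the IR to host instructions. Everything in it is an assumption for illustration: the names `IrOp`, `frontend_toy_guest` and `backend_toy_host`, the three-operand guest syntax, and the two-operand host syntax are hypothetical and are not real TCG opcodes or QEMU APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class IrOp:
    """One toy IR operation (hypothetical; not a real TCG opcode)."""
    name: str
    args: Tuple[str, ...]


def frontend_toy_guest(guest_insn: str) -> List[IrOp]:
    """Translate one toy guest instruction such as 'ADD r0, r1, r2' into IR."""
    mnemonic, operands = guest_insn.split(maxsplit=1)
    regs = tuple(op.strip() for op in operands.split(","))
    if mnemonic == "ADD" and len(regs) == 3:
        return [IrOp("add", regs)]
    if mnemonic == "MOV" and len(regs) == 2:
        return [IrOp("mov", regs)]
    raise ValueError(f"unsupported guest instruction: {guest_insn!r}")


def backend_toy_host(ir: List[IrOp]) -> List[str]:
    """Lower toy IR to two-operand host instructions ('op dst, src')."""
    host = []
    for op in ir:
        if op.name == "add":
            dst, a, b = op.args
            if dst == a:
                host.append(f"add {dst}, {b}")
            elif dst == b:
                # Addition is commutative, so dst can absorb the other operand.
                host.append(f"add {dst}, {a}")
            else:
                host.append(f"mov {dst}, {a}")
                host.append(f"add {dst}, {b}")
        elif op.name == "mov":
            dst, src = op.args
            host.append(f"mov {dst}, {src}")
        else:
            raise ValueError(f"unsupported IR op: {op.name!r}")
    return host


if __name__ == "__main__":
    for insn in ("ADD r0, r1, r2", "MOV r3, r4"):
        print(insn, "->", backend_toy_host(frontend_toy_guest(insn)))
```

The point of the split is that a new guest only needs a new front end and a new host only needs a new back end; the IR in the middle stays the same.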

Memory: The memory problem, i.e. SoftMMU, involves atomic operations and TLB entries. To make these work correctly under multi-threaded emulation, Alex proposed two new TCG operations, load-link and store-conditional. To complete the atomicity, he also mentioned the tlb_lock() functions.
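For readers unfamiliar with load-link/store-conditional, the following is a minimal sketch of the general semantics, assuming a toy memory model with one reservation per virtual CPU. It is not the proposed TCG operations and not QEMU code: `ToyMemory`, `atomic_increment` and `run_demo` are hypothetical names, and tlb_lock() and the TLB are not modeled at all.

```python
import threading


class ToyMemory:
    """Toy shared memory with per-CPU load-link reservations (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cells = {}          # address -> value
        self._reservations = {}   # cpu_id -> reserved address

    def load_link(self, cpu_id, addr):
        """Read addr and remember that cpu_id is watching it."""
        with self._lock:
            self._reservations[cpu_id] = addr
            return self._cells.get(addr, 0)

    def store_conditional(self, cpu_id, addr, value):
        """Write value only if cpu_id's reservation on addr is still intact."""
        with self._lock:
            if self._reservations.get(cpu_id) != addr:
                return False
            self._cells[addr] = value
            # A successful store breaks every reservation on this address.
            stale = [c for c, a in self._reservations.items() if a == addr]
            for cpu in stale:
                del self._reservations[cpu]
            return True

    def load(self, addr):
        with self._lock:
            return self._cells.get(addr, 0)


def atomic_increment(mem, cpu_id, addr):
    """Classic LL/SC retry loop: retry until no one else wrote in between."""
    while True:
        old = mem.load_link(cpu_id, addr)
        if mem.store_conditional(cpu_id, addr, old + 1):
            return old


def run_demo(num_cpus=4, increments=1000):
    """Run one thread per toy vCPU, all incrementing the same counter."""
    mem = ToyMemory()
    addr = 0x1000

    def worker(cpu_id):
        for _ in range(increments):
            atomic_increment(mem, cpu_id, addr)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mem.load(addr)


if __name__ == "__main__":
    print(run_demo())
```

The sketch uses a single global lock for simplicity, which is exactly the kind of serialization a real multi-threaded emulator would want to avoid on the fast path; it only shows why a store-conditional has to fail when another CPU has written to the same location since the load-link.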

In the code, there are several global variables that need to be tackled in order to parallelize QEMU. Some of them have already been made thread-safe in the QEMU 2.x versions. This is the main reason the issue is being discussed again now. It is a much better time than 2012, when the code was not so clean or thread-safe. In the past few years, QEMU has changed a lot. The code architecture is now clean, and many changes have been made to TCG’s functions to make them safer and cleaner. I believe that, this time, the work will be merged back into mainline in a future QEMU 3.x version.

For more details and information please refer to the YouTube video:


Alex Bennée
Senior Software Engineer, Linaro
Alex is a senior software engineer working in Linaro’s Virtualization team. An experienced FLOSS developer with over 20 years of experience in embedded and systems programming, he currently spends most of his time on QEMU’s TCG-based emulation. The first piece of assembly he wrote was for the 6809 in his Dragon 32, followed by excessive pixel flinging on the 68000 before x86 took over the world.

Frederic Konrad
Fred is a software engineer who is mostly interested in hardware simulation projects like QEMU and SystemC. He is one of the contributors to MTTCG, as it’s a really interesting way to speed up simulation.

Slides (Alex):…

Slides (Fred):…


If you are interested in the latest official information, please refer to:


[1] PQEMU:

