Hyperthreading in HPC: On or Off?

This article discusses the use of simultaneous multi-threading (SMT, also known as hyper-threading) in HPC, with a particular emphasis on financial risk systems and the COREx benchmark. It looks into the SMT status of cloud virtual machines, the ability to enable or disable SMT, and the cost implications of doing so.

The previous article outlined, in some depth, the development of a new benchmark designed to be a representative workload for financial risk systems. Whilst it analysed everything down to the slot ratio that should be used, one topic conspicuous by its absence was SMT (for an explanation of SMT, slot ratios and other HPC-related terms, please see our HPC glossary). For the avoidance of doubt, the terms core, CPU and slot are used as defined in our glossary.

To SMT or not to SMT

Conventional wisdom, and most HPC practitioners, will by default recommend that SMT is turned off in HPC clusters. The reasoning is that a well-designed HPC system will run as many processes as there are cores on a machine, with each process fully consuming the resources of one core. Any use of SMT would increase contention for shared resources (memory, disk or even shared CPU caches) and so decrease performance.

An obvious exception to this rule (and one that is common in real-world financial risk systems) is the use of multi-threaded processes with a potentially undefined number of threads in each process, often coupled with sections of the workload that are not CPU bound. Market build is a good example of this. In such cases the use of SMT can often result in an increase in overall performance.

In our experience though, everything in between could go either way. We have worked on systems where enabling SMT resulted in performance gains, and on others where disabling it did. There really is only one way to know: testing.

As a case in point, COREx runs as a single-threaded, CPU-bound process, with one process launched per CPU. It fits the ideal model described above almost perfectly. Most people, myself included, would expect it to perform better with SMT disabled. However, multiple tests across a range of (cloud) virtual machines showed this not to be the case.

SMT comparison for cloud VMs with 1 or 2 cores across a variety of Intel and AMD CPUs

Enabling SMT resulted in performance gains of anywhere between 3% and 40% for COREx. The 40% figure is of particular note, and was observed using Intel’s latest Sapphire Rapids CPU. It is quite possible that more recent generations of hardware have improved SMT to the point where enabling it is beneficial where it may not have been in the past. A topic for future investigation, perhaps.

As such, all further tests with COREx will be run with SMT enabled where possible.

SMT on cloud: The road less travelled

Which leads us nicely on to SMT on cloud virtual machines. The vast majority of cloud virtual machines have SMT enabled by default. There are some notable exceptions, such as the HB family of HPC-specific VMs offered by Azure or the hpc6id.32xlarge EC2 instance type from AWS. SMT is a property of the CPU implementation rather than of the instruction set, but on cloud VMs I have only seen it on AMD64 / x86_64 CPUs. I have yet to come across an ARM64 CPU that supports SMT (though if you know of such a thing please do let me know about it).
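
On a Linux guest, one way to check what a VM actually exposes is to read the CPU topology files under sysfs. The sketch below is a minimal example rather than a definitive check: the helper names are hypothetical, it assumes the standard `thread_siblings_list` topology files are present, and it reports only what the hypervisor presents to the guest, which may not match the physical host.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> set[int]:
    """Expand a kernel CPU list such as '0-3,8' into {0, 1, 2, 3, 8}."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def threads_per_core(sibling_lists: list[str]) -> int:
    """Largest number of logical CPUs that share one core (1 if no data)."""
    return max((len(parse_cpu_list(s)) for s in sibling_lists), default=1)


def read_sibling_lists(sysfs_root: str = "/sys/devices/system/cpu") -> list[str]:
    """Read every CPU's thread_siblings_list from sysfs (Linux only).

    Assumption: the guest kernel exposes the usual topology directory.
    Returns an empty list when the files are not present.
    """
    paths = Path(sysfs_root).glob("cpu[0-9]*/topology/thread_siblings_list")
    return [p.read_text() for p in paths]
```

Calling `threads_per_core(read_sibling_lists())` on the guest and getting 2 suggests SMT is visible to the VM; a result of 1 suggests each vCPU is presented as its own core.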

The three largest hyperscalers (AWS, Azure and GCP) all provide the ability to disable SMT. However, the documentation around this is poor to non-existent, and in some cases it can take a drawn-out conversation with support before it will even work. Certainly, this doesn’t seem to be a feature they are encouraging the use of.

Disabling SMT on a VM will not reduce its cost. This sounds obvious, but cloud VMs are often cost-analysed as a price per CPU. Disabling SMT doubles the cost per CPU because it halves the number of CPUs available. For example, an AWS c6i.8xlarge EC2 instance type has (by default) 32 vCPUs. This is because SMT is enabled by default and each vCPU is a thread on an SMT-enabled core. If SMT is disabled, the c6i.8xlarge instance type has only 16 vCPUs.
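
The arithmetic can be made concrete with a short sketch. The hourly price used here is a hypothetical placeholder, not a real AWS price; only the vCPU counts (32 with SMT, 16 without) come from the example above.

```python
def cost_per_cpu(hourly_price: float, cpus: int) -> float:
    """Hourly price of the VM divided by the number of CPUs it exposes."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return hourly_price / cpus


# Hypothetical hourly price, chosen only to keep the numbers simple.
HOURLY_PRICE = 1.00

smt_on = cost_per_cpu(HOURLY_PRICE, 32)   # 32 vCPUs with SMT enabled
smt_off = cost_per_cpu(HOURLY_PRICE, 16)  # 16 vCPUs with SMT disabled

print(f"SMT on:  {smt_on:.5f} per CPU-hour")
print(f"SMT off: {smt_off:.5f} per CPU-hour")
print(f"ratio:   {smt_off / smt_on:.1f}x")  # the VM price is unchanged, so 2.0x
```

Whether that doubling matters depends on how much work each CPU gets through with SMT on versus off, which is why the per-CPU price should be read alongside benchmark results rather than on its own.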

Also, not all machine types support disabling SMT. For example, AWS metal instances do not support setting CPU options (and therefore SMT cannot be disabled).
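
On AWS, the CPU options mentioned above are passed at launch time as a `CpuOptions` value with `CoreCount` and `ThreadsPerCore` fields. The sketch below only builds the request parameters and does not call AWS. The helper names are hypothetical, the AMI ID is a placeholder, and it assumes an instance type that defaults to two threads per core and accepts half its default vCPU count as the core count.

```python
def cpu_options_smt_off(default_vcpus: int) -> dict:
    """Build an EC2 CpuOptions value with one thread per core.

    Assumption: the instance type defaults to 2 threads per core, so the
    core count is half of the default vCPU count.
    """
    if default_vcpus < 2 or default_vcpus % 2 != 0:
        raise ValueError("expected an even default vCPU count of at least 2")
    return {"CoreCount": default_vcpus // 2, "ThreadsPerCore": 1}


def launch_request(image_id: str, instance_type: str, default_vcpus: int) -> dict:
    """Keyword arguments for an EC2 RunInstances call with SMT disabled."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "CpuOptions": cpu_options_smt_off(default_vcpus),
    }


# "ami-EXAMPLE" is a placeholder, not a real image ID.
request = launch_request("ami-EXAMPLE", "c6i.8xlarge", 32)
print(request["CpuOptions"])

# To launch for real (needs boto3 and AWS credentials; not run here):
#   import boto3
#   boto3.client("ec2").run_instances(**request)
```

Keeping the parameter construction separate from the API call makes it easy to check the resulting core count before anything is launched.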

For completeness, it is worth pointing out that SMT should not be confused with the ability to overprovision virtualised hardware. SMT is a feature of the physical hardware that allows two parallel threads of execution to run on the same core. The ability of a hypervisor to present the same core (or CPU) to multiple guest virtual machines, effectively giving each guest VM a fraction of a core, is orthogonal to this.