Chinese x86 processors

Frank Riemenschneider

Farewell to Intel?

China has taken a major step towards independence from US manufacturers Intel and AMD with the launch of the latest x86 processors from Zhaoxin. The new chips were unveiled at the 19th China International Industry Fair in Shanghai.

© Fotolia / julien tromeur

Shanghai Zhaoxin Semiconductor is a Chinese microprocessor developer that has long been working on a domestic x86 CPU microarchitecture in order to make China independent of the US suppliers Intel and AMD. Whether the primary driving force is the Chinese government's fear of being spied on via backdoors designed into American CPUs, or its own inability to spy through them, can only be speculated.

In order to understand how Zhaoxin obtained the x86-relevant patents and technologies in the first place, one must first look at the ownership structure. In addition to Intel and AMD, there was a comparatively tiny third US manufacturer that had access to x86 technology via cross-licensing agreements: VIA Technologies Inc, founded in Silicon Valley in 1987. The company has been based in Taiwan since 1992. Zhaoxin is a joint venture between Shanghai Alliance Investment Ltd (80.1%) - a company de facto owned by the state via the Shanghai SASAC (State-owned Assets Supervision and Administration Commission of the State Council) - and VIA Technologies (19.9%), with the result that Zhaoxin can develop x86-compatible CPUs without the risk of patent infringement.

A 2010 settlement with the FTC (Federal Trade Commission) following proceedings against Intel for exploiting its monopoly position to the detriment of competition also states that Intel must continue to grant licenses for the development and sale of x86 processors by AMD and VIA Technologies - even if they have the x86 processors produced by contract manufacturers such as Globalfoundries or TSMC.

Incidentally, VIA is not the only company outside the US with an x86 license - as part of AMD's Q1 2016 financial results, the US company announced a new joint venture to develop x86 SoCs for servers, in which AMD has joined forces with Tianjin Haiguang Advanced Technology Investment Co, Ltd (Thatic), an investment arm of the Chinese Academy of Sciences. AMD is contributing x86 and SoC IP along with significant engineering and other technical resources (and receiving $293 million in license fees), while Thatic is also providing technical resources and funding behind the venture.


Figure 1: Comparison of system architectures today (left) and in the future (right).

© Image: Zhaoxin / Image1 edited by Alfes-Bodinger

Fifth CPU generation 'KaiXian'

Zhaoxin's new generation of processors based on a microarchitecture called 'WuDaoKou' is manufactured in a 28nm process at the Chinese foundry HLMC (Shanghai Huali Microelectronics Corporation) and is the successor to the 'Zhangjiang' architecture, which is more or less a 1:1 clone of VIA's 'Isaiah-II' architecture developed by VIA's subsidiary Centaur Technology.

WuDaoKou is Zhaoxin's first design to resemble today's x86 microprocessors by getting rid of the front-side bus (FSB). Previously, the chipset integrated the southbridge and northbridge (see Figure 1). A new uncore now contains the memory controller as well as all I/O PHYs and the memory and cache arbitration (see Figure 2).

Figure 2: The WuDaoKou uncore.

© Zhaoxin

The new chip is a complete SoC with N core clusters, an integrated graphics processor and the uncore (see box) on a single chip. Each cluster (Zhaoxin also calls it a module) consists of four cores, each with an 8-way set-associative 32 KB L1 cache for data and instructions, and a shared 32-way set-associative 4 MB L2 cache. The clusters are joined in the uncore and can communicate directly with each other via a new coherent switching matrix. While the design can scale to a higher core count, current chips have at most two clusters for a total of eight cores.
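To make the cache figures more tangible, the following sketch derives the number of sets from capacity and associativity. The 64-byte cache line is an assumption (Zhaoxin does not state the line size), and the helper name `cache_sets` is purely illustrative.

```python
def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache.

    line_bytes=64 is an assumption; the article does not give the line size.
    """
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size must be a multiple of ways * line size")
    return sets


KIB = 1024
MIB = 1024 * KIB

l1_sets = cache_sets(32 * KIB, ways=8)   # 32 KB, 8-way L1
l2_sets = cache_sets(4 * MIB, ways=32)   # 4 MB, 32-way L2 shared per cluster

print(f"L1: {l1_sets} sets, L2: {l2_sets} sets")
```

Under the 64-byte assumption, the L1 cache works out to 64 sets and the shared L2 cache to 2048 sets; a different line size would scale both numbers accordingly.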

The CPU itself is a superscalar out-of-order design with speculative execution that implements the x86-64 instruction set; its pipeline has been shortened by five stages compared with the previous design. Branch prediction has been optimized and the execution units in the back end have been "rebalanced" - unfortunately, Zhaoxin has not commented on further details of the microarchitecture. Overall, the new CPUs are said to be around 25% faster in single-thread performance and 40% faster in multi-core workloads.

The switching matrix is a high-speed point-to-point interconnect that offers much higher bandwidth than the front-side bus used previously. It also reduces latency and implements flow control and cache coherency. The on-chip GPU is likewise connected via the switching matrix. The memory controller in the uncore has been improved as well: it now supports up to dual-channel DDR4 at data rates of up to 2400 MT/s (although current SKUs only appear to support up to 2133 MT/s).
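As a rough guide to what these data rates mean, the theoretical peak memory bandwidth can be estimated from the channel count and the transfer rate. The sketch below assumes the standard 64-bit (8-byte) DDR4 data bus per channel, without ECC bits; the function name is illustrative, and real sustained bandwidth will be lower.

```python
def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int,
                        bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    bus_bytes=8 assumes a standard 64-bit DDR4 channel without ECC bits.
    """
    return channels * bus_bytes * mega_transfers_per_s * 1e6 / 1e9


for rate in (2133, 2400):
    bandwidth = peak_bandwidth_gb_s(channels=2, mega_transfers_per_s=rate)
    print(f"dual-channel DDR4-{rate}: {bandwidth:.1f} GB/s")
```

On these assumptions, dual-channel DDR4-2400 peaks at 38.4 GB/s, while the 2133 MT/s supported by current SKUs gives about 34.1 GB/s.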

New CPU families

Figure 3: The KX-U5580M in a 37.5 mm x 37.5 mm HFCBGA housing.

© Zhaoxin

Zhaoxin announced two new product families based on its latest architecture: KaiXian 5000 (KX-5000) and Kaisheng 20000 (KH-20000). The KaiXian 5000 series is mainly designed for PCs, workstations and laptops. These SKUs are positioned against Intel's Core i3 and Core i5 processors (see Figure 3).

The model numbering resembles that of AMD and Intel. The first digit "5" refers to the 5th generation, the next three digits to the clock frequency, the number of cores and the market segment. In addition, the U prefix denotes high-end 8-core models and the M suffix low-power models. All models feature virtualization support compatible with Intel's VT-x, Trusted Execution Technology (TXT), SSE4.2 and AVX. These models support up to 64 GB of DDR4 memory and have an integrated GPU that drives up to three displays with DirectX 11.1 support and 4K resolution.
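The naming scheme can be illustrated with a small parser. This is only a sketch based on the description above: the names `KxModel` and `parse_kx_model` are hypothetical, and because the article does not say which of the three remaining digits encodes clock frequency, core count or market segment, they are kept as an undecoded string.

```python
import re
from typing import NamedTuple


class KxModel(NamedTuple):
    high_end_8core: bool  # "U" prefix
    generation: int       # first digit
    detail_digits: str    # remaining three digits, deliberately left undecoded
    low_power: bool       # "M" suffix


# Pattern inferred from the model names in the article, not from a Zhaoxin spec.
_KX_PATTERN = re.compile(r"KX-(U?)(\d)(\d{3})(M?)")


def parse_kx_model(name: str) -> KxModel:
    match = _KX_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a KX-5000-style model number: {name!r}")
    prefix, generation, detail, suffix = match.groups()
    return KxModel(prefix == "U", int(generation), detail, suffix == "M")


print(parse_kx_model("KX-U5580M"))
print(parse_kx_model("KX-5640"))
```

The KH-20000 names use five digits and are intentionally rejected here, since the article only describes the scheme for the KX family.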

It is worth noting that Zhaoxin has made some minor improvements to PadLock (a security engine found on many VIA chips), such as support for the two Chinese cryptographic algorithms SM3 (a hash function) and SM4 (a block cipher). But beyond that, the architecture is identical.

We asked Zhaoxin whether its chips are affected by the recent vulnerabilities and received confirmation that the KX-5000 series is not affected by Meltdown. Spectre is theoretically applicable, but supposedly requires a much more complex sequence of operations than on Intel CPUs, which would make an attack extremely difficult.

In fact, Zhaoxin is trying to use Meltdown to promote its own domestically designed chips as a more secure alternative. Without knowing details of the microarchitecture, however, we cannot validate either statement.

The higher level of integration comes at a price. The new KX-5000 chips contain 2.1 billion transistors in their quad-core version, which is about seven times as many as the approximately 300 million transistors of the ZX-C. The die size is 187 mm², which will have a negative impact on costs and chip yield.
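The figures above can be checked with a few lines of arithmetic, and the effect of die size on yield can be illustrated with the simple Poisson yield model Y = exp(-A * D0). Note that the defect density D0 and the 100 mm² comparison die used below are purely hypothetical values chosen for illustration; neither Zhaoxin nor HLMC has published such data.

```python
import math

KX5000_TRANSISTORS = 2.1e9   # from the article
ZXC_TRANSISTORS = 300e6      # from the article (approximate)
KX5000_DIE_MM2 = 187.0       # from the article

transistor_ratio = KX5000_TRANSISTORS / ZXC_TRANSISTORS
density_per_mm2 = KX5000_TRANSISTORS / KX5000_DIE_MM2


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: fraction of dies with zero defects."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


ASSUMED_D0 = 0.1             # defects per cm², hypothetical
REFERENCE_DIE_MM2 = 100.0    # hypothetical smaller die, for comparison only

print(f"transistor ratio: {transistor_ratio:.1f}x")
print(f"density: {density_per_mm2 / 1e6:.1f} million transistors per mm²")
for area in (REFERENCE_DIE_MM2, KX5000_DIE_MM2):
    print(f"{area:.0f} mm² die: yield {poisson_yield(area, ASSUMED_D0):.1%}")
```

Whatever defect density is plugged in, the model shows the same trend: the larger the die, the smaller the share of defect-free chips per wafer, and the fewer candidate dies fit on the wafer in the first place.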

Derivative | Number of cores | L2 cache | Clock frequency | Max. external memory
KX-5540 | 4 | 4 MB | 1.8 GHz | 64 GB
KX-5640 | 4 | 4 MB | 2.0 GHz | 64 GB
KX-U5580 | 8 | 8 MB | 1.8 GHz | 64 GB
KX-U5580M | 8 | 8 MB | 1.8 GHz | 64 GB
KX-U5680 | 8 | 8 MB | 2.0 GHz | 64 GB
KH-25800 | 8 | 8 MB | 1.8 GHz | 128 GB
KH-26800 | 8 | 8 MB | 2.0 GHz | 128 GB

Table 1: Overview of the new x86 CPUs designed and made in China.

In addition to the KX-5000 family, Zhaoxin announced the Kaisheng 20000 family, which is aimed at embedded networks, storage and servers. This series should not be confused with the similarly named "ZX-2000" series, whose members are actually quad-core ARM Cortex-A17 CPUs. As with the KX-5000 chips, all models feature virtualization support compatible with Intel's VT-x, Trusted Execution Technology (TXT), SSE4.2 and AVX. The Kaisheng 20000 chips support up to 128 GB of memory and additionally support ECC and RDIMMs. No GPU is enabled on these SKUs.

Table 1 lists all announced derivatives of the KX and KH families.

Computing power

Test | KX-5640 (4 CPUs at 2.0 GHz) | KX-U5680 (8 CPUs at 2.0 GHz) | Atom C2750 (8 CPUs at 2.4/2.6 GHz)
SPECint | 19.1 | 19.9 | 17.5
SPECint_rate | 64.3 | 115 | 101
SPECfp | 22.9 | 25.7 | 23.0
SPECfp_rate | 53 | 81.3 | 76.8

Table 2: Benchmark comparison between KX-5000 and Intel's C2750.

Figure 4: The architecture of the Atom C2750 with its Silvermont CPUs.

© Intel

Zhaoxin provides the SPEC CPU 2006 results shown in Table 2. The comparison is with Intel's Atom C2750 microserver chip (see Figure 4), which is still based on the old Atom Silvermont microarchitecture, is manufactured in a 22 nm process and, unlike WuDaoKou, does not support multithreading. As it is unclear what optimizations Zhaoxin applied to obtain its figures, the base values were used for the Atom. Both in single-thread (SPEC...) and multi-core mode (SPEC..._rate), the 8-core variant KX-U5680 beats the Intel Atom in integer (int) and floating-point arithmetic (fp). However, Intel's Silvermont CPU has now been replaced by Goldmont, which has around 50% higher integer computing power than Silvermont [1] and should therefore easily win a comparison against the KX-5000.
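The margins behind this comparison can be worked out directly from the Table 2 values (decimal commas read as decimal points). The sketch below also applies the roughly 50% integer gain quoted for Goldmont to the Atom's single-thread integer score; that projected figure is a back-of-the-envelope estimate, not a measured result, and the variable names are illustrative.

```python
# SPEC CPU 2006 scores as given in Table 2
RESULTS = {
    "SPECint":      {"KX-5640": 19.1, "KX-U5680": 19.9, "Atom C2750": 17.5},
    "SPECint_rate": {"KX-5640": 64.3, "KX-U5680": 115.0, "Atom C2750": 101.0},
    "SPECfp":       {"KX-5640": 22.9, "KX-U5680": 25.7, "Atom C2750": 23.0},
    "SPECfp_rate":  {"KX-5640": 53.0, "KX-U5680": 81.3, "Atom C2750": 76.8},
}


def ratio(test: str, chip: str, baseline: str = "Atom C2750") -> float:
    """Score of `chip` relative to `baseline` for one test."""
    return RESULTS[test][chip] / RESULTS[test][baseline]


for test in RESULTS:
    print(f"{test}: KX-U5680 is {ratio(test, 'KX-U5680'):.2f}x the Atom C2750")

# Rough projection only: Silvermont score scaled by the ~50% gain cited in [1].
GOLDMONT_INT_GAIN = 1.5
goldmont_specint_estimate = RESULTS["SPECint"]["Atom C2750"] * GOLDMONT_INT_GAIN
print(f"projected Goldmont SPECint: {goldmont_specint_estimate:.2f} "
      f"vs. KX-U5680: {RESULTS['SPECint']['KX-U5680']}")
```

The KX-U5680's lead over the Atom is between roughly 6% and 14% depending on the test, which is far less than the 50% integer gain attributed to Goldmont.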

Goal: Beat AMD

Zhaoxin is already working on the next generation of KX-6000 processors. These processors are based on the Lujiazui microarchitecture, which is intended for TSMC's 16 nm process, but could possibly be manufactured in 14 nm at SMIC if SMIC has its 14 nm process ready by then.

In order to increase the computing power, one of the main focuses is to increase the clock frequency. Lujiazui is expected to reach at least 3 GHz. In addition, the memory controller supports higher data rates (up to 3200 MT/s).

Zhaoxin has announced that it will achieve "AMD performance" with the KX-6000 successor KX-7000. In concrete terms, this means that the KX-7000 will achieve the computing power of Zen 2. This would require a switch to TSMC's 10 nm or 7 nm process, as SMIC or another Chinese mainland foundry will certainly not be able to offer such high-end production by then. DDR5 and PCIe 4 as well as an even higher clock frequency will be supported. Zhaoxin explained that they plan to significantly improve the pipeline in order to improve the IPC "considerably", without going into details. It is expected that the single-thread computing power will be increased by a factor of around 1.5 compared to the KX-5000.

All in all, Zhaoxin is still playing catch-up at the moment, but with WuDaoKou they have already made a big leap forward. They will have to take similar steps with future architectures to close the gap. Whether the Chinese will be able to match AMD or even Intel in terms of computing power remains to be seen. Nevertheless, Zhaoxin (and more importantly the Chinese government) is hell-bent on pushing these two US companies out of China.

References:

[1] Riemenschneider, F.: Intel's 'Goldmont' makes 'Atom' competitive. DESIGN&ELEKTRONIK 2017, H. 5, p. 50 ff.

What is an Uncore?

"Uncore" is a term used by Intel to describe the functions of a microprocessor that are not in the core, but must be closely connected to the core in order to achieve high computing performance. Since the release of the Sandy Bridge Intel microarchitecture, it has been referred to as a "system agent". The core contains the components of the processor that are involved in the execution of instructions, including the ALU, FPU, L1 and L2 cache. Uncore functions include QPI controller, L3 cache, snoop agent pipeline, on-chip memory controller and Thunderbolt controller. Other bus controllers such as SPI and LPC are part of the chipset.

The Intel Uncore design has its origins in the northbridge. It reorganizes the functions critical to the core, placing them physically closer to the core on the chip and thereby reducing their access latency.

In particular, the Intel Uncore microarchitecture is divided into a number of modular units. The main Uncore interface to the core is the so-called cache box (CBox), which interfaces with the last level cache (LLC) and is responsible for managing cache coherency. Several internal and external QPI connections are managed by physical layer units called PBox. Connections between the PBox, CBox and one or more iMCs (MBox) are managed by the system configuration controller (UBox) and a router (RBox).
