In the first part of this series we discussed the hardware of an AMD64 server.
In this article we will describe some features of the AMD64 platform and the
Opteron processor and briefly review Linux support.
10.5.2004 23:00 | Jan Houštěk
AMD64, x86-64, x86_64?
There is some confusion about these names. After using x86-64 for some time,
AMD decided to switch to the somewhat egotistic name AMD64. The old name
x86-64 still remains in many places (software, documentation), but it is
expected to be replaced. Some explanation can be found in this post from AMD
to the discuss@x86-64.org list.
The form with the underscore is even more confusing. I suspect the real reason
for the underscore in x86_64 in Linux is that autoconf/configure cannot handle
dashes in arch names, because of the configuration name notation
cpu-vendor-system, e.g. x86_64-pc-linux-gnu. If a dash were used in the CPU
name, the string would be unparseable without prior knowledge of all arch
names.
AMD64 architecture
The x86 had not seen key changes for a long time, apart from new instruction
extensions (MMX, for example). The key features of instruction addressing and
memory segmentation, and the x86 instructions themselves, had not changed since
the i386, which was the last revolutionary x86 processor.
The most valuable feature of AMD64 (and also the one most damned by some
fundamentalists) is 100% backward compatibility with x86. Nothing changes for
32bit applications: you can run them on a 64bit OS. AMD claims that this
feature increases the number of transistors by only 2–3% and has no impact on
64bit performance. On the other hand, even legacy 32bit applications can
benefit from fast memory and IO and from the flat 64bit memory space (one 32bit
process cannot use more than 4GB, but several processes running simultaneously
can).
Let's have a look at how the 64bit extension to x86 is achieved. Basically, it
is done by adding a new mode called long mode. Whether it is in effect is
indicated by a global control bit called LMA (Long Mode Active). When LMA is
cleared, the processor operates as a standard x86 processor and is compatible
with all existing 16 and 32bit operating systems and applications. Long mode
consists of two sub-modes: 64-bit mode and compatibility mode. Both
compatibility mode and legacy mode are meant to run old 16 and 32bit
applications, but compatibility mode requires a 64bit OS and does not support
insanities such as x86 real mode or virtual-8086 mode (if you really want
those, you have to use legacy mode and do without 64bit support).
The 64-bit mode adds the following new features:
- 64bit virtual addresses and a flat address space with a single code, data
and stack space
- the 8 existing GPRs (General Purpose Registers) are widened to 64 bits (this
is obvious, since the "bitness" of a processor is given by the width of its
GPRs)
- 8 additional 64bit GPRs are added (R8–R15)
- 8 additional 128bit SSE registers are added (XMM8–XMM15)
- SSE and SSE2. Opteron and Athlon64 are AMD's first processors that provide
SSE2 instructions. Moreover, these instructions can use 16 XMM registers (the
P4 has only 8). The standard x87 coprocessor is still available, but SSE
should be faster.
Let's now explore what real AMD64 processors can do. Besides 64bitness and
some straightforward enhancements (such as the 1MB L2 cache in Opteron,
non-executable pages, an IOMMU and others), there are two major
innovations: the integrated memory controller and the HyperTransport bus.
Memory access
In most x86 systems, CPU access to memory involves several steps. The CPU is
connected to the FSB (Front-Side Bus), which operates at a much lower frequency
than the CPU core (e.g. 10 times lower). The FSB is connected to the memory
controller, also known as the NorthBridge, which provides access to the memory.
This approach has two flaws. First, the FSB is too slow for the performance
needs of next-generation CPUs, and it is difficult to scale: the only way is
to increase its operating frequency, which is not easy because components
other than the CPU and the NorthBridge use it as well. Second, going through
an intermediate controller causes unnecessary delays.
AMD's solution is really ambitious. Opteron and Athlon64 have the memory
controller integrated into the CPU chip. Besides the obvious advantages, this
approach has one interesting consequence for multi-way systems. While a common
memory controller accessible by all CPUs fits the SMP architecture, a memory
controller integrated in each CPU makes the system mostly NUMA-like. Generally
this is a good idea, since NUMA scales much better and delivers better
performance with properly designed applications. The problem is that many
applications expect SMP. Realizing this, AMD provides a hybrid approach called
SUMO (Sufficiently Uniform Memory Organization), which lets the OS see the
system as SMP while the physical architecture is NUMA-like (and NUMA-aware
applications can still benefit from it). The SMP emulation is achieved by fast
inter-CPU communication over the HyperTransport bus.
HyperTransport
HyperTransport is a high-speed point-to-point full-duplex link for integrated
circuits. It combines a simple layout, excellent speeds, low latencies and
good scalability; at the same time, it is compatible with the software PCI
model. Links scale both in clock frequency and in data-path width. The default
clock frequency is 200MHz and current implementations use up to 800MHz. This
is similar to other commonly used buses. Further flexibility comes from the
selectable data-path width of each link (currently 2, 4, 8, 16 or 32 bits are
available). A 16bit HyperTransport device can be connected with a 2, 4, 8 or
16bit link. The fastest, 32bit link at 800MHz has an aggregate bandwidth
(both directions combined) of 12.8 GB/s.
The Opteron processor has three 16bit HT links and one 8bit link. In 2-way
systems the 8bit link (3.2 GB/s) is used to connect slower IO devices (such as
32bit PCI), one 16bit link connects faster IO devices (e.g. PCI-X), and the
remaining two links are aggregated into one 32bit link with a throughput of
12.8 GB/s, used for inter-CPU communication. Cheaper Athlon64 CPUs have only
one 16bit link and one 8bit link; this still makes it possible to build 2-way
systems. And of course, systems with more than 2 ways are also possible (only
with Opterons).
The picture below shows the HT link wiring in a 4-way system.
The diagram below demonstrates the scalability of the HyperTransport bus.
Who would really benefit from AMD64
AMD64 is most likely to benefit:
- Any application that needs tons of memory, such as CAD and other design
systems, or large databases.
- Any system that must manage a large number of concurrent users or
application threads, such as large-scale thin-client solutions, large
databases and data warehouse applications, or application servers (including
web-based applications).
- Any application that can use 64bit integer calculations and SSE2
instructions, such as scientific calculations, high-precision calculations,
imaging/video/signal processing (e.g. voice recognition), modelling, encryption
and compression.
- Any application that needs a multi-way system. AMD64 will scale much
better than any known x86 solution.
- Any application likely to be vulnerable to buffer overflows and similar
attacks: non-executable pages make exploitation more difficult.
Cryptography and security applications benefit greatly from 64bit integer
calculations. In this area AMD64 can bring a real breakthrough. For example,
one multiplication of two 128-bit numbers takes 60 instructions on x86 (16 mul,
29 adc, 15 add) but only 12 instructions on AMD64 (4 mul, 5 adc, 3 add). There
is a slight catch, because the execution times differ for 32bit and 64bit
operands, but it is still hard to overstate the importance of 64bit for
cryptography. For a high-bandwidth VPN router or any other server providing
large amounts of encrypted data, AMD64 is an obvious choice.
In the next parts we will provide results of some benchmarks on the server
described in the first part.
Linux on AMD64
As mentioned, any x86 OS can run on an AMD64 machine in legacy mode, but nobody
would really want that. Let's focus on truly 64bit systems. Microsoft is
working on a 64bit version of Windows (currently, in May 2004, a beta version
is available; some people predict that we will have to wait 64 years for 64bit
Windows :). Sun has Solaris 10, and development versions of FreeBSD and NetBSD
also support AMD64.
Linux supports AMD64 quite well. Recent versions of kernel 2.4 have AMD64
support, but 2.4 is being deprecated in favour of 2.6. Nearly all major
distributions have AMD64 versions. Some of them and their availability are
listed below (only stable releases are included):
- SUSE 9.0 (available in boxes or from FTP)
- SUSE 9.1 (boxes, FTP edition coming soon)
- (SUSE) SLES 8 (boxes only)
- Mandrake 9.2 and 10.0 (boxes, sources at FTP)
- (Mandrake) Corporate Server 2.1 (boxes only)
- Gentoo 2004.1 (FTP, requires compilation)
- Fedora FC1 (FTP)
- (RedHat) RHEL 3 (boxes, sources at FTP)
Slackware does not provide an AMD64 version, and there is currently no
development effort in this direction. Debian has an AMD64 port under active
development, and it is possible that the next stable release will add AMD64 to
the list of supported architectures.