This year the comparison is made on the latest GCC 4.9
and LLVM 3.4 (which was released at the very end of 2013). To
show the progress in compiler performance, I also added data for GCC
4.8 and LLVM 3.3 on x86-64.
As usual, I focus mostly on comparing the compilers
as optimizing compilers. I don't consider other aspects of the
compilers such as stability, quality of debug information (especially in
optimization modes), supported languages, standards and extensions
(e.g. OMP), supported targets and ABI, support of just-in-time
I did not benchmark SPECFP2000 this year as LLVM still does not
support Fortran. I don't want to spend my time installing and
benchmarking the LLVM dragonegg plugin, which connects the GCC
(Fortran) front-end to LLVM, as I think there is a small base of the
I did not benchmark SPEC2000 for x86 this year as I believe 32-bit
Intel platforms are less and less interesting. One major reason for
this is that the performance of most programs on x86-64 is higher than
on x86.
Instead, I am adding results for the second major platform, ARM, for the
first time. I'd prefer to benchmark AArch64, but such machines are
still not available to the public.
This year I am trying to focus on performance for the latest
generation of Intel CPUs (I am using a 3.4GHz Haswell i5-4670
under Fedora Core 20 for this). Therefore I've chosen tuning and
architecture options that specifically fit this CPU. I tried to give
the compilers equal chances. If I had to modify the option set
for one compiler to compile a benchmark, I used an analogous
set for its competitor.
I did the comparison on x86-64 using the following major options,
which are equivalent from my point of view:
- -Ofast -march=core-avx2 -mtune=corei7 for GCC-4.8, GCC-4.9, and
- -O3 -march=core-avx2 -mtune=corei7 -ffast-math for
LLVM-3.3, as LLVM-3.3 does not support -Ofast
- I had to use -O0 instead of -O3 or -Ofast
to compile SPECInt2000 254.gap for both compilers as
LLVM-3.3/3.4 cannot generate correct code in any optimization mode for
- I used -fno-fast-math for 32-bit perlbmk and bzip2
for the LLVM-3.3 and GCC-4.8 compilers as LLVM-3.3 cannot compile the
code without this option.
- I also used -march=corei7 for 32-bit bzip2
for the LLVM-3.3 and GCC-4.8 compilers as LLVM-3.3 cannot generate
correct code with -march=core-avx2.
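The option sets above can be written down as data, which makes it easier to check that each compiler gets an equivalent command line. The following is only a minimal sketch: the driver names (`gcc-4.8`, `gcc-4.9`, `clang-3.3`), the file names and the `build_command` helper are hypothetical, and only compilers whose options are spelled out in the list are included.

```python
import shlex

# Architecture and tuning options shared by all x86-64 runs in the list above.
COMMON = ["-march=core-avx2", "-mtune=corei7"]

# Major option sets as quoted in the text.
OPTIONS = {
    "gcc-4.8": ["-Ofast"] + COMMON,
    "gcc-4.9": ["-Ofast"] + COMMON,
    # LLVM-3.3 does not support -Ofast, so -O3 plus -ffast-math is used.
    "llvm-3.3": ["-O3"] + COMMON + ["-ffast-math"],
}

# Hypothetical executable names; adjust them to the local installation.
DRIVERS = {
    "gcc-4.8": "gcc-4.8",
    "gcc-4.9": "gcc-4.9",
    "llvm-3.3": "clang-3.3",
}


def build_command(compiler, source, output):
    """Return the argument list for compiling `source` with `compiler`."""
    return [DRIVERS[compiler], *OPTIONS[compiler], "-o", output, source]


if __name__ == "__main__":
    # Print one command line per compiler for a placeholder source file.
    for name in OPTIONS:
        print(shlex.join(build_command(name, "bench.c", "bench")))
```

Per-benchmark exceptions such as -O0 for 254.gap would be applied on top of these sets, for both compilers at once, to keep the comparison fair.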
For ARM I tried to use a recent CPU too. I used an Exynos 5410 (1.6GHz
Cortex-A15) with 2GB of memory under Fedora Core 19. Some versions of the
Samsung Galaxy S4 cell phone are based on this CPU.
Here are some of my conclusions from analyzing the data:
- GCC shows steady progress on x86-64 on the performance
front from one generation to the next. LLVM performance
practically did not change.
- LLVM-3.4 improves its compilation speed, while
GCC-4.9 needs more time for better code generation when LTO
is off. On the other hand, LTO was significantly sped up in
GCC-4.9, and that is an important achievement.
- The difference between the same generations of LLVM and GCC on
integer benchmarks on x86-64 is now only about
6% without LTO and 2% with LTO (if you
need the exact numbers, please see the tables). This
gap is narrower than in my 2013 comparison, where it was 8%
and 3.5%. I think the major reason for this is progress in Intel
CPUs. In 2013 I used a processor two generations older (Sandy
Bridge). Intel CPUs have become better at executing less optimized
code; in other words, they have become less sensitive to some optimizations.
- For ARM, GCC generates about 10% better integer code.
I believe that GCC has a bigger score difference on most other targets
than on x86/x86-64 CPUs. At least I saw a similar thing on PPC too.
I think that the GCC community should pay more attention to improving
code quality for x86-64 as LLVM performance is really close
to that of GCC.
- To improve GCC performance, we need some analysis of where other
compilers (LLVM or the Intel compiler) generate better code.
Unfortunately, it is a full-time job for more than one person
familiar with the compiler internals. But if somebody is interested,
I'd propose analyzing the 186.crafty or 255.vortex LTO code, where LLVM
is doing much better than GCC.
- I spent only a little time analyzing the generated code to find some
code generation differences. I got the impression that LLVM has
better alias analysis, while GCC does a better job with dead
store elimination. Another difference is that LLVM systematically
uses SSE regs for memory-memory structure movement. GCC uses
general registers for this. It is hard for me to say what is better
for modern Intel CPUs without more investigation, but the LLVM code
is usually smaller as SSE regs are wider. I've checked the Intel
compiler; it also uses general regs in this situation.
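For readers who want to recompute percentage gaps like the ones quoted above from the tables, here is a minimal sketch. It assumes a SPEC-style overall score, i.e. the geometric mean of per-benchmark ratios, and defines the gap as how much higher one score is than the other. Whether the tables use exactly this definition is an assumption, and the numbers in the example are placeholders, not measurements from this comparison.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive per-benchmark ratios."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gap_percent(reference, other):
    """How much higher `reference` is than `other`, in percent."""
    return (reference / other - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder ratios only, NOT data from the tables.
    compiler_a = [2.0, 4.0, 8.0]
    compiler_b = [2.0, 4.0, 6.4]
    score_a = geometric_mean(compiler_a)
    score_b = geometric_mean(compiler_b)
    print(f"gap: {gap_percent(score_a, score_b):.1f}%")
```

The geometric mean keeps a single benchmark with a large ratio from dominating the overall score, which is why it is the usual choice for aggregating such results.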
Last modified: 06/23/2014 - vmakarov at redhat dot com