Comparison of GCC and LLVM in 2014.

This year I compare the latest releases, GCC 4.9 and LLVM 3.4 (the latter released at the very end of 2013). To show progress in compiler performance, I also added data for GCC 4.8 and LLVM 3.3 on x86-64.

As usual, I focus mostly on comparing the compilers as optimizing compilers. I don't consider other aspects such as stability, quality of debug information (especially in optimization modes), supported languages, standards and extensions (e.g. OpenMP), supported targets and ABIs, support for just-in-time compilation, etc.

I did not benchmark SPECFP2000 this year because LLVM still does not support Fortran. I did not want to spend time installing and benchmarking the LLVM dragonegg plugin, which connects the GCC (Fortran) front end to LLVM, as I think the plugin has a small user base.

I did not benchmark SPEC2000 for x86 this year, as I believe 32-bit Intel platforms are becoming less and less interesting. One major reason is that most programs perform better on x86-64 than on x86.

Instead, for the first time, I am adding results for the second major platform, ARM. I would have preferred to benchmark AArch64, but such machines are still not available to the public.

This year I focus on performance for the latest generation of Intel CPUs (for this I use a 3.4GHz Haswell i5-4670 under Fedora Core 20). Therefore I chose tuning and architecture options specifically suited to this CPU. I tried to give the compilers equal chances: if I had to modify the option set for one compiler to compile a benchmark, I used an analogous set for its competitor.

On x86-64 I used the following major options, which are equivalent from my point of view:

-Ofast -march=core-avx2 -mtune=corei7 for GCC-4.8, GCC-4.9, and LLVM-3.4
-O3 -march=core-avx2 -mtune=corei7 -ffast-math for LLVM-3.3, as LLVM-3.3 does not support -Ofast
I had to use -O0 instead of -O3 or -Ofast to compile SPECInt2000 254.gap for both compilers, as LLVM-3.3/3.4 cannot generate correct code for this test in any optimization mode.
I used -fno-fast-math for 32-bit perlbmk and bzip2 with the LLVM-3.3 and GCC-4.8 compilers, as LLVM-3.3 cannot compile the code without this option.
I also used -march=corei7 for 32-bit bzip2 with the LLVM-3.3 and GCC-4.8 compilers, as LLVM-3.3 cannot generate correct code with -march=core-avx2.

For ARM I also tried to use a recent CPU: an Exynos 5410 (1.6GHz Cortex-A15) with 2GB of memory under Fedora Core 19. Some versions of the Samsung Galaxy S4 phone are based on this CPU.

Here are some of my conclusions from analyzing the data:

GCC shows steady performance progress on x86-64 from one generation to the next. LLVM performance practically did not change.
LLVM-3.4 improved its compilation speed, while GCC-4.9 needs more compilation time to generate better code when LTO is off. On the other hand, LTO was significantly sped up in GCC-4.9, which is an important achievement.
The difference between the same generations of LLVM and GCC on integer benchmarks on x86-64 is now only about 6% without LTO and 2% with LTO (for the exact numbers, please see the tables). This gap is narrower than in my 2013 comparison, where it was 8% and 3.5% respectively. I think the major reason is progress in Intel CPUs: in 2013 I used a processor two generations older (Sandy Bridge). Intel CPUs have become better at executing less optimized code; in other words, they have become less sensitive to some optimizations.
On ARM, GCC generates about 10% better integer code. I believe the score difference in GCC's favor is bigger on most other targets than on x86/x86-64 CPUs; at least, I saw a similar thing on PPC.
I think the GCC community should pay more attention to improving code quality for x86-64, as LLVM performance is really close to GCC's.
To improve GCC performance, we need an analysis of where other compilers (LLVM or the Intel compiler) generate better code. Unfortunately, this is a full-time job for more than one person familiar with compiler internals. But if somebody is interested, I would propose analyzing the 186.crafty or 255.vortex LTO code, where LLVM does much better than GCC.
I spent only a little time analyzing the generated code to find code generation differences. My impression is that LLVM has better alias analysis, while GCC does a better job of dead store elimination. Another difference is that LLVM systematically uses SSE registers for memory-to-memory structure copies, whereas GCC uses general registers. Without more investigation it is hard for me to say which is better for modern Intel CPUs, but the LLVM code is usually smaller because SSE registers are wider. I also checked the Intel compiler: it uses general registers in this situation too.
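To make the structure-copy difference concrete, here is a minimal C sketch of the kind of code involved. The struct record type and the copy_record and records_equal functions are hypothetical, chosen only for illustration and not taken from any SPEC benchmark. Compiling it to assembly with each compiler and inspecting copy_record shows whether the copy is lowered to vector-register moves or to general-register moves; the result depends on the compiler version and the -march/-mtune options, so it may not match what I observed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-byte structure, used only for illustration. */
struct record {
    uint64_t a, b, c, d;
};

/* Document the size assumption at compile time (C11 static_assert). */
static_assert(sizeof(struct record) == 32, "expected a 32-byte struct");

/* A plain memory-to-memory structure copy.  An optimizing compiler may
   lower this assignment to wide vector-register moves (SSE/AVX) or to
   8-byte general-register moves; which one you get depends on the
   compiler, its version, and the -march/-mtune options. */
void copy_record(struct record *dst, const struct record *src)
{
    *dst = *src;
}

/* Helper for checking that a copy produced identical bytes. */
int records_equal(const struct record *x, const struct record *y)
{
    return memcmp(x, y, sizeof *x) == 0;
}
```

For example, compiling this file with gcc -O2 -march=core-avx2 -S and with clang -O2 -march=core-avx2 -S and comparing the instructions emitted for copy_record shows the choice each compiler makes for this particular case.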

Last modified: 06/23/2014 - vmakarov at redhat dot com
