The importance of using a fresh version of GCC.

When comparing the performance of different GCC versions, I usually use the same set of options. Such a comparison shows how the existing GCC optimizations, or the general optimization sets (-O2 or -O3), have improved.

But GCC development also adds new features and support for new processors, which are not switched on by the general optimization sets. Here I compare the peak performance (or, more accurately, close to peak performance) that different GCC versions can achieve on a modern processor.

For this comparison I use an Intel Haswell processor (3.4 GHz i5-4670) and GCC-4.2.4, GCC-4.4.4, and GCC-4.8.2.

Why these versions? The choice of GCC-4.8 is obvious, as it is the latest GCC release. GCC-4.2 is chosen because it is the last version of the compiler licensed under GPL v2, which Apple used in OS X (the latest OS X, Mavericks, merely imitates the gcc command with LLVM, but this is still true for older OS X versions). GCC-4.4 is chosen because it is the system compiler of some recent commercial Linux distributions, e.g. RHEL-6.

I used the following options:

-Ofast -flto -march=core-avx2 -mtune=core-avx2 for GCC-4.8.2 generating 64-bit code, and the same option set plus -mfpmath=sse -m32 -mpc64 for 32-bit code.
-O3 -ffast-math -mtune=core2 -march=core2 for GCC-4.4.4 generating 64-bit code, and the same option set plus -mfpmath=sse -m32 -mpc64 for 32-bit code.
-O3 -ffast-math -mtune=nocona -march=nocona for GCC-4.2.4 generating 64-bit code, and the same option set plus -mfpmath=sse -m32 for 32-bit code.

Some important remarks about these options: -Ofast and LTO were not yet implemented in GCC-4.4 and GCC-4.2, and neither was AVX2 support. The closest machine architecture options these versions provide for tuning code for Haswell are nocona for GCC-4.2 and core2 for GCC-4.4. As -Ofast switches on -ffast-math, the latter option was added explicitly for GCC-4.4 and GCC-4.2.

Here are the SPEC2000 performance rates for the comparison (percent changes relative to GCC-4.4 are given in parentheses):

64-bit	GCC-4.2	GCC-4.4	GCC-4.8
Int	3832 (-2.9%)	3945 (0%)	4228 (+7.2%)
FP	5001 (-9.2%)	5508 (0%)	6040 (+9.7%)

32-bit	GCC-4.2	GCC-4.4	GCC-4.8
Int	3468 (-4.6%)	3636 (0%)	4184 (+15.1%)
FP	4251 (-13.8%)	4933 (0%)	5420 (+9.9%)

Or, in graphic form, the performance changes relative to GCC-4.4.4 look like this:

[Chart: SPEC2000 performance changes relative to GCC-4.4.4]

If anyone is interested in code size, here are the average changes in the size of SPEC benchmark code (text segment) generated by GCC-4.2 and GCC-4.8, relative to the size of the code generated by GCC-4.4:

64-bit	GCC-4.2	GCC-4.8
Int	-10.1%	-5.6%
FP	-28.2%	-10.4%

32-bit	GCC-4.2	GCC-4.8
Int	-8.8%	-3.1%
FP	-26.2%	+2.2%

GCC is too big a project to follow all of its development, but in my opinion the improvements since GCC-4.2 were achieved mostly by:

LTO (plus IPA and inlining improvements).
Improvements in the register allocator (RA), which can now deal with bigger functions and higher register pressure (the result of LTO and more aggressive inlining). This is clearly visible on the 32-bit Int benchmarks, where only 7 integer registers are available.
Support for new vector instructions. Unfortunately, the Graphite optimizations gave no improvement, so I did not use them for GCC-4.8.2.

Last modified: 03/18/2014 - vmakarov at redhat dot com
