Speeding-Up Software Builds: Parallelizing Make and Compiler Cache

klm's blog


Original post is here: eklausmeier.goip.de

1. Problem statement

Building source code usually involves the make command, which keeps track of dependencies between files. Additionally, GNU make can parallelize your build using the -j parameter. Often you also want a so-called clean build, i.e., compile all source code files, just in case make missed some files when recompiling. Instead of throwing away all previous effort, one can use a cache of previous compilations.

I had two questions where I wanted quantitative answers:

  1. What is the best j for parallel make, i.e., how many parallel jobs should make start?
  2. What effect does a compiler cache have?

To the first question: At first I thought that the more I parallelize, the better. This belief was based on Compiler Speed-up, which basically says that the more RAM you have, the more you should parallelize. However, the article Parallelizing Compilations showed that more parallelism is not always better. So I conducted my own experiments to verify the results.

To the second question: as the compiler cache I used Andrew Tridgell's ccache, which he wrote for Samba.

For these tests I used the source code of the SLURM scheduler, see slurm.schedmd.com. This software package contains roughly 1,000 C source and header files (~600 C files plus ~300 header files), comprising about 550 kLOC. My machine has an AMD FX-8120 CPU (Bulldozer) with 8 cores clocked at 3.1 GHz, and 16 GB of RAM.

I went through the dull task of compiling the SLURM software with different settings of make, then cleaning up everything, and repeating the cycle. The chart below shows the results for varying j, once without and once with a compiler cache. Execution time is in seconds; it is the "real" (wall-clock) time as reported by the time command.

Chart: runtime for parallel make (real time in seconds for varying j, without and with ccache)

2. Conclusions and key findings

  1. Running more parallel make jobs than the machine has processor cores gains no performance. It does not hurt, but it does not help either.
  2. make -j without an explicit number of parallel jobs is a good choice.
  3. The C compiler cache ccache speeds up your compilations by up to a factor of 5, sometimes even more. There is no good reason not to use a compiler cache.

3. Raw numbers

Making all of SLURM:

tar jxf slurm-14.11.4.tar.bz2
cd slurm-14.11.4
./configure
time make
real    4m36.470s
user    3m24.248s
sys     1m12.379s

Between compilations, the build tree is cleaned:

time make clean
real    0m5.558s
user    0m2.014s
sys     0m3.912s

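Now compiling and cleaning, starting with an unlimited number of jobs (make -j without a number), then 16, 15, and so on down to 1. Output goes to /dev/null, except in one of the two -j16 runs.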

time make -j > /dev/null
real    1m44.970s
user    4m17.657s
sys     0m46.102s

time make -j16
real    1m44.144s
user    4m16.120s
sys     0m46.191s

time make -j16 > /dev/null
real    1m44.745s
user    4m16.242s
sys     0m46.358s

time make -j15 > /dev/null
real    1m44.231s
user    4m16.457s
sys     0m46.269s

time make -j14 > /dev/null
real    1m44.476s
user    4m15.833s
sys     0m47.091s

time make -j13 > /dev/null
real    1m44.675s
user    4m17.787s
sys     0m45.906s

time make -j12 > /dev/null
real    1m44.046s
user    4m16.554s
sys     0m46.575s

time make -j11 > /dev/null
real    1m43.612s
user    4m16.319s
sys     0m45.957s

time make -j10 > /dev/null
real    1m44.111s
user    4m16.999s
sys     0m46.181s

time make -j9 > /dev/null
real    1m43.239s
user    4m16.244s
sys     0m46.073s

time make -j8 > /dev/null
real    1m43.310s
user    4m15.317s
sys     0m46.257s

time make -j7 > /dev/null
real    1m44.913s
user    4m9.122s
sys     0m46.388s

time make -j6 > /dev/null
real    1m47.387s
user    4m1.811s
sys     0m46.165s

time make -j5 > /dev/null
real    1m51.977s
user    3m52.737s
sys     0m44.644s

time make -j4 > /dev/null
real    1m55.399s
user    3m37.683s
sys     0m44.401s

time make -j3 > /dev/null
real    2m6.940s
user    3m31.548s
sys     0m45.247s

time make -j2 > /dev/null
real    2m29.562s
user    3m15.105s
sys     0m45.061s

time make -j1 > /dev/null
real    3m55.786s
user    3m12.081s
sys     0m45.784s

Now the same procedure with ccache, this time starting with an unlimited number of jobs and then going from 8 down to 1.

time make -j > /dev/null
real    0m38.625s
user    0m37.360s
sys     0m26.392s

time make -j8 > /dev/null
real    0m38.592s
user    0m36.810s
sys     0m26.214s

time make -j7 > /dev/null
real    0m39.086s
user    0m36.790s
sys     0m26.490s

time make -j6 > /dev/null
real    0m39.107s
user    0m36.447s
sys     0m26.119s

time make -j5 > /dev/null
real    0m40.034s
user    0m36.930s
sys     0m26.208s

time make -j4 > /dev/null
real    0m41.072s
user    0m36.400s
sys     0m26.573s

time make -j3 > /dev/null
real    0m42.400s
user    0m36.205s
sys     0m26.972s

time make -j2 > /dev/null
real    0m47.814s
user    0m37.186s
sys     0m27.551s

time make -j1 > /dev/null
real    1m4.060s
user    0m37.844s
sys     0m28.901s

Speed comparison for a simple C file, compiled without and with ccache:

time cc -c j0.c
real    0m0.043s
user    0m0.034s
sys     0m0.009s

time /usr/lib/ccache/cc -c j0.c
real    0m0.008s
user    0m0.005s
sys     0m0.004s

Code of the simple C file j0.c:

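/* j0.c: print the Bessel function J0(x) for x = 1, 2, ..., end.
   j0() is the POSIX (XSI) Bessel function of the first kind, order 0;
   when linking, add -lm, e.g. cc j0.c -lm */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main (int argc, char *argv[]) {
        double x;
        /* optional upper bound from the command line, default 20 */
        double end = ((argc >= 2) ? atof(argv[1]) : 20.0);

        for (x=1; x<=end; ++x)
                printf("%3.0f\t%16.12f\n",x,j0(x));

        return 0;
}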

Counting lines of code in SLURM:

$ wc `find . \( -name \*.h -o -name \*.c \)`

4. man page excerpt for make

-j [jobs], --jobs[=jobs] Specifies the number of jobs (commands) to run simultaneously. If there is more than one -j option, the last one is effective. If the -j option is given without an argument, make will not limit the number of jobs that can run simultaneously.