Original post is here: eklausmeier.goip.de
1. Problem statement #
Compiling source code is usually driven by the make command, which keeps track of dependencies. Additionally, GNU make can parallelize your build via the -j parameter. Often you also want a so-called clean build, i.e., compile all source code files from scratch, just in case make missed some files when recompiling. Instead of throwing away all previous effort, one can use a cache of previous compilations.
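To make the terminology concrete, here is a minimal sketch of such a build cycle; the -j value of 8 is only an example:

make          # incremental build: recompile only what changed
make -j8      # the same, with up to 8 jobs running in parallel
make clean    # remove all objects, forcing a full rebuild
make -j8      # clean build: every file is compiled again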
I had two questions where I wanted quantitative answers:
- What is the best j for parallel make, i.e., how many parallel make jobs should one fire up?
- What effect does a compiler cache have?
To the first question: at first I thought that the more I parallelize, the better. This belief was based on Compiler Speed-up, which basically says that the more RAM you have, the more you should parallelize. However, the article Parallelizing Compilations showed that more parallelism is not always better. So I conducted my own experiments to verify the results.
To the second question: as the compiler cache I used Andrew Tridgell's ccache, which he originally wrote for Samba.
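For reference, a common way to activate ccache is to let its wrapper directory shadow the real compilers in the PATH, or to pass it explicitly to configure; the wrapper path /usr/lib/ccache below matches my system but may differ on other distributions:

export PATH=/usr/lib/ccache:$PATH   # ccache wrappers are found before the real cc/gcc
./configure

# alternatively, tell configure to invoke the compiler through ccache
./configure CC="ccache cc"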
For these tests I used the source code of the SLURM scheduler, see slurm.schedmd.com. This software package contains roughly 1,000 C source and header files (~600 C files plus ~300 header files), comprising ca. 550 kLOC. My machine has an AMD FX 8120 CPU (Bulldozer) with 8 cores clocked at 3.1 GHz, and 16 GB RAM.
I went through the dull task of compiling the SLURM software with different settings of make, cleaning up everything, and repeating the cycle. The chart below shows the results for varying j, once without and once with a compiler cache. Execution time is in seconds; the time is the "real" (wall-clock) time as reported by the time command.
[Chart: runtime for parallel make, with and without compiler cache]
2. Conclusions and key findings #
- Running more parallel make jobs than there are processor cores on the machine does not gain you performance. It is not bad, but it is not good either.
- make -j without an explicit number of parallel tasks is a good choice (see the sketch after this list).
- The C compiler cache ccache speeds up compilation by up to a factor of 5, sometimes even more. There is no good reason not to use a compiler cache.
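If you prefer pinning the job count to the hardware instead of relying on the unbounded make -j, the core count can be queried with nproc (on this machine it would report 8):

make -j              # no job limit; make starts as many jobs as it can
make -j"$(nproc)"    # one job per processor core, i.e. -j8 here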
3. Raw numbers #
Making all of SLURM:
tar jxf slurm-14.11.4.tar.bz2
cd slurm-14.11.4
./configure
time make

real    4m36.470s
user    3m24.248s
sys     1m12.379s
Between all compilation runs the build tree is cleaned:
time make clean

real    0m5.558s
user    0m2.014s
sys     0m3.912s
Now compiling and cleaning repeatedly, with the number of jobs going down from unlimited (plain -j) over 16, 15, ... to 1.
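The timings below were taken by hand; a small loop like the following sketch would run through the same sequence (compiler output is redirected to /dev/null, as in the measurements):

make clean; time make -j > /dev/null    # no job limit ("infinity")
for j in $(seq 16 -1 1); do             # then 16, 15, ..., down to 1
    make clean
    time make -j"$j" > /dev/null
done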
time make -j > /dev/null
real    1m44.970s
user    4m17.657s
sys     0m46.102s

time make -j16
real    1m44.144s
user    4m16.120s
sys     0m46.191s

time make -j16 > /dev/null
real    1m44.745s
user    4m16.242s
sys     0m46.358s

time make -j15 > /dev/null
real    1m44.231s
user    4m16.457s
sys     0m46.269s

time make -j14 > /dev/null
real    1m44.476s
user    4m15.833s
sys     0m47.091s

time make -j13 > /dev/null
real    1m44.675s
user    4m17.787s
sys     0m45.906s

time make -j12 > /dev/null
real    1m44.046s
user    4m16.554s
sys     0m46.575s

time make -j11 > /dev/null
real    1m43.612s
user    4m16.319s
sys     0m45.957s

time make -j10 > /dev/null
real    1m44.111s
user    4m16.999s
sys     0m46.181s

time make -j9 > /dev/null
real    1m43.239s
user    4m16.244s
sys     0m46.073s

time make -j8 > /dev/null
real    1m43.310s
user    4m15.317s
sys     0m46.257s

time make -j7 > /dev/null
real    1m44.913s
user    4m9.122s
sys     0m46.388s

time make -j6 > /dev/null
real    1m47.387s
user    4m1.811s
sys     0m46.165s

time make -j5 > /dev/null
real    1m51.977s
user    3m52.737s
sys     0m44.644s

time make -j4 > /dev/null
real    1m55.399s
user    3m37.683s
sys     0m44.401s

time make -j3 > /dev/null
real    2m6.940s
user    3m31.548s
sys     0m45.247s

time make -j2 > /dev/null
real    2m29.562s
user    3m15.105s
sys     0m45.061s

time make -j1 > /dev/null
real    3m55.786s
user    3m12.081s
sys     0m45.784s
Now the same procedure with ccache.
time make -j > /dev/null
real    0m38.625s
user    0m37.360s
sys     0m26.392s

time make -j8 > /dev/null
real    0m38.592s
user    0m36.810s
sys     0m26.214s

time make -j7 > /dev/null
real    0m39.086s
user    0m36.790s
sys     0m26.490s

time make -j6 > /dev/null
real    0m39.107s
user    0m36.447s
sys     0m26.119s

time make -j5 > /dev/null
real    0m40.034s
user    0m36.930s
sys     0m26.208s

time make -j4 > /dev/null
real    0m41.072s
user    0m36.400s
sys     0m26.573s

time make -j3 > /dev/null
real    0m42.400s
user    0m36.205s
sys     0m26.972s

time make -j2 > /dev/null
real    0m47.814s
user    0m37.186s
sys     0m27.551s

time make -j1 > /dev/null
real    1m4.060s
user    0m37.844s
sys     0m28.901s
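ccache also keeps hit/miss statistics, which are handy for checking that the cache is actually being used; I did not record them here, but they can be queried and reset like this:

ccache -s    # show statistics: cache hits, misses, cache size
ccache -z    # zero the statistics counters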
Speed comparison for a simple C file (the ccache run assumes the file has been compiled before, so the result comes straight from the cache):
time cc -c j0.c
real    0m0.043s
user    0m0.034s
sys     0m0.009s

time /usr/lib/ccache/cc -c j0.c
real    0m0.008s
user    0m0.005s
sys     0m0.004s
Code of the simple C file j0.c:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main (int argc, char *argv[]) {
    double x;
    double end = ((argc >= 2) ? atof(argv[1]) : 20.0);

    for (x=1; x<=end; ++x)
        printf("%3.0f\t%16.12f\n", x, j0(x));

    return 0;
}
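The timings above only compile to an object file (-c). To build and run the program, the math library providing j0() has to be linked in:

cc j0.c -o j0 -lm    # link against libm for j0()
./j0 5               # print J0(x) for x = 1..5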
Counting lines of code in SLURM:
$ wc `find . \( -name \*.h -o -name \*.c \)`
4. Man page excerpt for make #
-j [jobs], --jobs[=jobs]
    Specifies the number of jobs (commands) to run simultaneously. If there is more than one -j option, the last one is effective. If the -j option is given without an argument, make will not limit the number of jobs that can run simultaneously.