Performance Comparison C vs. Java vs. Javascript vs. LuaJIT vs. PyPy vs. PHP vs. Python vs. Perl

· klm's blog


Original post is here: eklausmeier.goip.de

1. Introduction. I always wanted to benchmark PHP, to confirm myself that choosing PHP as a static site generator is not a dead-end, compared, for example, against node.js. PHP 7 has already made huge performance advancements. See PHP 7: Features and Performance Comparison:

  • Drupal - Compared to previous versions, Drupal 8 runs 72 percent faster on PHP 7 when compared to previous versions.
  • Wordpress - For WordPress, a PHP 7 runtime only executes 25M CPU instructions compared to just under 100M doing the same job on older PHP versions.
  • Magento - For Magento, servers running PHP 7 are able to serve up to three times more requests as those running PHP 5.6.

Also, there were continuous improvements in PHP 7 and PHP 8, see PHP 8.1 Performance Is Continuing To Improve With Early Benchmarks.

2. Korol's and Zahariev's results. A specific comparison between PHP and node.js can be found in Viktor Korol's PHP vs NodeJS comparison and benchmarks, clearly favoring PHP. A performance comparison between PHP and various other languages was conducted in C++ vs. Python vs. PHP vs. Java vs. Others performance benchmark (2016 Q3). It shows that node.js is multiple times faster than PHP 7.

For the reader's convenience I reproduce the table from Ivan Zahariev, but leave out Rust, C#, Go, and Ruby:

Language real in s
C++ 0.951
PyPy 1.496
node.js 1.837
PHP 7 6.624
Java 8 12.144
Python 3.5 18.077
Perl 25.068
Python 2.7 25.333

I had compared C vs. Lua vs LuaJIT and Java here, and I had previously benchmarked C vs. LuaJIT vs. various Lua versions vs. Pallene here.

3. Results. This time I again use the $n$-queens problem, i.e., how many times can $n$ queens be put on an $n\times n$ chess board without attacking any other queen. The programs in C, Python, PHP, Javascript (node.js), Lua, and Perl are given in 5. Programs. Task at hand is to compute from $n=1$ to $n=12$ all possible combinations. For example, for C I call time xdamcnt2 12. I ran the programs multiple times and took the lowest result. Results are "real" time in seconds, i.e., this includes "user" and "sys" times. Of course, all programs produced exactly the same results -- all computations are integer computations. Expected results are:

n 2 4 6 8 10 12 14
combinations 1 0 0 2 10 4 40 92 352 724 2,680 14,200 73,712 365,596

Runtimes on NUC (Intel), Ryzen (AMD) and Odroid (ARM) are given in below table:

Language NUC Ryzen Odroid
C 0.17 0.15 0.44
Java 1.8.0 0.31 0.22 108.17
node.js 16.4 0.34 0.21 1.67
LuaJIT 0.49 0.33 2.06
PyPy3 7.3.5 1.57 0.86 n/a
PHP 8 3.23 2.35 42.38
Python 3.9.6 12.29 7.65 168.17
Perl 5.34 25.87 21.14 209.47

Conclusions:

  1. C is by far the fastest, by a factor of 2 on Intel to its nearest competitors: Java and Javascript. On Ryzen the difference is not so nuanced.
  2. Javascript and Java are almost similar in speed -- I didn't expect this result.
  3. Java's performance on ARM is terrible, even worse than Javascript, LuaJIT, and PHP combined. This means that very likely Java will run poorly on Apple's M1, or Amazon Graviton.
  4. Michael Pall's LuaJIT is a strong competitor to all other languages.
  5. PyPy is more than 7-times faster than Python.
  6. PHP 8 is faster than Python, but almost 10-times slower than Javascript, confirming the "adding numbers" benchmark as mentioned in the introduction from Viktor Korol, and in line with measurments from Ivan Zahariev.
  7. While C is roughly 2.5-times slower on ARM than on Intel, the other languages suffer a disproportionate degradation.

4. Machines used. Performance tests were run on Intel NUC Kit D54250WYKH, a quadcore i5-4250U machine.

 1$ lscpu
 2Architecture:            x86_64
 3  CPU op-mode(s):        32-bit, 64-bit
 4  Address sizes:         39 bits physical, 48 bits virtual
 5  Byte Order:            Little Endian
 6CPU(s):                  4
 7  On-line CPU(s) list:   0-3
 8Vendor ID:               GenuineIntel
 9  Model name:            Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz
10    CPU family:          6
11    Model:               69
12    Thread(s) per core:  2
13    Core(s) per socket:  2
14    Socket(s):           1
15    Stepping:            1
16    CPU max MHz:         2600.0000
17    CPU min MHz:         800.0000
18    BogoMIPS:            3792.18
19    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht t
20                         m pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid ap
21                         erfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic m
22                         ovbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs
23                         ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsave
24                         opt dtherm ida arat pln pts md_clear flush_l1d
25Virtualization features:
26  Virtualization:        VT-x
27Caches (sum of all):
28  L1d:                   64 KiB (2 instances)
29  L1i:                   64 KiB (2 instances)
30  L2:                    512 KiB (2 instances)
31  L3:                    3 MiB (1 instance)
32NUMA:
33  NUMA node(s):          1
34  NUMA node0 CPU(s):     0-3
35Vulnerabilities:
36  Itlb multihit:         KVM: Mitigation: VMX disabled
37  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
38  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
39  Meltdown:              Mitigation; PTI
40  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
41  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
42  Spectre v2:            Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
43  Srbds:                 Vulnerable: No microcode
44  Tsx async abort:       Not affected

Operating system and version:

1$ uname -a
2Linux nuc 5.12.15-arch1-1 #1 SMP PREEMPT Wed, 07 Jul 2021 23:35:29 +0000 x86_64 GNU/Linux

Performance tests were run on Ryzen 3400G.

 1$ lscpu
 2Architecture:            x86_64
 3  CPU op-mode(s):        32-bit, 64-bit
 4  Address sizes:         43 bits physical, 48 bits virtual
 5  Byte Order:            Little Endian
 6CPU(s):                  8
 7  On-line CPU(s) list:   0-7
 8Vendor ID:               AuthenticAMD
 9  Model name:            AMD Ryzen 5 PRO 3400G with Radeon Vega Graphics
10    CPU family:          23
11    Model:               24
12    Thread(s) per core:  2
13    Core(s) per socket:  4
14    Socket(s):           1
15    Stepping:            1
16    Frequency boost:     enabled
17    CPU max MHz:         3700.0000
18    CPU min MHz:         1400.0000
19    BogoMIPS:            7402.17
20    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb r
21                         dtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe pop
22                         cnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext p
23                         erfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt s
24                         ha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeass
25                         ists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sme sev sev_es
26Virtualization features:
27  Virtualization:        AMD-V
28Caches (sum of all):
29  L1d:                   128 KiB (4 instances)
30  L1i:                   256 KiB (4 instances)
31  L2:                    2 MiB (4 instances)
32  L3:                    4 MiB (1 instance)
33NUMA:
34  NUMA node(s):          1
35  NUMA node0 CPU(s):     0-7
36Vulnerabilities:
37  Itlb multihit:         Not affected
38  L1tf:                  Not affected
39  Mds:                   Not affected
40  Meltdown:              Not affected
41  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
42  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
43  Spectre v2:            Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
44  Srbds:                 Not affected
45  Tsx async abort:       Not affected

Operating system and version:

1$ uname -a
2Linux ryzen 5.12.15-arch1-1 #1 SMP PREEMPT Wed, 07 Jul 2021 23:35:29 +0000 x86_64 GNU/Linux

Tests were run on an Odroid XU4, an octacore ARM machine.

 1$ lscpu
 2Architecture:           armv7l
 3  Byte Order:           Little Endian
 4CPU(s):                 8
 5  On-line CPU(s) list:  0-7
 6Vendor ID:              ARM
 7  Model name:           Cortex-A7
 8    Model:              3
 9    Thread(s) per core: 1
10    Core(s) per socket: 4
11    Socket(s):          1
12    Stepping:           r0p3
13    CPU max MHz:        1500.0000
14    CPU min MHz:        200.0000
15    BogoMIPS:           120.00
16    Flags:              half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
17  Model name:           Cortex-A15
18    Model:              3
19    Thread(s) per core: 1
20    Core(s) per socket: 4
21    Socket(s):          1
22    Stepping:           r2p3
23    CPU max MHz:        2000.0000
24    CPU min MHz:        200.0000
25    BogoMIPS:           120.00
26    Flags:              half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae

Operating system and version:

1$ uname -a
2Linux odroid 4.14.180-3-ARCH #1 SMP PREEMPT Sat Jun 5 22:29:47 UTC 2021 armv7l GNU/Linux

5. Programs. The C program. C compiler is gcc 11.1.0 and using flags -Wall -O3 -march=native.

 1#define abs(x)   ((x >= 0) ? x : -x)
 2
 3/* Check if k-th queen is attacked by any other prior queen.
 4   Return nonzero if configuration is OK, zero otherwise.
 5*/
 6int configOkay (int k, int a[]) {
 7	int z = a[k];
 8
 9	for (int j=1; j<k; ++j) {
10		int l = z - a[j];
11		if (l == 0  ||  abs(l) == k - j) return 0;
12	}
13	return 1;
14}
15
16
17
18long solve (int N, int a[]) {  // return number of positions
19	long cnt = 0;
20	int k = a[1] = 1;
21	int N2 = N;  //(N + 1) / 2;
22
23	loop:
24		if (configOkay(k,a)) {
25			if (k < N)  { a[++k] = 1;  goto loop; }
26			else ++cnt;
27		}
28		do
29			if (a[k] < N)  { a[k] += 1;  goto loop; }
30		while (--k > 1);
31		a[1] += 1;
32		if (a[1] > N2) return cnt;
33		k = 2 ,  a[2] = 1;
34	goto loop;
35}

The Java program. Java is 1.8.0_292-b10.

 1class xdamcnt2 {
 2
 3	/* Check if k-th queen is attacked by any other prior queen.
 4	   Return nonzero if configuration is OK, zero otherwise.
 5	*/
 6	public static boolean configOkay (int k, int[] a) {
 7		int z = a[k];
 8
 9		for (int j=1; j<k; ++j) {
10			int l = z - a[j];
11			if (l == 0  ||  Math.abs(l) == k - j) return false;
12		}
13		return true;
14	}
15
16
17
18	public static long solve (int N, int[] a) {  // return number of positions
19		long cnt = 0;
20		int k = a[1] = 1;
21		int N2 = N;  //(N + 1) / 2;
22		boolean flag;
23
24		for (;;) {
25			flag = false;
26			if (configOkay(k,a)) {
27				if (k < N)  { a[++k] = 1;  continue; }
28				else ++cnt;
29			}
30			do
31				if (a[k] < N)  { a[k] += 1;  flag = true; break; }
32			while (--k > 1);
33			if (flag) continue;
34			a[1] += 1;
35			if (a[1] > N2) return cnt;
36			k = 2;  a[2] = 1;
37		}
38	}
39	...

The Javascript program. node.js is v16.4.2

 1/* Check if k-th queen is attacked by any other prior queen.
 2   Return nonzero if configuration is OK, zero otherwise.
 3*/
 4function configOkay (k, a) {
 5	var j;
 6	let z = a[k];
 7
 8	for (j=1; j<k; ++j) {
 9		let l = z - a[j];
10		if (l == 0  ||  Math.abs(l) == k - j) return 0;
11	}
12	return 1;
13}
14
15
16
17function solve (N, a) {  // return number of positions
18	let cnt = 0;
19	let k = a[1] = 1;
20	let N2 = N;  //(N + 1) / 2;
21
22	loop: for(;;) {
23		if (configOkay(k,a)) {
24			if (k < N)  { a[++k] = 1;  continue loop; }
25			else ++cnt;
26		}
27		do
28			if (a[k] < N)  { a[k] += 1;  continue loop; }
29		while (--k > 1);
30		a[1] += 1;
31		if (a[1] > N2) return cnt;
32		k = 2 ,  a[2] = 1;
33	}
34}

The Python program. Python is 3.9.6. PyPy is 7.3.5 for Python 3.7.0.

 1#  Check if k-th queen is attacked by any other prior queen.
 2#  Return 'True' if configuration is OK, 'False' otherwise.
 3#
 4def configOkay (k, a):
 5	z = a[k]
 6
 7	for j in range(1,k):
 8		l = z - a[j]
 9		if (l == 0  or  abs(l) == k - j): return False
10	return True
11
12
13
14
15def solve (N, a):	# return number of positions
16	cnt = 0
17	k = a[1] = 1
18	N2 = N	# (N + 1) / 2
19
20	while True:
21		flag = 0
22		if configOkay(k,a):
23			if k < N:  k += 1; a[k] = 1; continue
24			else: cnt += 1
25
26		while True:
27			if a[k] < N:  a[k] += 1; flag = 1; break
28			k -= 1
29			if k <= 1: break
30		if flag == 1: continue
31		a[1] += 1
32		if (a[1] > N2): return cnt
33		k = 2
34		a[2] = 1

The PHP program. PHP is 8.0.8.

 1<?php
 2/* Check if k-th queen is attacked by any other prior queen.
 3   Return nonzero if configuration is OK, zero otherwise.
 4*/
 5function configOkay (int $k, &$a) {
 6	$z = $a[$k];
 7
 8	for ($j=1; $j<$k; ++$j) {
 9		$l = $z - $a[$j];
10		if ($l == 0  ||  abs($l) == $k - $j) return 0;
11	}
12	return 1;
13}
14
15
16
17function solve (int $N, &$a) {  // return number of positions
18	$cnt = 0;
19	$k = $a[1] = 1;
20	$N2 = $N;  //(N + 1) / 2;
21
22	loop:
23		if (configOkay($k,$a)) {
24			if ($k < $N)  { $a[++$k] = 1;  goto loop; }
25			else ++$cnt;
26		}
27		do
28			if ($a[$k] < $N)  { $a[$k] += 1;  goto loop; }
29		while (--$k > 1);
30		$a[1] += 1;
31		if ($a[1] > $N2) return $cnt;
32		$k = 2 ;  $a[2] = 1;
33	goto loop;
34}

The Perl program. Perl is 5.34.0

 1use strict;
 2
 3
 4#  Check if k-th queen is attacked by any other prior queen.
 5#  Return nonzero if configuration is OK, zero otherwise.
 6#
 7sub configOkay {
 8	my ($k, $a) = ($_[0], $_[1]);
 9	my $z = $a->[$k];
10	my ($j,$l);
11
12	for ($j=1; $j<$k; ++$j) {
13		$l = $z - $a->[$j];
14		if ($l == 0  ||  abs($l) == $k - $j) { return 0; }
15	}
16	return 1;
17}
18
19
20
21sub solve {	# return number of positions
22	my ($N, $a) = ($_[0], $_[1]);
23	my $cnt = 0;
24	my $k = $a->[1] = 1;
25	my $N2 = $N;  # (N + 1) / 2;
26
27	loop:
28		if (configOkay($k,$a)) {
29			if ($k < $N)  { $a->[++$k] = 1;  goto loop; }
30			else { ++$cnt; }
31		}
32		do {
33			if ($a->[$k] < $N)  { $a->[$k] += 1;  goto loop; }
34		} while (--$k > 1);
35		$a->[1] += 1;
36		if ($a->[1] > $N2) { return $cnt; }
37		$k = 2 ;  $a->[2] = 1;
38	goto loop;
39}

Added 27-Nov-2022: A strong contender to replace LuaJIT might be Haoran Xu's LuaJIT Remake (LJR).