Counting Primes via Parallel Streams in Prolog

Logic programming has historically explored parallelism. There are implementations that support AND-parallism, OR-parallelism or both. Jekejeke Prolog provides a predicate balance/1 for simple explicit form of OR-parallelism. A generate and test problem:
 /* sequential */
 ?- Generate, Test.

will be distributed over multiple threads. The invocation is as simple as wrapping the conjunction by the predicate balance/1. The resulting query can be still used via backtracking, but the results might undergo some reordering:
 ?- use_module(library(runtime/distributed)).

 /* parallel */
 ?- balance((Generate, Test)).

As an example we will demonstrate how primes can be counted in parallel. We need first a predicate that tests a number whether it is prime. We use a very primitive non-optimized predicate, just for our demonstration:
:- use_module(library(advanced/arith)).

prime(N) :-
    M is floor(sqrt(N)),
    between(2,M,K),
    N mod K =:= 0, !, fail.
prime(_).

We can use it to sequentially enumerate primes:
?- between(1,100,X), prime(X), write(X), write(' '), fail; nl.
1 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Yes

Or to enumeratre primes in parallel with the help of balance/1:
?- balance((between(1,100,X), prime(X))), write(X), write(' '), fail; nl.
1 3 7 5 2 11 13 17 19 23 29 31 37 41 47 53 59 61 67 71 73 43 79 83 89 97 
Yes

The predicate balance/1 works best if the generate does not generate too many elements, and if the test is not too short lived. This means the balance/1 predicate is made for embarassingly parallel problems with rather low number of job dispatch and rather medium long running jobs.

A problem does not always automatically pose itself in this way. When we want to count the prime numbers below 1'000'000 we might do with this sequential code, and then straight forward deploy the predicate balance/1:
:- use_module(library(advanced/aggregate)).

/* sequential */
count(N) :-
   aggregate_all(count, (between(1, 1000000, X), prime(X)), N).

/* parallel */
count2(N) :-
   aggregate_all(count, balance((between(1, 1000000, X), prime(X))), N).

The resulting performance is seen in this screenshot. We used JDK 13.0 since it has better parallel performance. And we were using a machine with 4 physical cores, the speed-up was nevertheless only below factor 2:



A better performance can be archived if we let each job compute a bouquet of 1000 numbers. We can then use sum/1 instead count/0 to aggregate again. The corresponding Prolog code reads as follows:
/* sequential */
count3(N) :-
   aggregate_all(sum(M), (between(1, 1000, Y), slice(Y, M)), N).

/* parallel */
count4(N) :-
   aggregate_all(sum(M), balance((between(1, 1000, Y), slice(Y, M))), N).

slice(Y, M) :-
   aggregate_all(count, (H is Y*1000, L is H-999, between(L, H, X), prime(X)), M).

The results are now more encouraging. The speed-up now exceeds the factor 2 and is even above factor 3. Since we used a i7-6700HQ we guess more cannot be done, since sequential profits from turbo clocking, whichin in parallel:



Open Source: Module "distributed"
htps : // 기주 b. 이 m / j 부 r 세 / 지 케 지 케 - ゔ ぇ l / b ぉ b / 뭐 r / ㅇ k 루 / 렌세 / 룬치메 / st 리부테 d. p

User Manual: Module "distributed"
h tp // w w. 지케케. ch/이타 b/도 cぇt/p로d/엔/도 cs/05_룬/10_도쿠/02_레후에렌세/07_테에오리에 s/02_룬치메/04_ぢst 리부테 d. HTML

좋은 웹페이지 즐겨찾기