(1993)
Antbook
—————-
by R. Zuczek
1. Introduction
2. The cube
3. Point and ant variables
4. Input/output
5. Assigning data to points
6. Embedded spaces
7. Subroutines and other constructs
8. Asynchronous rules
9. Implementation
10. Creation theory
11. Epilogue
1. Introduction
——————
This is an attempt to revive an old and little-known "ant method" of programming highly parallel algorithms. I invented this method 20 years ago while working
on my dissertation on parallel computations at Warsaw University. Later I made some improvements. You can find references to my old papers at the end of this text.
Ant programming shows its power in describing parallel algorithms in which a very large number (say 65,536) of processes are active simultaneously. The description of such computations would be impossible if the problem being solved did not exhibit some symmetry of data and operations. The space called the n-cube, hypercube, or simply cube, as well as other spaces, determines this symmetry.
The ant method makes use of the power of traditional procedural languages, particularly of the C language. As a companion file to this tutorial you can find the kernel program kern.c
enabling you to run ant computations on your IBM PC under Microsoft DOS. You can also find programs and subroutines as examples of fundamental parallel algorithms. I tested the kernel with the Borland C compiler v.2.0 configured as described in the subpage named Readme. I believe the kernel would work with other C compilers and operating systems after some modifications.
Of course, the described implementation for the PC will not speed up computations, i.e. you certainly do not obtain a supercomputer. You may even get results in a longer time due to the extra operations inside the kernel. The kernel also has other inconveniences and limitations; however, it is intended primarily as a tool for experimenting with parallel algorithms. On this occasion you will see what should be done for massively parallel machines to be competitive in the world of traditional computers.
An informal picture of an ant computation can be given as follows. Ants, fictitious beings of point size, move about in a space called a cube. They move from one point
to another, expand, exit, and perform computations on data associated with points and on data that they carry. You can think of the programs governing the behaviour
of the ants as being carried by the ants themselves. Due to the symmetry of data and algorithms, all the ants carry the same ant program.
An important feature of the ant method is that ant programs describe synchronous computations. Every ant program is interpreted by means of a synchronous
computation, and every ant program must be designed to produce a correctly defined synchronous computation. The work of ants under the synchronous interpretation
resembles mechanical clockwork. There is no indeterminism in the motion of ants; all moves are strictly programmed. On the other hand, when an ant program
is performed on a computer and some asynchronous rule is applied, the computational process may look less coordinated. Nevertheless, it produces the same, correct results.
2. The n-cube
——————-
Generally, a space is a set of points together with some structure imposed on it. The n-cube is a set of N points, where N is the n-th power of 2. For instance the 3-cube has 8 points, the 4-cube has 16 points, etc. Let us mark the N points of the n-cube by the integers 0,1,...,N-1. A vector r can be added to a given point p to express a move to a neighbouring point. Vectors are represented by the numbers 0,1,...,N-1, and addition is the C-language operation ^. In other words, if points and vectors are written in binary form, we add bits modulo 2. For example 3= 0011(bin), 10= 1010(bin) and 3^10= 1001(bin)= 9.
The simplest, elementary vectors in the n-cube are the so-called generators, which are the integers {1,2,4,8,...}, i.e. powers of 2. When expressed in binary, generators have the form "all zeroes and a single one". They are called generators because they generate all the elements of the n-cube when added to the starting point 0 in all possible ways.
The n-cube can be drawn as a graph where vertices represent points and edges represent generators. For example, the 3-cube can be drawn as follows:
.                              generators:
.     6 --------- 7
.    /|          /|              ----  1
.   4 --------- 5 |              |     4
.   | |         | |              /     2
.   | 2 ------- | 3
.   |/          |/
.   0 --------- 1
The cube can be considered as a subspace of the all-containing, "infinite dimensional" oo-cube. Thus, a cube can be defined by a subset of the set of all possible generators {1,2,4,...}. We call such a subset a base of the given cube. Note that a base can be
expressed as a single integer. For instance, the number
56 == 111000 (bin)
determines the base {32,16,8}. We say that a cube is spread over the generators from a given base. A cube doesn't necessarily contain zero. For example, consider the cube containing zero and spread over 32,16,8. Let us add a shift vector d to its 8 points. The resulting set {p ^ d, for p= 0,1,...,7} is also called a cube.
We distinguish three operations by which ants can move in the n-cube: transfer, expansion and quit. Transfer, denoted by t(r), moves an ant from a given point p along the vector r to the point p^r. Expansion, denoted tx(r), splits an ant in a given point p so that two ants exist after the operation: an old one that stays in p and a new one that appears in p^r. The operation of quit, denoted o(), removes an ant from the space.
The function
w(r) = number of binary 1’s in r
is called the weight of r. This measure says how many elementary transfers t(g), g from {1,2,4,...}, are contained in t(r). It is commonly supposed that transfers will be the most time-consuming operations in any future implementation. Thus, it is of utmost importance to develop programs which are optimized with respect to the
sum of w(r) over all r used in a computation. The functions contained in bope.c mostly support writing programs using t(g), tx(g), where the g are generators.
Now we explain how moving operations are used in ant programs. The basic programming construct is a sequence of operations. Assume a single ant placed in point 0 of the 3-cube and consider the following sequence:
tx(1); /* executed by 1 ant */
tx(2); /* executed by 2 ants */
tx(4); /* executed by 4 ants */
There is a special rule of interpreting ant programs, which says that interpretation is by synchronous computation. As the comments suggest, tx(1) is executed by the initial ant. The two ants that exist after this operation execute the next command tx(2) synchronously, thus creating 4 ants. Finally these 4 ants execute tx(4) synchronously, and 8 ants are obtained in the 8 points of the 3-cube as a result.
Let us see what happens when the 8 ants continue with the following sequence:
t(1); /* transfer along gen 1 */
t(2); /* transfer along gen 2 */
t(4); /* transfer along gen 4 */
By executing t(1), the four ants placed in 0,2,4,6 transfer to the points 1,3,5,7; at the same time, synchronously and instantly, the four ants in 1,3,5,7 transfer to 0,2,4,6. This resembles mirroring relative to a vertical plane (see the picture of the 3-cube).
You can observe a similar effect during the execution of the next two transfers. Finally, the ant that was originally in 0 will be placed in 7, the ant originally in 1 will be in 6, etc.
The two main principles of ant programming are:
– a program is to be interpreted synchronously with respect to the operations t(), tx(), and o()
– a program is to be written so that no collisions between ants can occur during synchronous program execution
Presented below is a complete ant program consisting of the operations discussed. The main program assumes a single ant initially placed in point zero.
#include <kern.h>
void main()
{
tx(1); tx(2); tx(4); /* expansion */
t(1); t(2); t(4); /* permutation */
o(); /* global quit */
}
The program is finished by the operation o(), which causes the exit of all eight ants from the space. This is a legal ending of a program for our kern.c implementation (where the so-called global asynchronous rule is used), but in general you must insert a sequence into a program
which retracts all the ants into a single ant in point zero before the final quit.
3. Point and ant variables
—————————–
In order for ants to do some useful work, variables need to be introduced into programs. It is natural to distinguish two kinds of variables: those representing data in points and those
representing data in ants. We apply a simple convention in which point data are those defined as globals, whereas ant data are those defined as locals in programs. To better distinguish the two kinds of variables, we recommend that point data be written in small letters and ant data in capital letters.
Examples:
a= A; /* ant makes copy of ant data A to point variable a */
A= a+A; /* ant adds point and ant data and stores the result to A */
…
Every expression describes a local operation performed by one ant in the point where this ant is currently placed.
The program that follows is a modification of the previous one. The expansion sequence is rewritten as a for loop. The integer variable G that controls the loop represents a generator (the values of G are successively 1,2,4). The variable G necessarily had to be defined as an ant variable. If we were using a point variable, say g, for this purpose, each new ant produced by the operation tx(g) would find the value of g undefined in the "expanded to" point p^g.
#include <kern.h>
int a,g; /* point variables */
void main()
{
int A,G; /* ant variables */
for (G= 1; G <= 4; tx(G), G<<= 1); /* expansion */
a= 1; /* a= 1 in all points */
g= 4;
while (g) { /* for g=4,2,1 */
A= a; /* pick up the value a */
t(g); /* transfer alongside g */
a= a+ A; /* add to the point value */
g= g >> 1; /* divide g by 2 */
}
o(); /* a== 8 in all points */
}
Let us see in more detail how a program (such as the one above) is to be interpreted. Let us fix an arbitrary state of the interpretative computation. There are some arithmetic computations at the beginning of that state, which are performed in all points where ants are placed. (This is true at the very beginning, where there is a single initial ant.) These arithmetic computations can run completely asynchronously up to the moment when a moving statement t(), tx(), or o() is encountered in the program by some ant in some point. Every ant encountering such a statement must temporarily stop its activity on this statement and wait until all ants encounter a similar moving operation in the program. Most likely it will
be the same moving statement in the same place in the program.
When all the ants eventually reach the place before the moving statement, the moment comes when they all perform the motion synchronously and instantly. After this, everything can run asynchronously again, until a new moving statement is encountered, etc.
4. Input/output
——————-
Using input/output operations such as printf(), scanf(), fgets(), etc. is straightforward: an ant in a point simply performs the printing, reading, or writing. Of course, we normally cannot associate
printers and screens with every point of the space. Usually we will also be restricted in using too many files. Because of these restrictions, normally only the initial ant opens files, windows, etc., and subsequent ants use them in some "unpredictable" order.
The following sample program is almost self-explanatory. An initial ant expands to the 2-cube, where 4 ants open their own files and write something to them. The program illustrates the use of a special system variable
currentpoint
provided by the kernel. It is a point variable and its value is p in point p. It should be treated as read-only in your programs.
#include <stdio.h>
#include <stdlib.h>
#include <kern.h>
char fname[8]; /* buffer for the file name */
FILE *f;
void main()
{
tx(1); tx(2); /* expansion to 4 points */
itoa(currentpoint,fname,2); /* name of file derived from point*/
f= fopen(fname,"w"); /* open file */
fputs("hello world\n",f); /* write something to it */
fputs("from point ",f);
fputs(fname,f);
o();
}
5. Assigning data to points
——————————–
Parallel algorithms work with arrays of data. Every element of an array must be placed in some point of the cube. The assignment of data to points is a delicate problem because it affects the complexity and speed of programs. Some "theory" of this problem can be found in the section about program design. But sometimes theory may prove to be impractical or needlessly complicated. In particular, if no data transfer is needed, as in the following example, the theory "fails", because any assignment is good.
We present an example program that displays something on the screen by means of multiple ants. The initial ant creates a window consisting of 64 = 8 x 8 squares. Every square on the screen has its counterpart in a character variable a that is placed in a point of the 6-cube. We have chosen the following correspondence between the (i,j)-th element of the matrix of squares and the points of the space:
i= currentpoint & 7; /* 7 == 000111 (bin) */
j= (currentpoint & 56) >> 3; /* 56 == 111000 (bin) */
The program is as follows:
#include <kern.h>
#include <bope.h>
#include <conio.h>
int i,j; /* coordinates */
char a; /* character to display */
void main()
{
clrscr();
window(40,4,55,11);
clrscr(); /* initial ant opens window */
xpand(7); /* expansion along 1-st row */
xpand(56); /* expansion along 1-st column */
a= currentpoint+48; /* assign character */
i= currentpoint & 7; /* row index i */
j= (currentpoint & 56) >> 3; /* column index */
gotoxy(i*2+1,j+1); /* place into window */
putch(a); /* display it */
o(); /*quit */
}
The expansion is performed by means of the function xpand() defined in the kernel extension bope.c. The argument of this function is an integer representing a subbase,
i.e. a subset of generators (not a vector). The effect of the operation xpand(subbase) is the same as that of the sequence of operations tx(g) for g taken from the subbase.
Suppose now we want to move array elements in space. Consider the following one-dimensional array of numbers
a0,a1,…,a7
assigned to the points of the 3-cube, the i-th element assigned to i. If we want to shift this array cyclically one place to the right (the i-th element to the point (i+1) modulo 8), we can do it by applying the function traa(int subbase,int k) defined in bope.c. The parameter subbase defines the subcube where the data are stored. The second parameter defines the shift value (not a vector, simply a number). The function can be used as follows:
/* 8 ants in 3-cube */
A= a; /* pick up element of array */
traa(7,1); /* transfer arithmetically by 1, modulo 8 */
a= A; /* replace old a */
The function traa(subbase,k) is implemented by means of t(r), where r is a vector which depends on the point where the ant is currently placed. If the subbase has the form 00011..111(bin), the vector r is expressed as
r= currentpoint ^ ((currentpoint + k) & subbase)
On the other hand, if the array elements a0,a1,…a7 were assigned to points this way:
a0 a1 a2 a3 a4 a5 a6 a7
0 1 3 2 6 7 5 4
the operation gtrans(7) could be used instead of traa(7,1). The function gtrans(int subbase) is implemented by means of t(g), where g is a generator. Thus, the cost of gtrans() is always 1. In fact, t(g) == gtrans(g) for a generator g, i.e. for a subbase consisting of a single generator. The algorithm used in gtrans() follows from the following picture:
.. 4 ————-5
.. | /
.. 7 –|———– 6
.. | | —> cube
.. | 3 ————-2
.. | /
.. 0 ————– 1
The presented mapping is known as the Gray code. The Gray code can be defined in the form of an initialized array, or as a function or macro:
int gray[8] = {0,1,3,2,6,7,5,4};
#define gray(i) (i ^ (i >> 1))
The main property of Gray coding is that the neighboring elements of the sequence in gray[] differ in one bit position. The function gray(), its inverse antigray() and the transfer function gtrans(), are defined in bope.c.
6. Embedded spaces
——————-
The previous discussion leads to the abstraction of general spaces and of embedding them into a cube. The two pictures below present two ways of embedding the simplest spaces, cycles.
0 —>– 1 —>– 2 —>– 3
| |
| | traa(7,1)
| |
7 –<— 6 –<— 5 –<— 4
0 —>– 1 —>– 3 —>– 2
| |
| | gtrans(7)
| |
4 –<— 5 –<— 7 –<— 6
The numbers represent points of the 3-cube. In both cases the operation xpand(7) can be used to expand a single ant into the cycle. In order to move in the cycles we use either traa() or gtrans() functions, respectively.
By the way, the 1-cube is a cycle itself. We draw
the corresponding graph simply
0 —— 1
and we can use any of the operations t(1), traa(1,1) and gtrans(1)
to move in this space.
Direct product is the operation producing larger,
higher-dimensional spaces from given smaller spaces.
The following graph represents the direct product
of two 4-cycles embedded into the 4-cube:
| | | |
— 00 —- 01 —- 02 —- 03 –>
| | | |
— 10 —- 11 —- 12 —- 13 –> traa(3,1)
| | | | traa(12,1)
— 20 —- 21 —- 22 —- 23 –>
| | | |
— 30 —- 31 —- 32 —- 33 –>
| | | |
v v v v
The symbols "ij" represent the points (4*i + j) of the 4-cube. The four "row" and four "column" cycles can be used to transfer in this space. The corresponding operations are traa(3,1) and traa(12,1), respectively. Two consecutive expansions xpand(3) and xpand(12) along two "perpendicular" cycles perform the expansion to all 16 points. The same can be done by applying xpand(15).
The picture below shows the Gray embedding of the 4×4 space:
| | | |
— 00 —- 01 —- 03 —- 02 –>
| | | |
— 10 —- 11 —- 13 —- 12 –> gtrans(3)
| | | | gtrans(12)
— 30 —- 31 —- 33 —- 32 –>
| | | |
— 20 —- 21 —- 23 —- 22 –>
| | | |
v v v v
7. Subroutines and other constructs
—————————————
This section was meant to contain notes about using "special" C-language constructs. In this respect, the section is not complete. In principle, any C constructs or system calls can be used. If any "side effect" is encountered, it should be taken as "a rule of programming". For example, if you transfer a pointer
char *a= "xxxx";
to another point by means of
A= a; t(g); a= A;
you obtain a pointer to the local string in the new point, not to the string in the previous point.
Special attention should be paid when using moving operations. For example, an "if" statement with t() should always contain an "else" part with t() or o():
if (something) t(r);
else t(0); /* or o(), … */
Otherwise the program would work differently, i.e. incorrectly. The general rule is to write "single stream" programs with respect to transfer operations. In other words, a program should be written so that at any time of the interpretative computation it executes transfer operation(s) located "approximately" in the same place of the program.
It is known that subroutines represent the main abstraction method in the process of programming. We can write (parallel) algorithms in the form of collections of functions, place them in libraries, and use them as parts of complex programs. When writing functions we must bear in mind that the parameters as well as the local variables of a function are pushed onto the stack. Because these stack variables are treated as ant variables, they are all transferred across space by the operations t() and tx(), if t() and/or tx() are actually used inside the function. If we do not want or need some of the local variables to be transferred, we define them as static.
The following subroutine for matrix transposition is typical in this respect. The matrix is supposed to be defined in main program by
float a;
The constants
#define ROWBASE 7
#define COLBASE 56
determine where the matrix is placed in the cube.
The function is invoked by
transpose(&a,ROWBASE,COLBASE);
It doesn't make any difference whether the arithmetic or the Gray embedding of the matrix into the cube is used. The idea of transposition is to transfer the value of a from the point where currentpoint has the form "ij" to the point where currentpoint has the form "ji". The version of the
algorithm presented in the program below is rather complex because it performs the transposition as a sequence of elementary transpositions. If we write "ij" in binary as "i1 i2 i3 j1 j2 j3", then we can perform 3 elementary transpositions "i1 j1" -> "j1 i1", "i2 j2" -> "j2 i2", "i3 j3" -> "j3 i3" to transfer data from "ij" to "ji". Furthermore, the transposition of a "binary" matrix
0 —- 2
| |
| |
1 —- 3
can be performed in two steps:
0 —- 2 1 —- 3 0 —- 1
| | t(1) | | gtrans(3) | |
| | | | | |
1 —- 3 0 —- 2 2 —- 3
Such a partitioning causes the whole transposition to be realized by means of the transfers t(g), where the g are generators. You can count the total number of operations t(g) executed and compare it with the sum of w(r) over all t(r) that would be used in the basic "ij" -> "ji" version of the program, with r being arbitrary vectors. This second version would be more efficient (if we do not consider possible losses during routing ... ).
#include <stdio.h>
#include <kern.h>
#include <bope.h>
#define ROWBASE 56
#define COLBASE 7
void transpose(float*,int,int);
float a;
void main()
{
xpand(ROWBASE^COLBASE);
a= currentpoint;
transpose(&a,ROWBASE,COLBASE);
printf(„point %d data %f\n”,currentpoint,a);
o();
}
void transpose(float *a, int rowbase, int colbase)
{
static int r,c;
float A;
r= fgoss(rowbase); /* get first generator of subbase */
c= fgoss(colbase); /* get first generator of subbase */
A= *a; /* pick up matrix element */
while (r) {
t(r); /* do elementary */
gtrans(r+c); /* transposition */
r= ngoss(r,rowbase); /* get next generator */
c= ngoss(c,colbase); /* get next generator */
}
*a= A; /* replace matrix element */
}
8. Asynchronous rules
————————-
Although ant programs are interpreted by synchronous
computation, they can be run asynchronously.
The only requirement is that the results of both the
interpretative and the real computation be the same. In order to understand
the relationship between these two ways of obtaining the same result,
we need the following definitions.
An action is defined as a sequence of operations of an ant,
ended with a moving operation t(), tx() or o(). Each action
is supposed to take one time tick in synchronous computation.
Thus, a synchronous computation can be redefined as a sequence of steps
where each step consists of a set of actions which are performed
simultaneously by all ants. Each step takes one tick of time.
A computational diagram is a graph in which each node
represents an action. Two actions are connected by a dashed edge
if they follow one another immediately and if they take place
in the same point of space. Two actions are connected by a full edge
if they follow one another immediately and if they are the actions
of the same ant.
To illustrate these notions, let us consider a sample program
that computes the coefficients of the polynomial (1+x) to the 5-th power
in the 8-cycle.
#include <stdio.h>
#include <kern.h>
#include <bope.h>
int a,i; /* point variables */
void main()
{
int A; /* ant variable */
xpand(7); /* expand to all points */
a= 0; /* set zero in all points */
if (currentpoint== 0) a= 1; /* except the initial point */
for (i= 0; i< 5; i++) { /* do it 5 times */
A= a; /* pick up number */
gtrans(7); /* shift in the 8-cycle */
a= a+A; /* add numbers */
}
printf("coef. at %d has value %d\n", /* print result */
antigray(currentpoint),a);
o(); /* quit */
}
The main part of the computation begins after the initial
expansion to all points. A great deal of the actions of the main part have
the following form:
a= a+ A; /* not present in initial step */
A= a; /* not present in last step */
gtrans(7); /* not present in last step */
As we see, the initial and final actions look a little different
from those in between.
The computational diagram graph is as follows:
\ \ \ \ \
point= 0 o—o—o—o—o—o
\ \ \ \ \
1 o—o—o—o—o—o
\ \ \ \ \
2 o—o—o—o—o—o
\ \ \ \ \
3 o—o—o—o—o—o
\ \ \ \ \
4 o—o—o—o—o—o
\ \ \ \ \
5 o—o—o—o—o—o
\ \ \ \ \
6 o—o—o—o—o—o
\ \ \ \ \
7 o—o—o—o—o—o
\ \ \ \ \
time= 0 1 2 3 4 5
The symbols 'o’ represent actions. The individual columns
represent steps. The actions linked by dashed edges are performed
in the same point. The back-slashes \ , i.e. full edges, represent
transfers of ants from point to point.
A computational diagram can be viewed as a data flow diagram in which
point data flow along the dashed edges —, and ant data
flow along the full edges \ .
If we place initial numbers 1,0,0,… into the nodes of the
first column of the above graph and then subsequently
perform additions in columns that follow, we obtain 1,1,0,0,…
in the second column, 1,2,1,0,0,.. in the third column, etc.,
and finally 1,5,10,10,5,1 in the last column.
This is exactly how the program is interpreted. The results are the numbers
known from Pascal's triangle.
It is clear that individual operations
in nodes of the graph need not be executed so rigidly
"column after column", and some degree of independence can be
allowed in this process. In general, however, a node cannot fire
an action before data have been prepared on its input. However, as soon as
the data are prepared, the action can be fired. This observation gives
the definition of the so-called local asynchronous rule:
An action in a given node of a computational diagram is allowed to start
if the operations immediately preceding it have finished. Note that the operations
in the leftmost column of the graph must be started externally.
More restrictive is the following global asynchronous rule.
An operation in a given node of a computational diagram can be started
if all the operations from the previous column of the graph have already
finished. In the global rule, an operation that is one
step ahead of the other operations must wait until these operations
reach this step. In fact, this way of asynchronous computation is
very close to what we call synchronous computation.
Both rules, to be applicable in the original environment with ants
and spaces, must now be transformed from the computational-diagram form
to the ants-in-space form. The idea is to introduce counters
to count the number of actions performed by ants and points.
Actually, these counters will count the lengths of the paths that the ants
and points trace across the graph.
Let us associate a counter C with every ant.
At the beginning we assume C== 0 in the initial ant. In the course
of the computation, every execution of t() and tx() will automatically
increase the value of C by 1. In particular, both the old and the new
ant receive the same increased value of C after executing tx().
The global asynchronous rule can now be redefined as follows.
An ant can perform its operations if and only if its counter C
is minimal among the counters C of all ants currently active
in space. It is clear that the difference between
the values of any two counters C will never exceed 1.
An ant whose value of C is one more than that of the other ants
must wait until the rest of the ants reach the same value of C.
Now we try to redefine the local asynchronous rule.
We introduce additional counter c, this time for every point of space.
Both counters c and C will serve for counting
the operations t(), tx(). At the beginning we assume c== 0 in
the initial point and C== 0 in the initial ant. During the computation,
every execution of t() or tx() will automatically increase the
values of both c and C by 1. Moreover, an ant being expanded into a new point
causes the counter c in this new point to be set to the value of C carried
by the ant, c= C.
We define the local asynchronous rule as follows.
An ant can execute its operations if and only if the value of
its counter C is the same as the value of the counter c associated
with the point where the ant is placed, C== c.
By thorough analysis we can see that this definition
is not exactly equivalent to the local rule defined for computational diagrams.
For example, we cannot use this "C== c"
rule to synchronize a computation in which
a single ant is travelling in space.
You can, however, use the local rule
for ant programs, the computational diagram of which has
the form
expansion
permutation
contraction
Hence, it is good practice to write programs leading
to such structured computations, in order to have the guarantee
that the local asynchronous rule produces correct results.
Note that subroutines are usually programmed to perform
the permutational part of the above scheme.
9. Implementation
———————–
We start with the following model of a parallel machine.
Let us have p single-processor computers, where p is
the n-th power of 2, and let the computers be numbered 0,1,...,p-1.
Let us connect the computers by communication lines, for example
in the same way as points 0,1,…p-1 are connected in the graph
of the n-cube (see the graph of the 3-cube in the section 2).
The communication lines will serve to transfer „ant packets”.
There is a driver in each computer which is responsible for
sending and receiving ant packets via communication lines.
One layer higher is a kernel whose task is to schedule ant processes.
We suppose the same copy of the kernel, the driver and the ant program loaded
in the same area of memory in each computer (...). All drivers
are initially waiting for incoming packets.
The main part of a packet is the data taken from the ant variables of an ant
program, hence from the program stack. Also contained in the ant packet
are the contents of the instruction pointer register IPR and
the point P= currentpoint ^ r (and/or R= r, if that is more appropriate
for routing), identifying the destination of the ant packet.
The following auxiliary fields in the packet serve to implement the local
asynchronous rule. The field C holds the number of operations t()
and tx() performed. The field X indicates whether the packet
is the result of an expansion tx() or of a transfer t():
X== 1 denotes the first case, X== 0 the second.
Correspondingly, the counter c is defined as part of the
kernel, to store the number of operations t() and tx()
performed by the kernel.
Let us manually start a given ant program at computer number zero,
which can be regarded as the arrival of the initial ant.
As long as arithmetic operations of the program are performed,
everything runs locally. When an operation tx() or t() is encountered,
the kernel takes control and prepares an ant packet: it
increases C (and simultaneously c), sets X to 1 in the case of tx(),
sets the value of the destination point P, makes a copy of the current IPR and stack,
and then passes the ant packet <P, C, X, IPR, stack> to the line-driver
queue. The line driver in turn sends the packet on an appropriate line.
Let us see how the incoming packets are processed.
The driver, after receiving a packet (the packet must contain
the proper destination point P== currentpoint), puts it into the queue.
While the processor is free, the kernel examines the queue and decides
which packet is to be given processor time. The choice follows
from the local asynchronous rule. If there is no packet in the queue
such that C== c or X== 1, the kernel must wait. If there is such
a packet, the kernel loads ant data from it to the program stack
and IPR of the processor. This will cause the ant program to take control.
Note that the condition X== 1 means the arrival of an expanded ant, which
is sufficient to make this ant active immediately (the condition
C== c is not tested).
Now suppose we have fewer processors than the number of points
in space. First consider the extreme case of one processor.
In this case the kernel must be allocating memory
for all the ants being encountered during computation and for
all the point data. Because there are no communication lines now,
every ant packet being created as the result of t() or tx()
will be simply put back to the local first-in-first-out queue.
Observe the order of the ants in the queue. The ant which is the first to
leave the queue has a value of the counter C less than or equal to
the values of C of the rest of the ants in the queue. So we can use the global
asynchronous rule to synchronize computations. The counter C can even
be left out of the ant packet, because everything works automatically,
as can the field X, which is not part of the global asynchronous
strategy.
And this is exactly how the kern.c on your PC works.
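A minimal sketch of this single-processor scheme, assuming a fixed-size
circular buffer (the struct and function names are hypothetical, not
those of kern.c, and the packet is reduced to its destination point for
brevity):

```c
#define QMAX 1024

/* One global first-in-first-out queue replaces all the
   communication lines: every packet produced by t() or tx()
   is appended at the tail, and the next ant to run is always
   taken from the head. */
struct fifo {
    int buf[QMAX];
    int head, tail;
};

void put(struct fifo *q, int point) { q->buf[q->tail++ % QMAX] = point; }
int  get(struct fifo *q)            { return q->buf[q->head++ % QMAX]; }
int  empty(const struct fifo *q)    { return q->head == q->tail; }
```

Because packets leave the queue in creation order, the ant taken from
the head always has the minimal counter C, which is why neither C nor X
needs to be stored.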
Now consider the intermediate case. If we have m points and want
to use k processors, k < m, we have to assign some portion of the points
to the first processor, another portion to the second processor, and so
on. Some interconnection network is assumed, which connects the
processors appropriately. Applying reasonable symmetry, both in
assigning points to processors and in interconnecting the processors, we
can obtain simple rules for sending packets and managing them inside the
processors. Note that in this general case the local asynchronous rule
must be used.
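For example, with m = 2^n points and k = 2^j processors one symmetric
assignment lets the high j bits of the point number select the
processor. This is only one possible rule, sketched here as an
assumption, not the only reasonable choice:

```c
/* Assign the points of an n-bit cube to k = 2^j processors:
   the high j bits of the point number select the processor,
   so each processor owns a contiguous subcube of 2^(n-j) points. */
int processor_of(int point, int n, int j)
{
    return point >> (n - j);
}
```

With this rule a transfer along a generator touching only the low n-j
bits stays inside one processor; only the high-bit generators require a
packet to travel between processors.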
10. Creation theory
————————
In Section 9 a computational diagram was defined as a representation of
the computational process associated with a given program. In this
section we consider the reverse problem. Suppose we have a parallel
algorithm in the form of an acyclic directed graph, where nodes
represent operations and edges represent data flow paths. Our aim is to
transform this algorithmic graph into a computational diagram and find a
corresponding ant program. The following discussion is rather
theoretical, but nevertheless very useful, because it reveals the
possibilities and limitations of the ant method.
We start from an analysis of computational diagrams. We observe that
computational diagrams have the property of being binary, i.e. at most
2 edges enter any node and at most 2 edges leave any node (see the
computational diagram in Section 9). If a given graph of an algorithm
does not have this property, we use transformations of the type

   \             \
    - o    ->     - o - o
   /             /

or

   /             /
   o -    ->   o - o -
   \             \

to get the required property. As we normally work with problems of size
"n-th power of 2", we will usually be dealing with transformations of
regular trees, e.g. of a "depth-1 degree-4 tree" into a
"depth-2 degree-2 regular binary tree", etc.
Another important property of computational diagrams is that
any two different paths starting from the initial node
and ending in some other node have the same length.
This property follows from the fact that the ant computation is
synchronous. The procedure of stuffing redundant nodes into paths of the
graph of the algorithm helps to obtain the required property.
This is a rather theoretical construction, which should not be
needed in the case of highly symmetrical diagrams.
The next step is partitioning the edges of the graph of the algorithm
into two classes: dashed and full. We will need the following
definition. A directionally alternating path in a binary acyclic
graph is a sequence of adjacent edges in which the directions
of the edges alternate. Let us find all such paths of maximal
length in a given graph of an algorithm. Note that some of them will be
cycles. It follows from the property of being binary that the set of
all maximal directionally alternating paths is determined uniquely and
that the elements of this set are mutually edge-disjoint.
Let us redraw every second edge in each directionally alternating path
with a dashed line. There are two ways to do it, and we may
choose one of them arbitrarily. Finally, if there is only one edge
entering some node and it is drawn with a dashed line, we
draw an additional full edge parallel to it. This construction assures
that every node of the graph is connected to some initial (leftmost)
node by a full-line path. We observe this property in computational
diagrams. Its meaning is that every operation in the graph is reachable
and executable by some ant.
By partitioning the nodes into columns (similarly to the
computational diagrams in Section 9) we obtain synchronous time. The
procedure for determining space is as follows. First, every dashed-line
edge of the graph should be contracted, so that the corresponding two
nodes are glued together. We obtain a homomorphic image of the
computational diagram, in which the nodes represent points and the
full-line edges that remain after contracting
represent trajectories of ants.
This graph can be treated as a certain
approximation of the space. Instead of studying which
algebra would fit this graph best, we go immediately
to the procedure of embedding the graph into a cube. Thus, the nodes
of the graph are to be mapped to the points of a cube in some way.
As a consequence, the edges of the graph will automatically match some
vectors. From the variety of all possible mappings we must choose the
best one. This resembles the very old problem of coding the states
of finite automata, where states are to be assigned n-tuples
of 0's and 1's, ideally in such a manner that any transition
from state to state changes just one bit. In case of success,
a Gray embedding is obtained, i.e. the vectors become generators.
In any case, some algorithm for minimizing switching functions
has to be used to obtain good results.
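The classical reflected Gray code provides exactly such a coding:
consecutive code words differ in a single bit, so every transition
corresponds to one generator of the cube. The standard formula is:

```c
/* Reflected binary Gray code: gray(i) and gray(i+1)
   differ in exactly one bit position. */
unsigned gray(unsigned i)
{
    return i ^ (i >> 1);
}
```

This gives a direct embedding of a path (or a cycle of length 2^n) into
the n-cube; for less regular graphs the optimization problem remains.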
The following example explains the rest of the procedure.
Consider polynomial evaluation by means
of the Horner algorithm. For the polynomial
p(x)== a3*x*x*x + a2*x*x + a1*x + a0 we can write
p(x)== (a3*x + a2)*x*x + (a1*x + a0), and in graphical representation
   a0,x o
         \
          o
         / \
   a1,x o   \
             o
            /
   a2,x o  /
         \ /
          o
         /
   a3,x o
Every node, except the leftmost four, represents the two operations
u + v*x and x*x . The result is put on the leaving
edge, as the following picture shows.
   \ a,x
    \
     \   a + A*x , x*x
      o -----------------
     /
    /
   / A,X
The graph of the algorithm is already binary and "synchronous",
so we can go on to find the directionally alternating paths.
There are 3 such paths, and we redraw the original graph using
the two kinds of edges (in the picture below, the horizontal edges play
the role of the dashed ones).
   o ----- o -------- o
          /          /
         /          /
        /          /
       o          /
                 /
       o ----- o
              /
             /
            /
           o

   time=   0      1       2
Now we contract the horizontal edges and obtain the graph of ant
trajectories, or the "state transition graph" in the terminology of
finite automata. We assign the binary numbers 00, 01, 10 and 11 to
the nodes and obtain the following graph:
   11        10            01        00
    o ------> o             o ------> o
   time= 0     \           time= 0   /
   g= 01        \          g= 01    /
                 \                 /
                  ------->--------
                      time= 1
                      g= 10
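Recall that a transfer along a generator g in the cube flips the
corresponding coordinate bits, i.e. it moves an ant from point p to
p ^ g; the two steps of the diagram above are just two such XOR moves:

```c
/* Moving along generator g in a cube adds g modulo 2 in every
   coordinate, i.e. it is a bitwise XOR on point numbers. */
int transfer(int point, int g)
{
    return point ^ g;
}
```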
Using well-known methods of the theory of finite automata (or switching
circuits) you can find Boolean expressions for
moving the ants in the 2-cube. An initial version of a program for
the evaluation of polynomials will be a simple sequence of transfers,
quits and arithmetical operations. After some refinement you obtain the
following program:
/* 4 ants in 4 points */
A= a;                         /* copy polynomial coefficient to ant */
X= x;                         /* copy argument to ant */
for (g= 1; g < 4; g= 2*g) {   /* make generator a control variable */
    if (currentpoint & g) t(g);  /* condition when to transfer */
    else o();                    /* and when to go out */
    A= a= a + A*x;            /* main operation */
    X= x= x*x;                /* new power of x */
}
/* single ant, A (== a) contains result */
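The behaviour of the four-ant program can be checked with a purely
sequential emulation on any ordinary C compiler. Here arrays indexed by
point number stand in for the point variables a, x and for the
variables of the ant currently sitting at that point; this is only an
illustration of the data flow, not how the kernel works:

```c
#define N 4   /* points of the 2-cube */

/* Sequential emulation of the four-ant Horner program:
   a[] and xx[] play the point variables a and x,
   A[] and X[] the ant variables, indexed by the point
   an ant currently occupies; alive[] marks active ants. */
double horner_ants(const double coef[N], double x)
{
    double a[N], xx[N], A[N], X[N];
    int alive[N], p, g;

    for (p = 0; p < N; p++) {            /* load points, start ants */
        a[p] = coef[p];  xx[p] = x;
        A[p] = a[p];     X[p] = xx[p];
        alive[p] = 1;
    }
    for (g = 1; g < N; g *= 2) {         /* generators 01 and 10 */
        for (p = N - 1; p >= 0; p--) {
            if (alive[p] && (p & g)) {
                int q = p ^ g;           /* t(g): jump to q, whose  */
                alive[p] = 0;            /* own ant has done o()    */
                A[q] = A[p];  X[q] = X[p];
                A[q] = a[q] = a[q] + A[q] * xx[q];  /* a + A*x */
                X[q] = xx[q] = xx[q] * xx[q];       /* x*x     */
            }
        }
    }
    return A[0];   /* the single surviving ant sits in point 0 */
}
```

For coefficients {1, 2, 3, 4} and x = 2 the emulation yields
1 + 2*2 + 3*4 + 4*8 = 49, as the Horner scheme should.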
Thus, we have seen that the theory of program synthesis,
assigning data to points, etc. is fairly simple. At the same time,
this theory is very hard to apply due to the large combinatorial
complexity.
11. Epilogue
————–
There are a lot of interesting things about ant programs. For example,
as you may have noticed, there is a duality of ants and points.
The duality is particularly visible when drawing
every second edge in the directionally alternating paths
with a dashed or a full line. There are two choices there, and this is
the place where we decide who will be carrying which data,
ants or points. You can obtain a dual program from a given one
by exchanging local capital-letter variables for global small-letter
variables and vice versa (this is just a general rule; do it carefully).
You will probably find that the method is sometimes
difficult to apply. Most often this will be due to the enormous
complexity of the parallel algorithm you want to program.
However, real difficulties with the ant method itself may also occur.
I believe that such difficulties can be overcome by implementing
so-called parallel multiprocessing (or multiprocessing parallelism).
Normally an ant program governs one "cloud" of ants, a single parallel
process. Ant multiprocessing means that more than one such process
coexists in a cube. You will simply have two or more ant programs active
simultaneously (the programs will usually be inside one source file).
Particularly interesting is the case when ants from one program
can communicate with ants from another program, e.g. by means
of messages left in points. Thus, there will be a statement "write_msg"
somewhere in the first program and a statement "read_msg"
somewhere in the second program. The first program can be
considered a producer, the second one a consumer.
Currently I have no implementation of the above idea.
I must also say that I spent years on unproductive
meditations and trials at using the ant and cube ideas to explain
the strange behaviour of nature in the micro- and macrocosmos. I still
believe that there is something in it. A stochastic approach is
inevitable. It is particularly exciting to imagine a big glass bottle
filled with particles of two kinds (points and ants, or vice versa),
which collide accidentally but selectively. You don't need any a priori
space and time; everything is created automatically by the selective
rules of collisions (recall the "creation theory").
Finally, references to my old papers:
- A New Approach to Parallel Computing, Acta Informatica, Vol. 7,
1-13, 1976 /* ant method, programming examples, synchronous
implementation */
- The Universal Space for Parallel Computations, Information
Processing Letters, Vol. 6, No. 2, 42-45, 1977
/* using Gray coding for embedding spaces (originally "immersion") */
- Implementace takzvanych mravencovych programu (Implementation of
the So-Called Ant Programs), SOFSEM'82
Proc., 413-417, 1982 /* asynchronous implementation */
- Souvislost mezi dvema zpusoby popisu paralelnich
algoritmu (The Relation Between Two Ways of Describing Parallel
Algorithms), MOP'86 Proc., 145-153, 1986
/* creating programs from computational diagrams */
Orlova, July 4, 1993