Article Summary

Coevolving High-Level Representations

Peter J. Angeline and Jordan B. Pollack

Appeared in: J. B. Pollack's Page (or go straight to the article) (April 27, 1993)

Summarized by: Iddo Lev

Note: This summary is submitted as a requirement in a seminar on Artificial Life and Genetic Algorithms. It is based on but does not entirely adhere to the article. The order and scope of representation have been changed for the sake of clarity, as I see it.

 

1. Introduction

The article addresses the issue of the level of complexity of the genotype's representation. In most genetic algorithms, the level of representation remains constant during the evolutionary process, and so does its expressiveness. The article suggests that the level of representation should be increased concomitantly with the evolution of the genotypes.

 

2. Some Aspects of Representation

There are some aspects of the genotype's representation that we should consider:

Evolutionary Programming

A sub-field of genetic algorithms is evolutionary programming, in which the evolved "creatures" are computer programs, written in some programming language. In these systems, the genotype is actually the code of the program. Usually, the genotype itse lf is of little interest, since an evolved code tends to be a collection of commands, ordered in some unusual way, and the program's behavior cannot be understood from reading its code. The program's behavior - the phenotype - can be evaluated by running the program, and the achieved performance determines the program's fitness level.

We can say that the representation of the genotype does not merely encode static genetic material, but rather, that it is executable. The idea suggested in the article is presented for this kind of representation, but it can be generalized to other ki nds of genetic algorithms, as we shall see.

The Genotype's Length and Structure

Usually all the genotypes in the population are of the same fixed length, which does not change in evolution. This can pose an upper limit on the number of different possible genotypes and on the complexity of the evolved behavior, because usually in such a case, complex interaction between different sections of the fixed-length genotype cannot be implemented, and the mapping from genotype to phenotype is straightforward.

An alternative is to allow the length of the genotype to change during evolution (e.g. by crossover operators that concatenate pieces of different lengths from the parents' genotypes). In the article they call this alternative "dynamic genetic algorit hm".

Evolving the Representation Language

In evolutionary programming, the programming language, in which the genotypes are written, usually does not change. The language contains a fixed set of primitives, and the genotypes can only be composed from these primitives. This static expressivene ss poses a limit on the level of complexity that can evolve, as will be explained in the following section. The alternative is dynamic expressiveness: to let the representation language evolve into a higher-level representation, thus increasing its expres siveness and allowing more complex programs to evolve. This idea can be generalized to other kinds of genetic algorithms.

 

3. Motivation for the Evolution of the Representation Language

The problem we are faced with in evolutionary programming is this: On the one hand, if we use a low-level programming language that contains only simple primitives, then the genotypes ought to be long if they are to exhibit a desirable behavior (i.e. solve the problem we are interested in), and thus are difficult to evolve in the environment. On the other hand, if we use a high-level language, with complex functions tailored to the environment, then indeed short genotypes will suffice and will be able to evolve. But this begs the goal of evolving the desired behavior, since sufficient representational complexity is provided a-priori in the language, and is not induced by the environment. Moreover, tailoring the language to the environment leaves the e volutionary process susceptible to over-design and lack of diversity.

The solution suggested in the article is to evolve the representation (programming) language concomitantly with the evolution of the programs. We start with a simple language containing a minimal set of primitives required for dealing with the given p roblem. Then in the course of evolution, we create new modules and add them to the language. The new modules are composed of the primitives and previous modules already in the language. In the course of the process, higher-level more expressive modules ar e formed, and thus programs that are short yet exhibit a desirable behavior can (potentially) evolve. Moreover, the new modules become suited for the problem environment as a result of the process instead of being pre-designed by us. The exact manner by w hich this goal is accomplished is explained in the following section.

 

4. The Genetic Library Builder (GLiB)

GLiB is an evolutionary programming system for Lisp, with the possibility of defining primitives required for a specific problem. The genotypes are the syntax-trees of expressions in the language.

The system uses the following genetic operators:

Crossover

Exchanges randomly selected subtrees between two genotype parents thus creating two new offsprings.

Compression

Compresses a random subtree of a genotype into a new module that is added to the language. When the depth of the selected subtree is smaller than a user-specified max-depth, the entire subtree is removed from the original genotype. The subtree is turn ed into a new module with a unique name, and a call to this module is inserted in place of the subtree in the genotype. Thus when this genotype is evaluated (executed), its behavior remains the same. If the depth of the subtree is greater than the max-dep th, only the upper part which falls inside the allowed depth is turned into a module, but this time the module is defined like a function with arguments - there are as many arguments as there were branches that exceeded the max-depth. The subtree in the g enotype is replaced with a call to the new module, binding the branches to the module's arguments. Notice that the code of a new module might include a call to a previously evolved module (if it appeared in the genotype).

Expansion

The problem with the compression operator is that it removes genetic material from the population that would otherwise be available to the evolution. After several generation, the remaining material is insufficient for advancement, and the process rea ches a local sub-optimal solution. To offset that, the complementary expansion operator is provided. It randomly selects a compressed module from a genotype, and expands it into the original subtree (but only in this specific genotype - calls to this modu les in other genotypes are not affected).

The Effect on Evolution

At any moment, the language is considered to be the set of modules that are called by genotypes in the population. If all the genotypes that call a certain module get extinct, then we say that that module is removed from the language. Compression and expansion do not affect the behavior of a genotype, they only extend the language. However, since these operators do change the body of the genotype, they change the effects that the mutation operator will have on the population. A discussion of the meani ng and significance of these operators appears in the last section of this summary.

 

5. Experiments with GLiB

This section briefly summarizes the two experiments outlined in the article.

The Tower of Hanoi Experiment

In this experiment the goal was to evolve a program that could solve the Tower of Hanoi problem (with 4 disks and 3 pegs).

The primitive language for the genotypes included the 6 possible moves of disks from peg to peg. The fitness function evaluated a genotype at the end of each turn by running it at most 32 primitive steps, where an illegal or impossible move was n ot executed but was still considered a step. genotype received points for each disk it succeeded to move to the last peg, and the points awarded for a disk were proportional to its size.

A consistent property of the evolved modules was that they were applicable to several intermediate problem configurations, and not only for one. Broad applicability gives an advantage to a module by guaranteeing its survival in the population but also improves the evolutionary process by increasing the expressiveness of the language and its suitability for the problem environment, and hence also the possibility of adequate programs evolving.

Experiments with Tic Tac Toe

In this experiment the goal was to evolve a program that could play Tic Tac Toe against a pre-programmed "expert" that never loses unless forked, and which is also designed to improve by learning what his adversary does when there isn't any one best m ove to choose from. This makes the evolved programs more robust.

The initial language included primitives for the positions on the board, predicates to test the status of a cell, logical operators, and a function for positioning a piece on the board. At the end of each generation, the fitness function runs each gen otype 4 times against the "expert" and gives it a score which is the average score over the 4 games (in each game a legal move gives 1 point, blocking an immediate threat gives 2, a draw gives 4 and a win 12). An evolved program never runs into an infinit e loop, but it might fail to make a legal move or any move at all.

The best evolved program achieved a score of 16.5 points. The code of the program had a maximum depth of 13 with 15 calls to modules. After expansion, it was revealed that in all, the program used 43 distinct modules in 89 calls. The virtual size of t he program was 477 nodes with a maximum depth of 39. It would have been much more difficult to evolve such large and complex programs directly, without the use of modulization.

When the authors attempted to analyze the functionality of the evolved modules they discovered that no one was responsible for checking whether there is an opportunity for an immediate win, or a possibility for an immediate loss which must be blocked. The usage and semantics of the evolved language was extremely non-standard, e.g. several modules encoded numerous side effects in the test position of a conditional. They also examined the graphs of the number of calls to a module versus the generation, and saw that some modules where quite rare in the population in the initial generations, and later on "caught up" and where used much more frequently, while other modules spread quickly at first, but later the number of calls they received decreased drama tically, possibly due to a preferred module appearing in the population.

 

6. Discussion and Conclusions

Randomly compressing subtrees from the genotypes does not guarantee useful additions to the language. It is desirable to have a criterion to assess the value of a module, and that the environment will encourage the survival of the useful modules only. Such a criterion is the number of calls to a module in a generation, because if the population uses a module frequently then it must have provided some consistent beneficial behavior in previous generations. Survival of appropriate modules according to t his criterion is done automatically in the environment: a useful module will tend to increase the fitness of the genotypes that call him, thus they will have better chances of survival and reproduction to the next generation, and the module itself will in turn survive better since the offsprings of these genotype will also call this module. In a similar manner, the number of calls to an non-useful or harmful module will tend to decrease in the population, until the module will (possibly) become extinct.

The authors claim that it is clear from the above and other experiments, that the system is able to capture modules that are beneficial to the construction of complex programs, and which reflect the unique constraint imposed by the environment. Howeve r, they do not show in this article how the system works in these examples without the compression and expansion operators.

The general idea behind the method is that the application of the compression and expansion operators give an iterative refinement to the language, such that later modules are of a higher-order complexity and functionality since they are composed of l ower building blocks, which were already found useful to the process. In particular, compression decreases the average size of the genotype, and this "forces" the evolution to concentrate on improving the evolved programs in a higher-level manner instead of destroying useful pre-discovered building blocks, which would have happened if the mutation operator could work on every part of the program (including its subroutines). The language and the genotypes coevolve until the language is expressive enough to succinctly encode the desired behavior.

The compression operator allows the evolution to restrict changes to unsettled or unprofitable sections and "protect" useful parts of the genotype, and propagate these sections to its offsprings. This idea could also be extended to other types of gene tic algorithms.

 

Back to the seminar's home page.

Written: July 6, 1998.