Genetic Algorithms


Whereas neural networks are based on the idea that you can build a computer to emulate the behavior of the human brain, genetic algorithms are based on the idea of human evolution. Humans have evolved over billions of years to be able to perform complex tasks, although the changes in the abilities of humans (and their evolutionary ancestors) from one generation to the next have always been relatively small. The idea of genetic algorithms (sometimes called genetic programming) is that a computer program can evolve from one generation to the next. Basically, you take an algorithm which solves a problem, but not especially well, and you evolve it gradually until it solves the problem very well.

Natural evolution is based on the idea of reproduction of the fittest (not necessarily survival of the fittest). Within a given population of humans or animals, the individuals most successful at dealing with whatever environment they find themselves in will tend to reproduce more than those who are less fit. That will result in these more "fit" genes being passed on to the next generation in greater numbers. Gradually, from one generation to the next, more and more fit individuals are produced. Natural evolution also involves mutations from time to time where a given gene will randomly mutate into something new-something different from that found in either parent-which will then have an opportunity to test its fitness in the environment.

Genetic algorithms are similar although they are a little different. Whereas natural evolution tends not to have a obvious goal in mind (we'll leave aside the question of whether it operates according to some divine plan), genetic algorithms are designed to solve a particular problem well. Thus, to set up a genetic algorithm, you need to provide a way of encoding all possible solutions to that problem. For example, if the problem is to make investment decisions for a client, then the set of solutions would consist of all possible investment decisions which they could make at a given point in time. Next, you need to provide an initial population of potential solutions to the problem. Remember, this initial population need not represent especially good solutions-it is just a starting point from which the algorithm can begin. For example, for investments, if there are 5,000 possible securities the client could invest in, the initial population might consist of 5,000 solutions-each of which represents to invest 100% of the client's assets in one security. Not one of these 5,000 solutions is an especially good solution-it is doubtful whether the client would want to invest all their money in any one particular security. But it is a starting point.

Then you need some kind of fitness measure-some way of evaluating how good a particular solution is. For the investment scenario, the fitness measure would probably combine information about the client's financial goals and risk tolerance with the knowledge of an expert as to possible scenarios as to the likely outcome of investment decisions. For example, for a client with low risk tolerance, investing all their money in a single, fairly safe, security is going to be better than investing everything in a dot-com. Diversification is probably going to be better still. One needs some way to automatically calculate this fitness measure.

Then, you need some way of combining pairs of solutions into a new solution. Remember that genetic algorithms are based on evolution. In the process of evolution, in each generation pairs of individuals pair off to produce children. A similar process happens in a genetic algorithm--this is the "reproduction" phase of the genetic algorithm. Pairs of solutions "pair off" to produce a new solution. There may be a degree of randomness in this process. For the financial example, a simple reproduction algorithm may be to simply average the portfolios in each of the two solutions. In practice, a more sophisticated algorithm may be required. Then, there needs to be some way of introducing mutations-random changes to the children solutions-into the algorithm.

These are the important ingredients to a genetic algorithm: the data structure for encoding solutions, the initial population, the fitness measure, the reproduction algorithm, and the mutations. Each of these will require significant design effort for a given application-it is impossible to say in a general sense how to go about this design process. It will require a detailed understanding of the specific problem to be solved as well as knowledge of how successful genetic algorithms were built for similar problems in the past. Once the basic ingredients are in place, the genetic algorithm will proceed roughly as follows:

An initial population of, say, 5,000 individuals (it need not be 5,000-that is just an example-although it should be a large number to ensure reasonable diversity in the "gene pool") is provided to the algorithm. These are paired off into 2,500 "couples" and each couple has 4 children, producing 10,000 potential members of the new generation. Mutation is then allowed to randomly change the genes of each of these 10,000 children. Of these 10,000 children, the best 5,000 are selected using the fitness measure, and these 5,000 then form the next generation. The process is then repeated for a large number of generations-it may require hundreds or thousands of generations. Finally, once a sufficiently "healthy" population is arrived at, the best individual in the final generation is declared the "winner" and is the solution to the original problem which the genetic algorithm recommends. What does it mean for a population to be "healthy" enough? That's another parameter which needs to be defined on a case-by-case basis.

The biggest advantage to genetic algorithms is that they allow you to solve virtually any type of "optimization" problem-even when you don't understand how to formulate a correct solution-in an automatic fashion. The challenge in using genetic algorithms is that they often involve a lot of computation time and may, for difficult problems, even require massively parallel architectures or supercomputers. Another challenge is that because of the random elements in certain parts of the process, it may make people nervous because they want certainty and guarantees of a particular outcome. This challenge can be answered by observing that all human and natural processes involve a degree of risk-a degree of randomness. By focusing on the best solution produced after thousands of generations, genetic algorithms can substantially reduce the level of risk involved.

More information about genetic algorithms may be found at: http://www.genetic-programming.com and http://www.genetic-programming.org. The book An Introduction to Genetic Algorithms by Melanie Mitchell is also highly recommended.

Next Edition: Data Mining


Home: Ramalila.NET



All copyrights are maintained by respective contributors and may not be reused without permission. Graphics and scripts may not be directly linked to. Site assets copyright © 2000 RamaLila.com and respective authors.
By using this site, you agree to relinquish all liabilities and claims financial or otherwise against RamaLila and its contributors. Visit this site at your own risk.