How AI can Learn from Genetics

Coding a solution for the 8-queens puzzle with Genetic Search Algorithms, AI algorithms that mimic how our genes are passed and expressed.

adam dhalla
Artificial Intelligence in Plain English
12 min readFeb 3, 2021

--

James Watson and Francis Crick first published their groundbreaking study on the double-helix form of deoxyribonucleic acid (DNA) and how it expresses our genes, which up until then, were mere abstractions lacking a concrete, biological, definition.

Genetics as a Problem-solving Technique

Since then, we have discovered how exactly amino acids, and in extension, proteins, form from different combinations of nucleotide base pairs ordered along our DNA, which resides in the nucleus of all our 3 trillion non-red blood cells.

The entirety of our DNA is a single string of nucleotide bases Adenine, Thymine, Guanine and Cytosine (A, T, C, G)* which, in trios (e.g. TCA) code for amino acids. These amino acids chain together to form the proteins that build up all integral systems and infrastructure in our body.

The breathtaking beauty of our inner galaxy. “Hair cells: the sound-sensing cells in the ear” // ZEISS Microscopy under CC BY-NC-ND 2.0

Furthermore, we have learned more about how these strings of genes are compacted (into chromosomes) during the process of cell division. In meiosis, the version of cell division that happens exclusively between the male and female sex cells, the male and female chromosomes from the sperm and egg engage in a beautifully intricate dance that has self-propagated for two billion years.

One integral part of this meiotic process is crossing over, which occurs when a pair of chromosomes, one from the father and one from the mother, recombine parts of their chromosome (and thus their genetic code).

Crossing over between sister (corresponding male / female) chromosomes. Christine L. Miller / CC-BY-SA-4.0

This crossing over occurs in all sister chromosome pairs, creating the genetic foundation for the offspring of two parents, and guaranteeing that the offspring is genetically heterogenous from either of the parents.

This trait gives sexual reproduction a distinct advantage over asexual reproduction. The ability for populations to create genetically diverse offspring increases the group’s resistance to disease and hardship, but also allows the species to more readily evolve. Genetic diversity allows many chances that some organisms have traits others don’t, which natural selection can choose and emphasize over generations.

Thus, I encourage you to think of genetics, and sexual reproduction, as a problem-solving algorithm. Recombination of genetic material between pairs allows for more chances of improvement in the species’ phenotype. Sexual reproduction, in a sense, allows for a wider search for possible solutions or mutations to certain problems a species faces — as opposed to asexual reproduction, which is fast, consistent, and slow to change.

Once this is understood, it is easy to understand Genetic Search Algorithms, which search for solutions to problems in a similar way to the processes I just described.

Genetic Search Algorithms

It is best to understand genetic search with an example. A classic example is the n-queens problem, more specifically, the 8-queens problem.

the 8-queens problem asks a simple question. How can we place 8 queens on an 8 by 8 chessboard so that the 8 queens cannot attack each other?

One of the possible solutions to the 8-queens problem. by Encik Tekateki under CC BY-SA 4.0. Edited.

Remember that a queen can attack any piece in it’s row, column, or either diagonal through it. This problem can be extended to (N x N) chessboards and N queens.

Our goal is to see if we can employ techniques that echo genetics to automatically solve the 8-queens problem. This type of problem falls into a type of artificial intelligence known as search. The reason why this is a search problem becomes clearer shortly.

We must first find a way to represent a state on the chessboard. That is, how do we take an image like the one above, and represent it in a compact way that we can tweak, change, and evaluate?

The numbers in each place represent the vertical position on the board in the corresponding column.

The same problem above can be represented as a string of eight numbers. We can take it as a given that in any solution, two queens cannot share a column (as they could attack each other vertically). Therefore, we can break down the state into an easily adjustable string of numbers.

In an 8-queens problem, there are 16,777,216 (8⁸) possible ways to represent this string of 8 numbers. If we didn’t limit it with only one queen per column, our amount of possible states rises to 4,426,165,368 (4.4 trillion).

There are 92 distinct solutions to the 8-queens problem. How do we find one of these 92 in the 16 million possible states? If we were guessing, our chances would be approximately 1 in 180,000. The chance gets smaller with larger boards.

Before we begin to find the best 8-number strings of numbers, we need to define to the computer what “the best” means. We, of course, have an abstract idea — the better a possible state, the closer we are to having 0 attacking pairs of queens. We need to make this concrete and numerical, and have a consistent way for our algorithm to return a “penalty” when passed an 8-number string. This is called the heuristic function.

The Heuristic Function

How do we quantify how ‘bad’ our current state is? There are many different heuristics we can choose from, and they all depend on problem to problem.

For this problem, we can make our heuristic return the amount of different pairs of queens which can attack each other. Let’s take an example of a game board and see how we could score it.

A solution would have h(s) = 0. Therefore, the closer a state’s h(s) score is to zero, the closest it is to being a good solution. We have successfully created an algorithm to grade any given chess board.

We can code something like this in Python — the hardest part is coming up with a way to detect any diagonal matches.

This code takes in a list like [5, 1, 2, 4, 5, 1, 2, 1] and N, which is 8 in our case. This heuristic is adaptable to larger or smaller boards.

Now that we have our heuristic, we can deeply explain how the genetic/ evolutionary algorithm works, and then dive into the code.

The Initial Population

We begin with a completely random set of N-dimensional lists like [1, 5, 2, 3, 5, 1, 2, 5], which we can set. The more different each of these random sets are, the better.

Remember, each one of these corresponds to a certain gameboard with certain queen placements.

We can code this algorithm in Python similarly. Our function takes the inputs N, the dimension of the chessboard, and initialpop , the amount of initial states you want to include in your population. This initial population will sustain throughout the algorithm.

We then run our heuristic function on all of these initial states, and note down the score of each state.

We can then sort our initial population from the lowest heuristic score to the highest, creating a priority queue based on our heuristic score. We can then get into the genetics of the algorithm.

Selective Breeding of Best States Based on Fitness

We can say the states with the lowest heuristic score are the most ‘fit’. We can think of each state having a certain fitness based on their heuristic.

Now, we have a priorityQueue with all of our states, in order of the best to worst. We can now selectively choose the k best states to let them continue through the algorithm. Looking ahead, we will let these best states ‘breed’ in a way that resembles crossing over, and eliminate the worst-scoring states.

First, let’s select the best k states. Let’s say that k here = 4. Survival of the fittest in the most literal sense.

Now what do we do with these four? This is where recombination (crossing over) comes in. We need to think of each of these 8-number arrays as something reminiscent of a string of nucleotide bases that are present in a gene.

Similarly to an actual gene, they are an abstraction of a more complex state. Our number array represents an entire chess board, and the string of bases represents amino acids that build up our body’s essential infrastructure.

We can now continue.

Pairing Off

Now that we have our k best states, we can pair them off. Although there aren’t always two ‘parents’ in an evolutionary algorithm, it is certainly the most common method in both computing and nature.

When pairing off our k states into parent groups of size p (here p=2), we pair randomly, and non-exclusively. One ‘gene’ can be completely missed, or one can be paired twice.

We now have two different groups. We can code this, and the previous elimination step, into a function that takes in N , the dimension of the board, candidates , a list of state arrays, amountofpairs , the amount of pairs we want, numofparents , the number of parents we want per group, and heuristic , our heuristic function.

This function returns a list of lists of lists — something like:

[[[1, 2, 3, 4], [1, 4, 3, 2]], [[1, 4, 5, 6], [5, 2, 1, 2]]]

Which would represent two parent groups when N = 4.

Now that we have parent groups, we can combine various traits from each of the parents to produce offspring, which hopefully incorporate positive aspects from their parent’s “genotype”.

Crossing Over

Now, starting with our parenting groups, we do something directly inspired by the crossing over of chromosomes during meiosis.

We take each of our parents and create an offspring that inherits certain parts of the genome of it’s parents. The length of these parts is random. Looking at an example makes this much more clear.

Since each parent’s offspring has a random amount of it’s parents individual genomes, a parent can have a variety of kids, each with different genomes.

It’s important to keep the big picture in mind — this offspring represents a game board that incorporates it’s first five queen positions from it’s father, and it’s last three from it’s mother.

This idea also carries significance of being a helpful way to gauge whether an evolutionary algorithm is the right fit for a problem. The only reason we can hope this will help bring us closer to a solution is that chunks of our answer can be independently useful. We know this works since maybe the first five queen position’s in the father’s genome are a “good” combination, and are good independently.

If each of our numbers were completely unrelated, our crossing over would be as good as random shuffling — useless.

Now, back to our algorithm. We can take our two best parents and make them produce four children each, giving us 8 total children.

The reason why we want 8 children to be the outcome is because I want the population to be consistent. Remember that we started off with 8 initial genes — we don’t want our total population to go down in size over time, since we will end up with smaller and smaller population sizes that will eventually become 0. If we keep the population size always at 8, we will never have too little or too many genes to properly pair off.

We can code both the act of combination and pairing with two functions. First, the combination function, which only requires a single parent group as an input (something like [[1, 2, 4, 2], [1, 2, 5, 4]])

We can then create a function that uses the crossover() function that creates as many offspring with the parent groups we give it. This function takes a list of parent groups like [[[1, 2, 3, 4], [1, 4, 3, 2]], [[1, 4, 5, 6], [5, 2, 1, 2]]] , numoffspring , the wanted amount of offspring (in total), and again, N , the dimension of our board.

Now, we have an output population of 8 offspring that were derived by taking the best of the initial population and recombining them. There is just one more step that we must do on these 8 before going back through the cycle again.

Mutation

The biggest source of evolution in nature is genetic mutation — the chance miscopying or error that results in a novel gene in an individual, unknown in it’s ancestors.

Natural selection works by enhancing or suppressing certain mutations, through generations, depending on if they help or hinder the organism that they occupy, which leads populations to speciate.

We must apply this same design philosophy towards our offspring. We must open a small chance that some of the numbers in each of the arrays will be altered. Without this, our search will be quite “narrow” — and will have a high probability of never converging on an answer, since our search will be, for lack of better words — inbred.

Our mutative function takes in a population of genes and randomly alters some of the entries to a randomly chosen number.

We can write two functions — one, our base function, which randomly mutates (or doesn’t) a single gene. Our input is a gene in the format of a list [1, 2, 3, 5, 2, 1, 2, 4].

We can then apply this to a superfunction that applies this to an entire population and outputs a population with randomly mutated individuals. This function takes N , the dimension of the board, and population , a list of our genes.

The output of this function is our final population at the end of the loop. We can now reiterate the entire algorithm — we can now take the best k individuals from our mutated population, pair them off, recombine, et cetera.

Search Loop

We can combine all these functions together into a final loop. We must first define a single function that checks an entire population (here, 8 individuals) to see if any of them are an answer (if any of their heuristic function’s values are 0). If there exists an answer, the function returns an answer.

The checksolution() code requires N, listofgenes (list of all genes), and hfunc , the heuristic function to measure success by.

Now, we can incorporate all prior functions in a final loop which continues either until the user-inputted iteration limit is reached or, hopefully, when an answer is found.

The amount of parents, population size, and k all effect the chance that an answer will be found. An example call to this function would be:

I find this specific configuration converges on an answer to the problem nearly every time — setting to stop point to 200 practically guarantees it.

For a more general overview about how we’ve learned from nature in building AI, read my article on a Darwinist view of AI:

The success of an algorithm directly inspired by evolution and genetics reflects just how much we have to learn from the 4-billion year old life history of our planet. Biomimicry in computing is a growing field. More and more sophisticated learning algorithms are borrowing ideas from nature are finding success. We have a lot more to learn from Mother Nature, the original computer scientist.

Adam Dhalla is a high school student out of Vancouver, British Columbia, currently in the STEM and business fellowship TKS. He is fascinated with the outdoor world, and is currently learning about emerging technologies for an environmental purpose.

Follow his Instagram, and his LinkedIn. For more, similar content, subscribe to his newsletter here.

--

--

17 y old learning about machine learning, as well as a lifelong naturalist. Climate activist in Vancouver. Writer. Visit me @ adamdhalla.com