Artificial DNA

Naturally, in a game like Species, the genetic code and its format are going to come up. Of course the player is going to want to see and manipulate the genetic code directly. But that means giving the creatures a genetic code to manipulate in the first place. That’s… well, it’s a bit harder than it sounds…

Here in reality, DNA strings are made out of four molecules: A, T, G and C.

[/pedant: technically DNA strings are made out of pairs of these molecules (AT, TA, GC and CG), which form the double-helix ‘ladder’ shape we’re all familiar with, but it’s easier to represent the code by just reading the molecules up one side of the ladder]

Now here’s the first place people get caught up: DNA isn’t a “code”. It doesn’t really abstract or represent anything: it’s just a long string of molecules, one after the other. When a transcription molecule runs along the edge of the string, it transcribes each set of 3 molecules it comes to (a codon) into a single amino acid through molecular processes, and those amino acids are later chained together to form proteins.

We use human terms like “code” and “compile”, “instruct” and “transcribe” a lot when we’re talking about DNA, and anti-evolutionists often misuse those terms to imply that these structures had to have been designed. This is purely a semantic argument, but it can be a convincing one: these terms all imply slightly more than what they refer to. The thing to remember is that they’re analogies. The transcription molecule isn’t reading a codon into memory and outputting an acid based on that code: it’s simply reacting to the molecules it comes across, and that reaction happens to result in a specific amino acid.

As one example of a difference, the genetic code is a remarkably poor and inefficient way to store amino-acid information, even before we get to the whole “junk DNA” issue. Codons have 3 characters, each of which can be one of 4 molecules, so there are 4 × 4 × 4 = 64 possible codons. But there are only 20 amino acids. This means that most amino acids can be produced by more than one codon. Then, just to make it even more confusing, the amount of redundancy varies: some amino acids can be created by as many as 6 different codons, while others (methionine and tryptophan) have only one, and 3 codons don’t produce an amino acid at all but act as “stop” signals.
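The codon counts above can be checked with a short Python sketch. It builds the standard genetic code from its usual compact one-letter form, with `*` marking stop codons, and tallies how many codons produce each amino acid. The table itself is the standard code; the variable names are just illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"  # conventional ordering used by the compact table below
# Standard genetic code, one letter per codon in TCAG x TCAG x TCAG order
# (first base changes slowest). '*' marks a stop codon, not an amino acid.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Every possible three-letter codon: 4 ** 3 = 64 of them.
CODONS = ["".join(p) for p in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AMINO_ACIDS))

# How many codons produce each amino acid (stop signals excluded).
counts = Counter(aa for aa in CODE.values() if aa != "*")

if __name__ == "__main__":
    print(len(CODONS), "codons for", len(counts), "amino acids")
    print("six codons:", sorted(aa for aa, n in counts.items() if n == 6))
    print("one codon:", sorted(aa for aa, n in counts.items() if n == 1))
    print("stop codons:", sorted(c for c, aa in CODE.items() if aa == "*"))
```

In the output, L, S and R are leucine, serine and arginine; M and W are methionine and tryptophan.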

From a pure design perspective, especially a design perspective motivated by a belief in an omnipotent creator of the universe and all associated laws (they know who they are), this is stupid. It’s like inventing the decimal numeric system and then deciding to have 4 symbols for “7”, or making a binary system with 12 symbols for 0 and 8 symbols for 1. It makes no sense.

But from a biological and molecular standpoint, this redundancy is probably responsible for life’s ability to develop into so many varied forms. Every codon means something (an amino acid, or a signal to stop building the protein): returning to the programming analogies, there are no exception-inducing combinations that will make the code crash or stop responding. The molecular compiler is infinitely robust.

And an infinitely robust compiler might be significantly less efficient, but it’s also far more versatile.

I’ve gone rather significantly off-topic, haven’t I? One moment…

[Drags post back to topic by its neck…]

The genetics in Species differ from real-life DNA in that they aren’t stored as a series of characters. This is another case where I had to balance the limitations of a computer simulation against sticking strictly to reality. An accurate simulation of genetic code and amino acid generation, while interesting from an academic standpoint, wasn’t the game I had in mind. In addition, string manipulation and the memory management that comes with it are relatively expensive for a computer. It’s not impossible, but representing a creature’s genetic values as a string would leave far less CPU time for other aspects of the game (like simulating and drawing more creatures).

So instead, the internal “genetic code” of creatures in Species is actually a list of numbers.

But that doesn’t mean I don’t have a genetic code in the game. In this case though, and unlike real life, the genetic code is quite literally a “code”: it is generated from the original number list using a fairly simple cryptographic key. It can be used to reconstruct the original numbers and thus clone the creature.
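To make the idea concrete, here is a minimal Python sketch of a number-to-codon cipher of this kind. The actual Species codon table isn’t given in this post, so everything here is an assumption: `SYMBOL_TO_CODON` is a made-up table of two-letter codons (one per digit or symbol, which is what the 18-letter strings for 9-character numbers later in the post suggest), `STOP` is a made-up stop codon written after each gene, and values are kept to 7 decimal places.

```python
# Hypothetical symbol table: one two-letter codon per digit or symbol.
# Only the codons for "1", ".", "0" and "8" echo the example strings in the
# post; every other assignment is invented for illustration.
SYMBOL_TO_CODON = {
    "0": "TG", "1": "AG", "2": "AA", "3": "AC", "4": "AT",
    "5": "CC", "6": "CG", "7": "CT", "8": "CA", "9": "GA",
    ".": "TA", "-": "GC",
}
CODON_TO_SYMBOL = {codon: sym for sym, codon in SYMBOL_TO_CODON.items()}
STOP = "TT"  # hypothetical stop codon written after every gene
PLACES = 7   # digits kept after the decimal point


def encode_number(value: float) -> str:
    """Write one gene value as a string of codons."""
    text = f"{value:.{PLACES}f}"
    return "".join(SYMBOL_TO_CODON[ch] for ch in text)


def encode_genome(values: list[float]) -> str:
    """Encode a whole gene list, terminating each gene with STOP."""
    return "".join(encode_number(v) + STOP for v in values)


def decode_genome(code: str) -> list[float]:
    """Reverse encode_genome: read two letters at a time, split on STOP."""
    values, symbols = [], []
    for i in range(0, len(code), 2):
        codon = code[i:i + 2]
        if codon == STOP:
            values.append(float("".join(symbols)))  # ValueError on e.g. "1..5"
            symbols = []
        else:
            symbols.append(CODON_TO_SYMBOL[codon])  # KeyError on unknown codons
    return values


if __name__ == "__main__":
    genes = [1.0, 0.25, -3.5]
    code = encode_genome(genes)
    print(code)
    print(decode_genome(code))  # the same values back: a clone
```

Decoding returns the original values (to 7 decimal places), which is all cloning needs. The two commented failure points are the kind of fragility discussed under point 1.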

The genetic code you see in-game, then, is not the set of genetic values actually used to simulate the creatures, but it does represent them. So what are the practical effects of this?

Mainly, they make direct genetic manipulation highly inadvisable, for a variety of reasons, which I’m about to list. But first, I’ll say this: I want direct genetic manipulation in Species. Maybe not immediately, but definitely within a few versions of the alpha. So these problems are all things I need to sort out. Now, on with the show…

1. Crashing.

The most obvious problem is that the compiler is not completely robust. With the right sequence of characters, it can be made to crash: for example, a number with two decimal points, or none. This is something I’m going to have to invest time in. Making the compiler robust enough to compile a gene list from any sequence of characters would be a great advantage for the game, because it would pave the way towards direct genetic manipulation.
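One way to make the compiler accept any sequence of characters, sketched in Python. The two-letter codon table and `STOP` codon are hypothetical stand-ins (the real Species table isn’t published here), and the repair rules are illustrative choices rather than what the game actually does: keep only the first decimal point, ignore misplaced minus signs, skip unknown codons and a dangling final letter, and read an empty number as 0.

```python
# Hypothetical two-letter codon table and stop codon (not the game's own).
SYMBOL_TO_CODON = {
    "0": "TG", "1": "AG", "2": "AA", "3": "AC", "4": "AT",
    "5": "CC", "6": "CG", "7": "CT", "8": "CA", "9": "GA",
    ".": "TA", "-": "GC",
}
CODON_TO_SYMBOL = {c: s for s, c in SYMBOL_TO_CODON.items()}
STOP = "TT"


def clean_number(symbols: list[str]) -> float:
    """Turn any run of symbols into a valid float instead of raising."""
    text = ""
    for sym in symbols:
        if sym == "-":
            if text == "":          # a minus sign only counts at the very start
                text = "-"
        elif sym == ".":
            if "." not in text:     # keep the first decimal point, drop the rest
                text += "."
        else:
            text += sym
    digits = text.lstrip("-").replace(".", "")
    return float(text) if digits else 0.0   # no digits at all: fall back to 0


def robust_decode(code: str) -> list[float]:
    """Decode any letter string into gene values without ever crashing."""
    values, symbols = [], []
    for i in range(0, len(code) - 1, 2):    # a dangling final letter is ignored
        codon = code[i:i + 2]
        if codon == STOP:
            values.append(clean_number(symbols))
            symbols = []
        elif codon in CODON_TO_SYMBOL:      # unknown codons are skipped
            symbols.append(CODON_TO_SYMBOL[codon])
    if symbols:                             # an unterminated final gene still counts
        values.append(clean_number(symbols))
    return values
```

For example, `robust_decode("AGTATACCTT")` spells out “1..5” and comes back as `[1.5]` instead of crashing; a number with no decimal point simply reads as a whole number.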

2. Sensitivity.

A less-obvious result of this system is that making small, direct changes to the genetic code could have ridiculous effects on creature physiology. For instance, moving the decimal point could instantly make a Godzilla-like creature. Since mutations are normally applied to the numbers, and not the code, this doesn’t happen in the game’s ecosystem: but editing the code directly? Anything goes.

There are several artificial solutions to this problem. I could restrict which codons can be changed: make it so the player can only move decimal points by small amounts. Or I could give artificial ranges to genetic values, so that if they go outside those arbitrary ranges the creature dies when it’s born. A third way would be to make the direct manipulation work on the numbers the code represents, rather than the code itself.

I don’t really like any of these options: they restrict the player’s freedom to type whatever they want into the genetic field. I could also have the compiler quietly adjust the values in the background before generating the creature, to give the illusion of freedom while making sure ridiculous numbers don’t get out of hand (indeed, this is what will happen to the aforementioned numbers with two decimal points). But I’m not too fond of that idea either: it feels like cheating, because no matter what I do, the min-max ranges are going to be arbitrary.

There is one more option, though I’m not sure whether it will help: encode the numbers differently. Currently the cryptogram is simple: assign a different codon to each possible digit and symbol and write the number out directly. It’s possible that a different means of encoding (for instance, prefixing the number with its exponent?) would be less sensitive to direct manipulation.
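A rough Python sketch of the exponent-prefix idea, to make it concrete. The layout is an assumption (one exponent digit followed by seven mantissa digits, non-negative values only), and the digits would still go through a codon table like the existing one. What it does guarantee is that there is no decimal point to misplace, so every 8-digit string decodes; whether it makes hand-editing less sensitive is still an open question, since changing the exponent digit still multiplies the value by a power of 10.

```python
# Hypothetical fixed-width layout (not the game's): one exponent digit, then
# seven mantissa digits, read as d1.d2d3d4d5d6d7 x 10**(exponent_digit - 5).
# Assumes gene values are 0 or between 1e-5 and 1e5; each digit would still
# be written out through a codon table.
MANTISSA_DIGITS = 7
EXP_OFFSET = 5


def encode_sci(value: float) -> str:
    """Return 8 digits: the exponent digit followed by the mantissa digits."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    if value == 0:
        return str(EXP_OFFSET) + "0" * MANTISSA_DIGITS
    # Python's e-format rounds to 7 significant digits and handles carries.
    mantissa, exponent = f"{value:.{MANTISSA_DIGITS - 1}e}".split("e")
    exp_digit = int(exponent) + EXP_OFFSET
    if not 0 <= exp_digit <= 9:
        raise ValueError("value out of range for a one-digit exponent")
    return str(exp_digit) + mantissa.replace(".", "")


def decode_sci(digits: str) -> float:
    """Any 8-digit string decodes: there is no decimal point to misplace."""
    exp_digit, mantissa = int(digits[0]), digits[1:]
    return float(f"{mantissa[0]}.{mantissa[1:]}e{exp_digit - EXP_OFFSET}")


if __name__ == "__main__":
    code = encode_sci(1.0)
    print(code, decode_sci(code))
    # Editing the last mantissa digit makes a small change...
    print(decode_sci(code[:-1] + "9"))
    # ...while the single exponent digit carries all the order-of-magnitude risk.
    print(decode_sci("6" + code[1:]))
```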

I’d love to hear anyone’s ideas for this, by the way: in fact, all of these are problems I haven’t definitively solved yet.

3. Order.

The genetic code is just a list of numbers: it’s the order of the numbers that decides which gene they affect. This makes the entire system extremely sensitive to insertion and deletion mutations: a “9” won’t affect much at the very end of a number with 7 decimal places, but an insertion mutation could easily push it onto the front of the next gene’s number, with obvious consequences. Even worse, inserting or deleting a single character rather than two would shift the reading frame and completely change the meaning of every codon after it, probably resulting in some sort of MissingNo. equivalent.
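A tiny Python illustration of why one inserted letter is worse than two. It assumes two-letter codons, as the example strings in this post appear to use (the codon size is a parameter), and uses one of those example strings; `read_codons` and `insert_letter` are hypothetical helpers, not game code.

```python
def read_codons(code: str, size: int = 2) -> list[str]:
    """Split a letter string into fixed-size codons, as a reader would."""
    return [code[i:i + size] for i in range(0, len(code), size)]


def insert_letter(code: str, index: int, letters: str) -> str:
    """Model an insertion mutation at the given position."""
    return code[:index] + letters + code[index:]


if __name__ == "__main__":
    original = "AGTATGTGCATGTGTGTG"
    print(read_codons(original))
    # One extra letter: every codon after the insertion point changes.
    print(read_codons(insert_letter(original, 5, "C")))
    # A whole extra codon at a codon boundary: the frame is preserved.
    print(read_codons(insert_letter(original, 4, "CA")))
```

With the single letter, everything from the third codon onwards is garbled and a lone `G` is left dangling; with a whole codon, the original codons simply shift one place along.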

Thankfully, this is something I have already taken a few steps to alleviate. I have stop codons in place: an insertion mutation early in the torso segment, although it will (probably dramatically) affect the entire torso, will have no effect on any feature after that, because there is a stop codon at the end of the torso segment.

I’m considering even taking this a step further and including stop codons between many genes, to further reduce the impact of insertion mutations.

4. Mutation.

As I mentioned in point 2, mutations in Species act on the numbers behind the genetic code, not the letters in front of it. This isn’t really a problem, but it can look strange when you’re examining the genetic code. I’ll show you what I mean with a quick diagram:

Parent: Gene 1: 1.0000000 AGTATGTGTGTGTGTGTG
Child : Gene 1: 1.0128639 AGTATGAAAGCAGGACCG

With a tiny change (~0.01), almost the entire genetic string now looks different. This is especially noticeable in-game, when you’re looking at the “species average”.

Fixing this is actually a very interesting task, from a mathematical perspective. Currently, the mutation amount of each gene is determined by a simple random number generator, giving a nice even (uniform) probability distribution. My initial thought was to simply round to the highest significant digit, so 0.465… would come out as 0.5, while 0.007324… would come out as 0.007. Mathematically inclined readers may have already worked out the problem with this, but if not, why not see if you can work it out before moving on…

Worked it out? With mutation amounts spread evenly between 0 and 1, this gives a 90% probability that the first decimal digit (codon) will be the one modified, and a 99% probability that it will be one of the first two. You’d likely never see some of the later digits mutated, and that’s not the way mutation works.
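The arithmetic behind those figures, as a Python sketch. It assumes the raw mutation amount is drawn uniformly from [0, 1), which is what a plain `random()` call gives. The first significant digit of such a draw lands at decimal place k when 10^-k ≤ x < 10^-(k-1), an interval of width 9/10^k. The function names are illustrative.

```python
import math
import random
from fractions import Fraction


def round_to_leading_digit(x: float) -> float:
    """Round a positive amount to one significant figure (0.465 -> 0.5)."""
    return round(x, -math.floor(math.log10(x)))


def leading_place_probability(k: int) -> Fraction:
    """P(first significant digit of a uniform [0, 1) draw is at decimal place k).

    That happens when 10**-k <= x < 10**-(k - 1), an interval of width 9 / 10**k.
    """
    return Fraction(9, 10 ** k)


def simulate_first_place_share(n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check of the k = 1 case: share of draws that are >= 0.1."""
    rng = random.Random(seed)
    return sum(rng.random() >= 0.1 for _ in range(n)) / n


if __name__ == "__main__":
    print(round_to_leading_digit(0.465), round_to_leading_digit(0.007324))
    for k in range(1, 5):
        print(k, float(leading_place_probability(k)))
    print(simulate_first_place_share())
```

Each further decimal place is ten times less likely than the one before it, which is why the later digits would almost never be touched.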

My second idea was to take a ‘per codon’ approach to mutation: each mutation randomly picks and affects a single digit of the number.

Parent: Gene 1: 1.0000000 AGTATGTGTGTGTGTGTG
Child : Gene 1: 1.0080000 AGTATGTGCATGTGTGTG

From a genetic perspective, this looks pretty good. Genetic differences between parents and children are much smaller and more logical. But from a numbers standpoint… can you work out the unintended consequences in this case? It’s a bit more complicated than the last one.

Here’s my problem: ~86% of mutations are now going to have an effect smaller than 0.1, and ~71% will have an effect smaller than 0.01. All I’ve really done is replace the flat (uniform) probability distribution of the random number generator with one that falls off exponentially, so the majority of mutations are now ridiculously small.
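A Python sketch of the per-digit mutation and the arithmetic behind those percentages. It assumes the mutated digit is picked uniformly from the 7 decimal places, which reproduces the ~86% and ~71% figures exactly (6/7 and 5/7); the function names and the string-based digit replacement are illustrative, not the game’s code.

```python
import random
from fractions import Fraction

PLACES = 7  # decimal places per gene value, as in the post's examples


def mutate_one_digit(value: float, rng: random.Random) -> float:
    """Replace one randomly chosen decimal digit with a different random digit.

    Assumes the digit is picked uniformly from the PLACES decimal places only.
    """
    text = list(f"{value:.{PLACES}f}")
    pos = text.index(".") + rng.randint(1, PLACES)
    old = text[pos]
    text[pos] = rng.choice([d for d in "0123456789" if d != old])
    return float("".join(text))


def share_below(t: int) -> Fraction:
    """Exact share of single-digit mutations smaller than 10**-t.

    A change at decimal place k lies between 10**-k and 9 * 10**-k,
    so it is smaller than 10**-t exactly when k > t.
    """
    return Fraction(sum(1 for k in range(1, PLACES + 1) if k > t), PLACES)


if __name__ == "__main__":
    rng = random.Random(0)
    print(f"{mutate_one_digit(1.0, rng):.{PLACES}f}")
    print(float(share_below(1)), float(share_below(2)))  # ~0.857, ~0.714
```

Every decimal place is equally likely to be hit, but each step to the right makes the change ten times smaller, which is where the exponential fall-off comes from.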

This is an interesting situation: in some ways it’s actually a good thing. It means that creatures can have higher mutation tolerances and rates, because the majority of mutations are going to be tiny with the occasional large ones mixed in. On the other hand, it may slow down evolution. Based on what I’ve seen in Species (and what I’ve read in reality), I believe “lucky mutants” have a far lower influence on population change than the slow-but-constant adaptation of the entire population.

I’m currently considering a mix-and-match approach: apply both of these methods and hope that the individual digit changes mask the fact that certain codons are far more susceptible to mutation than others. But I’m also open to suggestions: if you’ve got any other ideas, I’d love to hear them!
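A sketch of what the mix-and-match could look like, in Python. The 50/50 split (`P_DIGIT`), the random sign on the rounded amount and all the function names are hypothetical choices; the two branches are the leading-digit rounding and the single-digit change described above.

```python
import math
import random

PLACES = 7
P_DIGIT = 0.5  # hypothetical chance of using the single-digit method


def rounded_uniform_mutation(value: float, rng: random.Random) -> float:
    """Method 1: uniform amount rounded to one significant figure, random sign."""
    amount = rng.random()
    while amount == 0.0:           # log10(0) is undefined; redraw
        amount = rng.random()
    amount = round(amount, -math.floor(math.log10(amount)))
    return round(value + rng.choice((-1, 1)) * amount, PLACES)


def single_digit_mutation(value: float, rng: random.Random) -> float:
    """Method 2: replace one random decimal digit with a different digit."""
    text = list(f"{value:.{PLACES}f}")
    pos = text.index(".") + rng.randint(1, PLACES)
    text[pos] = rng.choice([d for d in "0123456789" if d != text[pos]])
    return float("".join(text))


def mixed_mutation(value: float, rng: random.Random) -> float:
    """Pick one of the two methods at random for each mutation."""
    if rng.random() < P_DIGIT:
        return single_digit_mutation(value, rng)
    return rounded_uniform_mutation(value, rng)
```

Tuning `P_DIGIT` would trade the first method’s bias towards the leading digits against the second method’s bias towards tiny changes.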


“Molecular Genetics doesn’t lend itself particularly well to comedy, does it?”

  1. #1 by BaldySlaphead on March 14, 2012 - 12:05 am


    I’m certainly in no position to give you any answers to your queries, but I wanted to say how much I enjoyed this post. It was genuinely informative both about genetics and what you’re trying to achieve.



  2. #2 by ququasar on March 14, 2012 - 8:55 am

    Thanks! Glad at least one person is interested in my technical/bio-geek posts. 😀

    Actually, I was thinking today about what I wrote about DNA being an inefficient code, and thought, “how would I/a designer do it better?”

    My answer to that is simple: 2-character codons. 4×4 characters gives you 16 combinations. Make 2 of them prefix codons (for instance, when the transcription molecule reads AA or AT it immediately reads a third molecule and compiles an amino acid from that), and we’ve got 14 two-character codons plus 8 three-character ones: 22 combinations, exactly enough for the 20 amino acids and 2 regulator codons. And just like that, our DNA takes up roughly 20 to 33% less space to store the exact same information, depending on how often the longer codons get used.

    But a code like that requires the transcription molecule to be a much more complex machine than it is. It can no longer just react naturally to different codons: it needs to read, compile, and build the correct amino acids.
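A quick count of the scheme described in this comment, in Python. It treats AA and AT as the two prefix codons, as in the example; once they are reserved as prefixes they no longer encode anything on their own. The equal-usage average is an assumption for comparison only; real amino acid frequencies are not equal.

```python
from fractions import Fraction
from itertools import product

BASES = "ATGC"
PREFIXES = {"AA", "AT"}  # two-letter codons that trigger reading a third letter

two_letter = ["".join(p) for p in product(BASES, repeat=2)]
# Plain two-letter codons, minus the ones reserved as prefixes.
direct = [c for c in two_letter if c not in PREFIXES]
# Each prefix followed by any of the 4 bases gives a three-letter codon.
extended = [p + b for p in sorted(PREFIXES) for b in BASES]

codons = direct + extended
print(len(direct), "two-letter +", len(extended), "three-letter =", len(codons))

# Average codon length if every codon were used equally often,
# compared with the 3 letters per codon of real DNA.
mean_len = Fraction(sum(len(c) for c in codons), len(codons))
print("mean length:", float(mean_len), "saving:", float(1 - mean_len / 3))
```

Equal usage gives about a 21% saving; putting the most common amino acids on the two-letter codons pushes it towards a third. No codon is the start of another, so a reader can always tell where each one ends.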
