Yeah, it’s time for another technical post. Sorry guys, I just can’t help myself. If you understand what I’m talking about, you’re probably doing better than I.
CPU Performance in Species is probably the single most restrictive factor of programming the game. It limits the number of creatures, it limits the methods I can use to change their variety, it limits the complexity of their behaviour, it limits everything sooner or later. But with very, very careful programming and optimisation, that ‘later’ can be pushed pretty far back.
Firstly, a little background: Species is ‘CPU Bound’ as opposed to ‘GPU Bound’. A CPU (Central Processing Unit) limited program spends more time computing what it has to draw than it does actually drawing it, meaning the GPU (Graphics Processing Unit) sits idle until the CPU is finished. I could dump a whole load of extra information onto the GPU and it simply wouldn’t care, because it’s hardly working as it is. The GPU is awesome like that.
Shawn Hargreaves can probably explain all this better than I, if you’re looking to read up on it:
I’m loathe to admit it, but I secretly suspect the Terrain LOD system I spent so long programming may be slowing the game down, because it takes work away from the GPU at the cost of a little bit of work on the CPU’s part, and I’m so heavily CPU bound there’s a chance just brute-force-rendering the terrain would be better at this point (I could be wrong, though, and taking it out at this point would be rather silly since there’s so much more yet to implement, on the GPU and CPU)
That’s why some of my systems, like the Billboard Vegetation, are reliant on offloading information from the CPU to the GPU. The GPU can handle it. The GPU can handle anything. (Advertised product may not in fact be able to handle anything)
So that’s fairly simple, right? Drawing takes no time at all, so all our time must be spent updating the simulation.
In reality, I’m only spending maybe a quarter of my CPU time in the update call. The reason for this is the bottleneck between the CPU and the GPU. Before the GPU can draw anything, the CPU need to tell it what to draw, in excruciating detail.
So even though I can’t save time by saving the GPU work, I can save time, lots of it, by stopping the CPU from telling the GPU it has work to do. There are a variety of ways I’m doing this…
I’m sure I’ve mentioned before that creatures are made up of a whole bunch of separate ‘part’ meshes. A high detail creature would have a torso, tail, 6 limbs, 6 limb tips, a neck, a head, and a currently unlimited number of Features attached to the head (open to later revision). Multiply this by whatever our creature cap is and we’re calling the draw routine far too many times for good performance.
But it doesn’t have to be called for every creature. Creatures off-screen don’t need to be drawn at all, and we can get away with drawing less detail on distant creatures.
We can also speed this up with State Batching. (Note for technical readers: Model Instancing would have been my preferred approach, but the parameters for each creature are unique. It’s not easy to instance meshes under those conditions). See, in order to tell the GPU what to draw, the CPU has to feed it with the model, texture, and parameters things like animated bone positions, sizes, colours, etc). The easiest method would be just to feed in these three things for every model, every time, and this is exactly what repeatedly calling Draw() in XNA does.
But you don’t actually have to. The GPU remembers: if you just change the parameters, but leave the model and texture as is, you can draw a second copy of it without resending the model. So by carefully rearranging the order of drawing to (for example) draw all creature torso’s one after the other, you only need to set the torso model on the GPU once. You can do the same thing with textures and parameters too, if you’re changing models all over the place but repeatedly using the same texture.
Obviously there are other, more subtle and game-specific ways to improve general CPU performance as well. Moving calculations out of loops wherever possible is a generic one: more than once I’ve caught myself doing a calculation multiple times when it could be done only once. A more specific example was carefully controlling matrix operators: streamlining the modifications I’d made to the skinning animation system cost me some programming time, but resulted in a noticeable performance increase.
But even with all these ongoing optimisations, I’m still not happy with the games performance. I’ve been treating my 4 year old, dual-core laptop as a baseline minimum (not least because that laptop is my development machine), but it struggles keeping the game running at a decent frame rate once the simulation gets going properly. There’s still room to optimise (there’s always room to optimise), but I don’t think I can get it to
the performance levels I’m hoping for with after-the-fact methods and reprogramming individual methods. This leaves one other option…
… multithreading. *dramatic thunder*
(Note: this rest of this post will likely be one long whine. I hate multithreading even more than I hate render targets. I still plan to tame it one day, because it really is a powerful way to improve CPU performance, but not for the alpha release)
Most machines have more than one CPU these days, and many people assume as a result that that means the computer must be faster. 4 cores? FAST. 8 cores? ULTRAFAST. 128 cores? OMG WHY YOU STEAL NASA SUPERCOMPUTER. In reality though, most programs are not designed to take advantage of this. The vast majority of programs do all their actions on a single core, because Multithreading Is Hard.
Multithreading is the process of taking part of what you have to calculate and giving it to a separate process (thread), which can run in parallel on a separate core: in short, it’s telling the computer to do multiple things at once. So for a game, you might tell the main core to render the screen while the second core works on updating the simulation for the next frame.
Of course, describing the process so clinically doesn’t quite summarize the emotional response I have to multithreading. So for that side of things, here’s an alternative explanation: multithreading is the process of juggling 3 plates, 2 watermelons, a chainsaw and a rabid wolverine while walking a tightrope over a pit of giant acid-spitting ants while cyborg trapeze artists take pot-shots at you with AK-47’s and grenade launchers.
I’ve actually attempted multithreading in the past. The results were unpleasant, unpredictable, and impossible to debug, mainly because I didn’t know what I was doing but also partly because that’s just the way multithreading is.
To give you an idea, suppose the first thread retrieves something from the second before the second thread is finished with it. The program might fail (which is what we want, so we can work out why the first thread is retrieving stuff from the second), but it will usually try to keep going. The simulation will start acting strangely, using false values, but not actually telling you it’s doing so. If it fails, it will fail somewhere else in the code that has nothing to do with the multithreading, and you will then spend hours investigating in the wrong place.
The usual approach to multithreading is to implement a lock, so that one thread waits for the other to finish before using any of the same resources. But the draw call needs to use a lot of the same resources that the update call uses, so it would likely spend much of it’s time waiting, and wouldn’t be much of an improvement over single threading.
(There are, obviously, ways around this, and when I do implement multithreading properly I will have to study them in detail)
In addition, for some reason, Multithreading plays havoc with Visual Studio’s debug mode: a multithreaded game seems to run much slower than an equivalent single threaded game in debug mode: so slow that, during my ill-fated attempt, I had run the game outside of debug mode on in order to test it. Less than ideal, and it makes tracing bugs and errors much, much harder.
For the alpha release, and probably for the second alpha release too, we will have to settle for less-than-optimal performance. When I do implement Multithreading I will have to set aside a lot of time to rework a large amount of the code, and to deal with the inevitable problems. With luck, though, I will be able to significantly improve frame rates, which will allow for bigger worlds and more creatures.
“You know, there’s probably potential for a misogynistic nerd-joke about multithreading in there somewhere.”