C# in Depth

Random numbers

When you see the word "random" in a question title on Stack Overflow you can almost guarantee it will be the same fundamental problem as countless similar questions. This article takes a look at why randomness causes so many problems, and how to address them.

The problem

The Stack Overflow (or newsgroup, or mailing list etc) question usually goes something like this:

I'm using Random.Next to generate random numbers, but it keeps giving me the same number. It changes from run to run, but each time it will generate the same number lots of times.

This is caused by code like this:

// Bad code! Do not use!
for (int i = 0; i < 100; i++)
{
    Console.WriteLine(GenerateDigit());
}
...
static int GenerateDigit()
{
    Random rng = new Random();
    // Assume there'd be more logic here really
    return rng.Next(10);
}

So, what's going wrong here?

The explanation

The Random class is not a true random number generator. It's a pseudo-random number generator. Any instance of Random has a certain amount of state, and when you call Next (or NextDouble or NextBytes) it will use that state to return you some data which appears to be random, mutating its internal state accordingly so that on the next call you will get another apparently-random number.

All of this is deterministic. If you start off an instance of Random with the same initial state (which can be provided via a seed) and make the same sequence of method calls on it, you'll get the same results.

So what was wrong in our example code? We were using a new instance of Random on each iteration of the loop. The parameterless constructor for Random takes the current date and time as the seed - and you can generally execute a fair amount of code before the internal timer works out that the current date and time has changed. Therefore we're using the same seed repeatedly - and getting the same results repeatedly.

What can we do about it?

There are lots of solutions to this problem - some of which are better than others. Let's get one of them out of the way first, as it's different to all the rest.

Use a cryptographic random number generator

.NET has a RandomNumberGenerator class which is the abstract class from which all cryptographic random number generators should be derived. The framework itself ships with one such derived class: RNGCryptoServiceProvider. The idea of a cryptographic random number generator is that even if it may still be a pseudo-random generator, it works very hard to be unpredictable. The built-in implementation takes several sources of entropy which effectively represent "noise" in your computer, and would be very hard to predict. It may use this noise not just to compute a seed, but also when generating the next numbers, so that even if you know the current state, that may not be enough to predict the next results (or the ones already generated). This depends on the exact implementation though. Windows can also make use of specialized hardware sources of randomness (such as a piece of hardware which watches a radioactive isotope for decay) to make the random number generator even more secure.

Compare this with Random. If you see (say) 10 results of calling Random.Next(100) and have a fair amount of computing resource to put into the task, you may well be able to work out the initial seed and predict what the next result would be... as well as what earlier results were, potentially. That's a disastrous state of affairs if the random numbers are being used for security or financial purposes. Cryptographic random number generators are generally slower than Random, but do a much better job of giving numbers which are hard to predict and independent.

In many cases performance of the random number generator isn't an issue - but having a decent API is. RandomNumberGenerator is basically designed to generate random bytes - and that's all. Compare this with the API of Random, which lets you ask for a random integer, or a random double, or a random set of bytes. I usually find that I need an integer in a range - and getting that reliably and uniformly from a random array of bytes is non-trivial. It's far from impossible, but at the very least you'd probably want an adapter class on top of RandomNumberGenerator. In many cases, the pseudo-randomness of Random is acceptable - if you can avoid the pitfalls described earlier. Let's look at how we can do that.

Use one instance of Random repeatedly

The core of the fix for the "lots of repeated numbers" is to use the same instance of Random repeatedly. This sounds pretty simple... for example, we can change our original code like this:

// Somewhat better code...
Random rng = new Random();
for (int i = 0; i < 100; i++)
{
    Console.WriteLine(GenerateDigit(rng));
}
...
static int GenerateDigit(Random rng)
{
    // Assume there'd be more logic here really
    return rng.Next(10);
}

Now our loop will print different digits... but we're not done yet. What happens if you call this code multiple times in quick succession? We might still create two instances of Random with the same seed... although each string of numbers would contain different digits, we could easily get the same string of digits twice.

There are two ways of avoiding this problem. One option is to use a static field to maintain a single instance of Random which everything uses. Alternatively, we could push the instantiation higher, of course - eventually reaching the top of the program, which only ever instantiates a single element of Random, and passes it down everywhere. That's a nice idea (and it expresses the dependency nicely), but it won't quite work... at least, it could cause problems if your code uses multiple threads.

Thread safety

Random isn't thread-safe. It's a real pain, given how we'd ideally like to use a single instance everywhere in our program. But no, if you use the same instance at the same time from multiple threads, it's quite possible to end up with a state of all zeroes internally, at which point the instance becomes useless.

The best approach in my view is to still use one instance, but also use a lock which every caller has to remember to acquire while they're using the random number generator. That can be simplified by using a wrapper which does the locking for you, but in a heavily multithreaded system you'll still potentially waste a lot of time waiting for locks.

I used to recommend an approach where each thread had its own random number generator, using thread-local variables. Unfortunately, with System.Random at least, this leads to "patterns" in the generated numbers - and you really don't want to see patterns. The simplest approach was to keep a master seed (generated based on time once) and then increment that seed each time a new random number generator was needed. A slightly better approach used a single master random number generator to generate seeds - but neither of these ends up being terribly effective. I suspect that with a better random number generator, the latter approach would be okay, but if you're using System.Random it's better to just take the hit of locking. See this Stack Overflow question for a demonstration of the problems involved.

Problems with interface design

One problem remains: this still isn't very secure. As I mentioned earlier, there's a more secure version in RandomNumberGenerator, with the most commonly used derived class being RNGCryptoServiceProvider. However, the API to that really is hard to use in common cases.

It would be very pleasant indeed if the framework providers had separated the "source of randomness" concept from the "I want to get a random value in a simple way" concept. Then we could have used a simple API backed by either a secure or insecure source of randomness, depending on our requirements. Unfortunately, 'twas not to be. Maybe in a future iteration... or maybe a third party will come up with an adapter instead. (It's beyond me, unfortunately... doing this kind of thing properly is remarkably difficult.) You can nearly get away with deriving from Random and overriding Sample and NextBytes... but it's not clear exactly how they need to work, and even Sample can be tricky. Maybe next time...