Weighted Random Algorithms - CodeProject

:

Introduction

The Random algorithm produces the randomly distributed values. Sometimes, we need the values randomly distributed across several keys with values produced in different quantities for different keys.

Imagine that we have to create a mock service. This mock will imitate the payload for our system, it will generate data and send it to the system.

The real data are provided by source systems in a random but unequal way. Some systems always produce more data than others do. We need to emulate not only the random behavior but also the system-related payload. Hence, the weighted random algorithm.

In our example, we need to choose the values as Sent, Failed, etc.

Using the Code

Let us start with the code:

static readonly var KeyValues = new Dictionary<char, string> 
        { 
            {'1', "Sent"}, 
            {'2', "Received"}, 
            {'3', "Failed"}, 
            {'4', "Rejected"}, 
            {'5', "Suspended"}, 
            {'6', "Filtered"}, 
            {'7', "Processing"}, 
            {'8', "Validated"}, 
            {'9', "Validating"}, 
            {'A', "Canceled"}, 
            {'B', "Completed"} 
        }; 
string ValueDistribution = 
     "122333444455555666666777777788888888999999999AAAAAAAAAABBBBBBBBBBB"; 
string Value 
  { 
      get 
      { 
          return KeyValues[ValueDistribution[Rand.Next(0, ValueDistribution.Length)]]; 
      } 
  } 

As you can see, the algorithm is simple. Each value is linked with a symbol, which works as the key in dictionary. Then we create a string, which defines the distribution of the values or the weight of the values. In this example, the weight of the Sent value is one, because there is only one symbol ‘1’ in the distribution string. The weight of the Canceled value is 10 because there are 10 symbols ‘A’ in the distribution string.

Then we randomly choose one symbol from the distribution string and use this symbol to choose value from dictionary.

The algorithm is pretty simple in understanding and simple in management. All we need to do is provide the keys for values and create a distribution string.

2. Fast

What is wrong with the first simple solution? One thing is it uses the Rand class, which is not fast. If we need performance, we might want dump Rand to perform maybe not so randomly but faster.

string ValueDistributionRandomized = 
     "8B956A78997BB48A627A59B35976A4B9BA76BA48A3285B6B9977863A14AA8B59B"; 
int _curIndex; 
string Value 
{ 
     get 
     { 
         _curIndex = _curIndex %ValueDistributionRandomized.Length; 
         return KeyValues[ValueDistributionRandomized[_curIndex++]]; 
     } 
} 

The KeyValues dictionary is the same. The ValueDistributionRandomized defines not only the weight but also distribution. We have to imitate random distribution by manually mingling the key symbols in the distribution string.

3. Auto

Anyways, let me stop with performance for now and pay more attention to the usability. It might be no big deal in our case but we have to do two unnecessary manual tasks: we have to fill in the dictionary with keys and the distribution string also with keys. Now I will automate these steps, so in case of large lists and complex distribution, we do not spend our precious time on this.

Auto(Dictionary<string, int> ValueWeights) 
{ 
    int i = 0; 
    var sb = new StringBuilder(); 
    foreach (var ValueWeight in ValueWeights) 
    { 
        var curKey = (char) (48 + i++); 
        KeyValues.Add(curKey, ValueWeight.Key); 
        sb.Append(new string(curKey, ValueWeight.Value)); 
    } 
    _ValueDistribution = Randomize(sb.ToString()); 
}

The constructor takes the dictionary, which compounds the values with correspondent weights. For example, {"Received", 300}. The weight unit can be any integer. Constructor fills in the weights in ValueWeights dictionary. Simultaneously, constructor fills in the distribution string with key symbols, the number of symbols equal to weight. In the end, the distribution string is randomized.

The Randomize method is public, so client could mix distribution string from time to time.

Tests

How fast is the Fast?

I have tested it with one million cycles:

Here is result with 10K cycles, repeated 5 times:

10,000 cycles:

Method   Ticks    Ticks/Cycle
==============================
Simple   55,755       5.58
  Fast   49,477       4.95
  Auto   46,925       4.69
Simple   48,498       4.85
  Fast   46,645       4.66
  Auto   45,706       4.57
Simple   49,905       4.99
  Fast   46,317       4.63
  Auto   46,148       4.61
Simple   49,980       5.00
  Fast   45,928       4.59
  Auto   50,125       5.01
Simple   49,035       4.90
  Fast   46,144       4.61
  Auto   45,773       4.58

Conclusion

There is no significant difference between methods, so in the most cases the Auto method is preferable, because it is fast enough and it is the simplest method in use.

You can see the last version on https://github.com/leo-gan/WeightedRandom.

The code can be download as a NuGet package from NuGet.org.

History

  • 2015-06-01, Added Garbage Colledtion for each cycle. Improved report formating.
  • 31st March, 2015: Initial version