The Task
It’s pretty easy to get a random letter using TypeScript. Using lodash and returning upper case letters, it would be
import _ from "lodash"; const asciiCodeA: number = 97; const lettersInAlphabet: number = 26; const randomLetter: string = String.fromCharCode(_.random(lettersInAlphabet - 1) + asciiCodeA);
You don’t even need to use lodash – it’s a convenience.
The problem with this code is that it returns all the letters with the same frequency – so A is returned as often as X. That’s not how the English language works – X crops up way less than A. I want to be able to return random letters based how often they occur in the English language.
Letter Frequencies
To do this I’m going to use the letter frequencies from Wikipedia
https://en.wikipedia.org/wiki/Letter_frequency
The frequencies differ depending on whether you are sampling text or taking all the words in the dictionary – for instance there are quite a few words beginning with X in the dictionary, but people just don’t use words like xenopus or xylem in everyday speech, so the frequencies are different. I’m implementing both.
Also, there is a difference in frequency depending on whether you are looking at the frequency when the letter occurs anywhere or when it occurs at the beginning of a word. I’ve implemented the random generation based on the frequencies at the start of the word as that is my use case, but it would be easy to change that to use other frequencies.
The frequencies are expressed as a percentage and should all add up to 100. They don’t unfortunately due to rounding so I’ll also have to account for that.
Implementation
The main logic is in this class
class LetterFrequency { private _letters: Array<Letter>; private _randomNumberGenerator: IRandomNumberGenerator; constructor( public Type: FrequencyType, randomNumberGenerator: IRandomNumberGenerator | undefined = undefined ) { this._randomNumberGenerator = randomNumberGenerator || new RandomPercentageGenerator(); this._letters = [ new Letter("A", 11.7, 5.7), new Letter("B", 4.4, 6), new Letter("C", 5.2, 9.4), new Letter("D", 3.2, 6.1), new Letter("E", 2.8, 3.9), new Letter("F", 4, 4.1), new Letter("G", 1.6, 3.3), new Letter("H", 4.2, 3.7), new Letter("I", 7.3, 3.9), new Letter("J", 0.51, 1.1), new Letter("K", 0.86, 1), new Letter("L", 2.4, 3.1), new Letter("M", 3.8, 5.6), new Letter("N", 2.3, 2.2), new Letter("O", 7.6, 2.5), new Letter("P", 4.3, 7.7), new Letter("Q", 0.22, 0.49), new Letter("R", 2.8, 6), new Letter("S", 6.7, 11), new Letter("T", 16, 5), new Letter("U", 1.2, 2.9), new Letter("V", 0.82, 1.5), new Letter("W", 5.5, 2.7), new Letter("X", 0.045, 0.05), new Letter("Y", 0.76, 0.36), new Letter("Z", 0.045, 0.24), ]; let cumulativeFrequency = 100; this._letters.slice(0) .reverse() .map((e) => { e.CumulativeFrequency = cumulativeFrequency; const frequency = Type == FrequencyType.Dictionary ? e.DictionaryFrequency : e.TextFrequency; cumulativeFrequency = Math.round( (cumulativeFrequency - frequency + Number.EPSILON) * 1000 ) / 1000; }); } random = (): string => { const rnd: number = this._randomNumberGenerator.Random(); return this._letters.slice(0).reverse().reduce((acc: string, current: Letter) => { if (rnd <= current.CumulativeFrequency) { acc = current.Letter; } return acc; }, "A"); }; }
If we want to get the frequencies based on the occurrence in a dictionary, then it is called like this
const letterFrequency = new LetterFrequency(FrequencyType.Dictionary); const randomLetter: string = letterFrequency.random();
for the frequencies in a text the call is very similar
const letterFrequency = new LetterFrequency(FrequencyType.Text); const randomLetter: string = letterFrequency.random();
The basic operation of the code is
- Build an array of frequencies for dictionary and text analysis
- Calculate the cumulative frequencies
- Generate a random number with 3 decimal places between 0 and 100
- Search through the cumulative frequencies for the letter which occurs at that frequency and return it
There is a couple of things to note ….
Calculating cumulative frequencies
The cumulative frequencies are calculated by stepping backwards through the letters – starting with Z and going back to A i.e.
this._letters.slice(0) .reverse().map(e => { //cumulative frequency logic });
This is because the frequencies actually add up to more than 100 (due to rounding) so if I started from A and added them then the last frequency would be more than 100. If I do it backwards then the last frequency will always be 100, however the first frequency will be a bit smaller than it should be, but I can live with that.
Dependency injection
The constructor of the class has what looks to be an unnecessary randomNumberGenerator parameter
private _randomNumberGenerator: IRandomNumberGenerator; constructor( public Type: FrequencyType, randomNumberGenerator: IRandomNumberGenerator | undefined = undefined ) { this._randomNumberGenerator = randomNumberGenerator || new RandomPercentageGenerator(); //.. more code }
I’ve got the random number generator in the constructor as an optional parameter to implement a form of dependency injection. The random number generator can then be overridden in tests with a generator that isn’t random at all and generates a static number. This facilitates testing against reliable (non-random) output.
I’m aware there are other ways to achieve this for instance using jest.mock to override the imported module, but it’s what I was trying out and it works so it stays in. It can be easily taken out if you hate it.
Full Code
As ever the full code is on my GitHub site at
https://github.com/timbrownls20/Demo/blob/master/React/spellings/src/services/LetterFrequency.ts
This includes all the interfaces, enums and class definitions. I have included tests with this mini-project which is at
Use Case
This is part of a bigger project written with Typescript and React at
https://github.com/timbrownls20/Demo/tree/master/React/spellings
I’m using the DataMuse Api to bring back random words for use in a simple spelling game. The data muse API only returns batches of 1000 results at most so to get a decent list of random words I’m calling the API with a random first letter and then picking a random result from the 1000 words it returns (it’s more complicated than that but that is the essence of it).
If I get a random letter by using the frequency unaware code
const randomLetter: string = String.fromCharCode(_.random(lettersInAlphabet - 1) + asciiCodeA);
Then x, y or z come up in most lists – for a list of 10 words then x, y or z will appear about 70% of the time. Since not many words start with those letters I see xylophone, years or zoology all the time in my randomly generated list of words. This really isn’t how the English language works, looks weirds and is rubbish for a spelling game.
Useful Links
https://www.datamuse.com/api/. This is the DataMuse API. It’s really interesting and I can see all kinds of possibilities to use it to code up some fun word games. I reckon I could code Scrabble with it. It did crash the other day though so it’s not perfect.
https://jestjs.io/docs/mock-functions gives examples of mock with Jest. The Mocking modules section is another way to mock out the random number generator. It’s probably the better way
https://blog.testdouble.com/posts/2021-03-19-react-context-for-dependency-injection-not-state/ interesting article about dependency injection in React. Inspired me to try it out with my code.