An RNN That Generates Text

Timucin Erbas
9 min read · Apr 1, 2021

There are so many things that you can do with an RNN. The project I have been working on for the past few weeks is just one of the dope things you can create.

I made a text generation RNN.

Activations in a Neural Network

Before we get into RNNs, there is some specific vocabulary that you will have to know in order to understand what I am talking about for the rest of this article. One of these words is “activation”.

Basically, when we say the “activation” of a neural network, we mean the output. Every neural network has its output nodes, and the results that these nodes return are called the neural network’s activations.

Feeding

Feeding means inputting something into a neural network. This can be a single number or a series of numbers. When programming AI, we usually represent a series of numbers as an “array”, or a list.

Return

Return is the exact opposite of feed, kind of similar to activation. To return means to spit out, or give a value after processing the data through a neural network.

To connect all of these terms together, you can think of it this way: A neural network returns its activation after processing what it had been fed.

Recurrent Neural Networks

That’s all of the vocabulary you need to know. Now on to the real stuff.

Take a neural network. Now imagine you feed it some text, say "abc", and it returns a number, x, based on that text. Pretty easy, huh?

Now imagine this: instead of just returning that number, you also feed x into another neural network, along with some other text, "def". That other neural network returns a different number, y.

You see what’s happening here? The second output (y) is being influenced by 2 things:

  • x, the first output
  • “def”, the second piece of text that we fed into this multi-network structure.

Let’s repeat this one more time: we feed a third piece of text, "ghi", along with y, into the network, and it returns a third number, z.

It might look complicated, but it isn't. We are just repeating the same process over and over again. Now let’s look at what’s being influenced by what.

  • x: x is being influenced by nothing other than “abc”. The only thing that the network takes in is “abc”, making that the only determining factor on what the value x will be.
  • y: y is being influenced by 2 things. First thing, “def”. Just like x, we fed “def” into the neural network which makes it a factor on what y will be. However, we also input x into this neural network which derives y. All inputs affect the output, making x another determining factor for y. Now let’s remember… What determined x? The string “abc”. Exactly. y was determined by “def” and x, while x was derived using “abc”. Using transitivity (chain rule, as some like to call it) y was influenced indirectly by “abc” and directly by “def”.
  • z: You get the point. z was influenced by “ghi” and y. y was influenced by “def” and x, x was influenced by “abc”. So z was influenced by “abc”, “def” and “ghi”. Pretty cool right?
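To make this chain of influence concrete, here is a tiny sketch in Python. It is purely illustrative: rnn_cell is a made-up stand-in for a real neural network, not the code from my project.

```python
# A toy picture of the recurrence: the same cell runs at every step, and each
# output depends on the current piece of text plus the previous output.

def rnn_cell(chunk, previous_activation):
    # Stand-in for a real neural network: mix the text chunk (as numbers)
    # with the activation handed over from the previous step.
    return sum(ord(c) for c in chunk) * 0.001 + 0.5 * previous_activation

a = 0.0                  # the very first activation is just a placeholder (more on this later)
x = rnn_cell("abc", a)   # x is influenced only by "abc"
y = rnn_cell("def", x)   # y is influenced by "def" and, through x, by "abc"
z = rnn_cell("ghi", y)   # z is influenced by "ghi", "def" and, very indirectly, "abc"
print(x, y, z)
```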

You can now see why RNNs work so well with sequenced data. Taking the example above: words are influenced by the words that come before them. With this structure of a neural network, we pay attention to that fact, making our outputs more accurate.

Another thing to admire about this structure is that words further back in the sequence have less of a say in what the output is going to be. When you think about it, this is relatable. If I told you to predict the next word in a sentence on page 235 of a book, you would pay almost no attention to words from hundreds of pages back, as they are so far away that they barely influence what the next word will be. However, you would pay a whole lot of attention to the most recent few words, as they are the biggest factor in what the next word will be. This RNN structure simulates that really well: activations at the very beginning of the RNN (x, for example) only very indirectly influence activations later in the RNN (like z).

Restricting Factors

If you understood all of what I just said, then you are set. You get the big picture.

I intentionally left out a few facts in the last part just to show you a clean picture of what’s going on, ignoring the small things. Now it’s time to dive deeper into the things that we missed.

Same Network

Remember how I said “multi-network structure”? That isn’t quite true. All of the neural networks are the same network. It’s just easier to understand if you picture it unrolled first. If you have looked into RNNs before this article, I guarantee you have seen something like this:

It’s the same thing as the image that I have shown you many times, just… more compact.

All of the neural networks (red) are the same neural network.

Decoy Activation

You might think that the first neural network, the one that outputs x, has no incoming activation.

It actually does… I just didn’t show it to keep things simple. The very first activation is called a “decoy activation”, or an “initial activation”. Most commonly, this activation is a plain and simple zero, just there to hold the spot of an activation. After all, every single one of those neural networks in each “cycle” is the same, and you can’t change the dimensions of a neural network. What I’m getting at is that we can’t just remove an input node from the network, so we put a placeholder there instead.
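In the toy sketch from earlier, this placeholder is the a = 0.0 line. In a framework like Keras (just an assumption for illustration), the initial activation defaults to a vector of zeros, and you can also pass it in explicitly:

```python
import numpy as np
from tensorflow import keras

units = 128                              # hypothetical size of the activation
steps, vocab = 40, 60                    # hypothetical input shape

text_input = keras.Input(shape=(steps, vocab))
initial = keras.Input(shape=(units,))    # the "decoy"/initial activation

# Keras fills this state with zeros by default; passing it in explicitly
# just makes the placeholder visible.
output = keras.layers.SimpleRNN(units)(text_input, initial_state=[initial])
model = keras.Model([text_input, initial], output)

decoy_activation = np.zeros((1, units))  # a plain block of zeros holding the spot
```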

RNNs in Text

Now that you understand how RNNs work, it is time to talk about how we can use them to work through text.

What we do is split our sentence into individual letters. We turn the letters into numbers, for example using ASCII (“a” = 97, “b” = 98, “c” = 99, etc.). Once we have the list of numbers, we feed those numbers into the RNN. We run the RNN, and it spits out a few different numbers.

We turn these numbers back into alphabetical characters, but there is one catch: we ignore the first few output letters, since they were influenced by only a small number of the input letters, making those outputs inaccurate.
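Here is what that round trip looks like in plain Python. This is just the encode/decode idea; the model code later in the article uses its own character dictionaries instead of raw ASCII.

```python
sentence = "the quick brown fox"

# Encode: letters -> numbers the network can work with (here, ASCII codes)
encoded = [ord(c) for c in sentence]          # [116, 104, 101, 32, ...]

# (the RNN would run on `encoded` here and return another list of numbers)

# Decode: numbers -> letters again
decoded = "".join(chr(n) for n in encoded)
print(encoded[:5], decoded)
```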

And in this way, we are able to predict the next however many letters in a sentence with an RNN. This is exactly how the neural network I created works.

Explaining the Code

Here is how all of the code works for the neural network that I trained. I also have the code on GitHub if you would like to check it out. Feel free!

First, we want to import some libraries that we will use later in the code.
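Something along these lines (a sketch assuming a Keras/TensorFlow setup with NumPy; the exact imports in the repository may differ):

```python
import random

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
```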

Next, we want to import the dataset that we will use to train our neural network. This training set is hosted on AWS, and is a collection of writing from a German writer, Friedrich Nietzsche.

We read all of the text from the downloaded file “nietzsche.txt” into a variable called “text”.
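A sketch of that download-and-read step, using the Nietzsche corpus hosted on S3 by the well-known Keras text-generation example (the repository may load the file differently):

```python
# Download the corpus once and read it into a single string.
path = keras.utils.get_file(
    "nietzsche.txt",
    origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt",
)
with open(path, encoding="utf-8") as f:
    text = f.read().lower()

print("Corpus length:", len(text))
```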

After importing the data, we want to create dictionaries that turn numbers into text, and text into numbers. We create 2 dictionaries, since in Python dictionaries only map one way. It goes something like this:

  • char_indices = {"a": 0, "b": 1, … }
  • indices_char = {0: "a", 1: "b", … }

Now we can fluently turn text into numbers with char_indices, and numbers back into text with indices_char.
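A sketch of how those two dictionaries can be built (the exact indices depend on which characters appear in the corpus):

```python
chars = sorted(set(text))                           # every distinct character in the corpus
char_indices = {c: i for i, c in enumerate(chars)}  # character -> number
indices_char = {i: c for i, c in enumerate(chars)}  # number -> character
```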

Even though we have loaded all of our data into a variable, we still have to process the text into input/output pairs for training. In this for loop, we go through all of the text, selecting 40 letters at a time and shifting the 40-letter window by 3 characters each step. We put each window into the array “sentences”. We grab the letter immediately to the right of the 40-letter window and put it into the “next_chars” array, since it holds the outputs.

We do this many times, shifting the input window and its target letter 3 characters to the right each time, since we need many data points.
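A sketch of that windowing loop; the window length of 40 and step of 3 come straight from the description above, while the variable names maxlen and step are mine:

```python
maxlen = 40   # length of each input window
step = 3      # slide the window 3 characters at a time

sentences = []    # inputs: 40-letter windows
next_chars = []   # outputs: the letter immediately to the right of each window

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i : i + maxlen])
    next_chars.append(text[i + maxlen])

print("Number of training examples:", len(sentences))
```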

But we are still not quite done with creating the input. Remember, neural networks take numbers as input, not raw text. For this reason, we have to turn our training inputs and training outputs (sentences and next_chars) into arrays of numbers. To turn the text into numbers, we use the dictionaries that we previously created, char_indices and indices_char.
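One common way to do this, and the one that fits the categorical-crossentropy loss used below, is one-hot encoding, where each character becomes a vector with a single 1 in the slot given by char_indices. A sketch:

```python
# One boolean per (example, position in the window, possible character)
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1      # mark which character sits at position t
    y[i, char_indices[next_chars[i]]] = 1    # mark the correct "next character"
```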

Now that we finally have the training data ready, we can create the RNN itself. It might be confusing to visualize the diagram in code, but here is what it looks like:
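A sketch of such a model in Keras; the single LSTM layer with 128 units and the softmax output are assumptions based on the standard character-level setup, so the layer sizes in the repository may differ:

```python
model = keras.Sequential([
    keras.Input(shape=(maxlen, len(chars))),         # one 40-letter window, one-hot encoded
    layers.LSTM(128),                                # the recurrent part: one cell reused across all 40 steps
    layers.Dense(len(chars), activation="softmax"),  # a probability for every possible next character
])
```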

Right after, we compile the model with a loss function of “categorical crossentropy”, which tends to work well for RNNs, and we also add in an optimizer since… well… who doesn’t want their training to work faster? We set the learning rate to 1% (0.01), since we don’t want the neural network to change its mind drastically every time it sees a data point.
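A sketch of the compile step; RMSprop is my assumption for the optimizer (the text above only says “an optimizer”), while the 1% learning rate and the loss come from the description:

```python
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)  # the 1% learning rate
model.compile(loss="categorical_crossentropy", optimizer=optimizer)
```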

Just one more thing before we start training. We create a function called “sample” which grabs a random 40-letter segment from the text so that we can see what our neural network has learned after it trains.
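Based on that description, a hypothetical version of the helper might look like this (the actual function in the repository may be implemented differently):

```python
def sample(corpus, length=40):
    """Grab a random `length`-letter segment of the corpus to use as a seed."""
    start = random.randint(0, len(corpus) - length - 1)
    return corpus[start : start + length]
```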

We start the training process.
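A sketch of the training call; the batch size and number of epochs here are placeholders, not the values I actually used:

```python
model.fit(x, y, batch_size=128, epochs=20)
```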

After the neural network is good to go, we create a nested for loop to take its individual outputs from each “cycle” and stick them together, so that we can see what it comes up with for a random piece of text that we feed into the now-trained neural network.

We save this processed output into a variable called “generated”.
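A sketch of that generation loop. The seed comes from the sample helper above; picking the most likely character with argmax and generating 400 characters are both simplifications of whatever the repository actually does:

```python
seed = sample(text)   # random 40-letter starting window
window = seed
generated = ""

for _ in range(400):  # generate 400 characters, one at a time
    # Encode the current window exactly like the training data
    x_pred = np.zeros((1, maxlen, len(chars)), dtype=bool)
    for t, char in enumerate(window):
        x_pred[0, t, char_indices[char]] = 1

    preds = model.predict(x_pred, verbose=0)[0]      # probabilities for the next character
    next_char = indices_char[int(np.argmax(preds))]  # pick the most likely one

    generated += next_char
    window = window[1:] + next_char                  # slide the window forward by one letter
```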

We write generated out to a text file so that it can be displayed nicely.
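Something like this, where the file name is just a placeholder:

```python
with open("generated.txt", "w", encoding="utf-8") as f:
    f.write(seed + generated)
```

Now let’s check out the results!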

Results

After training my neural network, I was able to run it on a few examples of text. Here’s what it came up with:

[Sample output 1: lines 1 and 2 are the input, the rest is RNN-generated]
[Sample output 2: line 1 is the input, the rest is RNN-generated]
[Sample output 3: line 1 is the input, the rest is RNN-generated]

As you can see, it is far from perfect. However, it is much, much better than just random letters. I mean, when you look at it, you can see that the RNN understands the concept of putting a space in between words pretty darn well.

It also seems to be doing well with punctuation, placing commas, periods and quotation marks in places that look like they make sense.

Heck, even the word structures make sense. At first glance, 99% of it is gibberish, but you can see the somewhat alternating pattern of vowels and consonants. There are no words like “aeiouauia” or “gjkghypsd” that are purely vowels or purely consonants.

I see that you have come all the way here! Thanks so much for reading this article, and I hope that you have learned at least a few things from this piece of content that you chose to click on about 10 minutes ago.

This project took a lot of effort, and it would be absolutely awesome if you could give it some of them 👏’s!

If you enjoyed this piece, I recommend that you take a look at my other profiles on Twitter and LinkedIn.

To send you off, here is some wisdom from our AI friend:

Wow. Deep stuff.
