How to Make a Wordcloud in Python Using a Text File


A wordcloud can give a quick picture of what a speaker or writer cares about most, because it highlights the most commonly used words in a text. In this tutorial, I will use Python to create a wordcloud from a text corpus of Angela Merkel’s speeches from the past four years.

A wordcloud lets you evaluate a text by pointing out its keywords, which in turn gives a good idea of what the text is about. The more frequently a word occurs, the larger it appears in the cloud.
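To make the link between frequency and size concrete, here is a minimal sketch that counts word occurrences with the standard library's collections.Counter. The sample sentence and the function name word_frequencies are made up for illustration; the wordcloud library does its own counting internally.

```python
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs, ignoring case."""
    return Counter(text.lower().split())


# Made-up sample sentence, not taken from the speech corpus
sample = "Europa braucht Mut und Europa braucht Einigkeit"
freqs = word_frequencies(sample)

# The words with the highest counts are the ones a wordcloud draws largest
print(freqs.most_common(2))
```

In this sample, "europa" and "braucht" each occur twice, so they would be drawn larger than the words that occur once.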

In this guide, I am using Python to create an image file of the wordcloud. The development is done in PyCharm. You can also find the code linked to this post.

Following are the libraries that we are going to use:


import matplotlib.pyplot as plt
from wordcloud import WordCloud 
import stop_words as sw
import numpy as np
from PIL import Image

If you don’t have these libraries installed, you can install them with pip, for example:

pip install matplotlib

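Install the other libraries the same way. Note that on PyPI the stop_words module is published as stop-words, and PIL is provided by the Pillow package.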

  • matplotlib: a popular plotting library, used here to set up the figure for the wordcloud.
  • wordcloud: as the name states, we use it to create the wordcloud.
  • stop_words: the wordcloud library has its own stop word list, but I found this library works a lot better.
  • NumPy: converts the mask image into an array, so that the text can be drawn in the shape of the image.
  • Image from PIL: loads the image that we use for a fancier wordcloud, one that follows the shape of a picture and looks better.

Now let’s move on to the guide on how to make wordcloud on Python using text:

First, let’s talk about the primary function, where we process the text. I have scraped the texts of Angela Merkel’s speeches from the last four years and stored them in a text (.txt) file.

Stop words are very common words, such as articles, prepositions, pronouns and conjunctions, that carry little meaning on their own, so we remove them. The stop_words library provides stop word lists for many languages. In this guide, we will first remove the German stop words from our text file.

So we open the file, remove the stop words, and pass the filtered text as an argument to the wordcloud function.


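if __name__ == '__main__':
    # Load the German stop words into a set for fast lookups
    stop_words = set(sw.get_stop_words('german'))
    # Read the whole corpus (assumes the file is saved as UTF-8)
    with open('AngelaMerkel.txt', encoding='utf-8') as file:
        words = file.read().split()
    # Keep only the words that are not stop words, compared in lowercase
    filtered_words = [r for r in words if r.lower() not in stop_words]
    filteredtext = ' '.join(filtered_words)
    # Overwrite the output file so that repeated runs do not append duplicates
    with open('filteredtext.txt', 'w', encoding='utf-8') as txt_file:
        txt_file.write(filteredtext)
    wordcloud(filteredtext, stop_words)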
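
To look at the filtering step in isolation, here is a small sketch that also strips punctuation before comparing, so that a token like "und," still matches the stop word "und". The set SAMPLE_STOP_WORDS is a hand-picked sample for illustration, not the list returned by the stop_words library, and remove_stop_words is a hypothetical helper name.

```python
import string

# Hand-picked sample for illustration only; the real list comes from
# stop_words.get_stop_words('german')
SAMPLE_STOP_WORDS = {"der", "die", "das", "und", "ist", "für"}


def remove_stop_words(text, stop_words):
    """Return the words of text that are not stop words."""
    kept = []
    for word in text.split():
        # Strip surrounding punctuation so "offen," becomes "offen"
        cleaned = word.strip(string.punctuation)
        # Compare in lowercase because the stop words are lowercase
        if cleaned and cleaned.lower() not in stop_words:
            kept.append(cleaned)
    return kept


filtered = remove_stop_words(
    "Die Zukunft ist offen, und Europa ist stark.", SAMPLE_STOP_WORDS
)
print(filtered)
```

Whether to strip punctuation yourself is a design choice: WordCloud tokenizes the text again when it builds the cloud, so this mainly matters if you want a clean filtered file.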

Now that we have read the file and removed the stop words, it’s time to draw the wordcloud. We start by setting the size and dimensions of the cloud and the figure.

A mask is the silhouette of an icon or image that gives the text its shape. For example, in the picture below, the wordcloud takes the shape of the masked image.
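
If you don’t have a suitable image at hand, a mask can also be built directly as a NumPy array. The sketch below is an illustration under assumptions, not part of the original code: circle_mask is a hypothetical helper that stands in for loading angel.png. It relies on the documented behaviour that WordCloud treats pure white (255) pixels as masked out and draws words everywhere else.

```python
import numpy as np


def circle_mask(size):
    """Return a size x size array: 0 inside a centered circle, 255 outside."""
    y, x = np.ogrid[:size, :size]
    center = (size - 1) / 2
    radius = size / 2
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    # White (255) marks the area where no words may be drawn
    return np.where(outside, 255, 0).astype(np.uint8)


mask = circle_mask(200)
print(mask.shape, mask[0, 0], mask[100, 100])
```

The resulting array could be passed as mask=mask to WordCloud in place of the array loaded from an image file.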

 


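def wordcloud(text, stopwordss):
    # Load the mask image as an array; white areas of the mask stay empty
    my_mask = np.array(Image.open('angel.png'))
    # Build the cloud (when a mask is given, its shape overrides width and height)
    cloud = WordCloud(width=3000, height=2000, random_state=1, background_color='black',
                      colormap='rainbow', collocations=False,
                      stopwords=stopwordss, mask=my_mask).generate(text)
    # Set figure size and draw the cloud without axes
    plt.figure(figsize=(40, 30))
    plt.imshow(cloud, interpolation='bilinear')
    plt.axis("off")
    # Save the image file, then display the figure
    cloud.to_file("wordcloud1.png")
    plt.show()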

You can change the look of the wordcloud through the arguments of WordCloud(), such as the background color and the colormap. You can find the reference for the available colormaps on the matplotlib website. Alongside this, we also pass the stop words to WordCloud through its stopwords argument to improve the results further. Once the parameters are set, we call generate() and pass it the contents of the text corpus.
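
One way to experiment with these arguments is to keep the defaults from this guide in a dictionary and override only what you want to change. This is a sketch under assumptions: DEFAULT_OPTIONS, cloud_options and make_cloud are hypothetical names, and "white" with "viridis" is just one example combination.

```python
# Default settings mirroring the arguments used in this guide
DEFAULT_OPTIONS = {
    "random_state": 1,
    "background_color": "black",
    "colormap": "rainbow",
    "collocations": False,
}


def cloud_options(**overrides):
    """Return a copy of the default settings with any overrides applied."""
    options = dict(DEFAULT_OPTIONS)
    options.update(overrides)
    return options


def make_cloud(text, **overrides):
    """Build a WordCloud from text using the merged settings."""
    # Imported here so the helpers above can be used without wordcloud installed
    from wordcloud import WordCloud
    return WordCloud(**cloud_options(**overrides)).generate(text)


# Example: a light variant with a different matplotlib colormap
light = cloud_options(background_color="white", colormap="viridis")
print(light)
```

Calling make_cloud(filteredtext, background_color="white", colormap="viridis").to_file("wordcloud_light.png") would then write a light variant of the cloud; the output file name here is only an example.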

Make Wordcloud on Python Text file

If you want to make a wordcloud yourself, you can fork the source code from here on GitHub.

That is all from my side. Making a wordcloud is pretty straightforward; the complicated part is scraping the data. You can find the source code on my GitHub repository as well. If you have questions about the code above or about Python, feel free to reach out to us. And if you have made a wordcloud using this guide and would like to share it with us, we would love to see your work.
