Text Analysis with AntConc for social media data: Clusters and N-Grams

In the last post, we learned how to explore and analyze concordances in AntConc. Now we are going to show you how to use AntConc to study clusters and n-grams, and also how to count hashtag frequencies and locate hashtags in the corpus.

This post is part of a series of tutorials:

  1. Intro, Opening a File and Settings
  2. Word Lists and File Viewer 
  3. Concordancer and Concordance Plot 
  4. Clusters and N-Grams (we are here)
  5. Collocations (soon)
  6. Keyword Lists (soon)


Extracting N-Grams from the corpora

An n-gram is a contiguous sequence of items from a text. N-grams can be used to analyze phonemes, letters, syllables or words, for example. For our purposes, we will go through the steps required to generate word n-grams from a given text.
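Before moving to the AntConc steps, the definition can be made concrete with a minimal Python sketch. It is only an illustration, not how AntConc works internally: the function name `word_ngrams` is hypothetical, and the tokenizer simply lowercases and splits on whitespace, which is an assumption that ignores punctuation.

```python
def word_ngrams(text, n):
    """Return every contiguous run of n words in text, in reading order."""
    # Naive tokenizer (an assumption): lowercase, then split on whitespace.
    tokens = text.lower().split()
    # Slide a window of n tokens across the text, one position at a time.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sample = "plastic bags pollute the ocean"
print(word_ngrams(sample, 2))
# ['plastic bags', 'bags pollute', 'pollute the', 'the ocean']
```

A text shorter than n words simply yields an empty list.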

1. First, as usual, you need to open a file and generate a Word List. This time, we are going to use a corpus of 19 thousand tweets containing the term ‘plastic’. Download a corpus file with 19k tweets.

2. It’s important that you don’t forget to open the customized settings, because we’ll work with #hashtags and @usernames.

3. To use N-Grams, go to the Clusters/N-Grams tab and check the N-Grams option highlighted in the image:


4. Click Start, and here is the result:


5. You can change the minimum and maximum values of the N-Gram Size. For example, if you change them to 1 and 3, the result changes and now you can see sequences of one, two and three words (1-grams to 3-grams):


6. To analyze specific clusters of words, you can search for them using the Search Term box.


7. Mind options such as Search Term Position, which lets you search for the words that come before or after your term. Searching for clusters of size 2 with the term plastic and the position set to On Right, we get the following result:


8. Important: the Range column counts the number of corpus files in which the word or term appears. This number is especially interesting when you are comparing corpora.
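To make the ideas in the steps above easier to check by hand (n-gram size limits, search term position, and the difference between frequency and range), here is a rough Python sketch. It is not AntConc's implementation. The names `ngram_table` and `clusters` are hypothetical, the tokenizer is a plain whitespace split, and the sketch assumes that "On Right" means the search term is the last word of the cluster.

```python
from collections import Counter


def ngram_table(files, min_n, max_n):
    """Map each n-gram to (frequency, range) for a dict of {file name: text}."""
    freq = Counter()        # total occurrences across all files
    file_range = Counter()  # number of files containing the n-gram
    for text in files.values():
        tokens = text.lower().split()  # naive tokenizer (an assumption)
        seen = set()
        for n in range(min_n, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                freq[gram] += 1
                seen.add(gram)
        file_range.update(seen)  # each file counts once per n-gram
    return {gram: (freq[gram], file_range[gram]) for gram in freq}


def clusters(table, term, position="left"):
    """Keep n-grams whose first ("left") or last ("right") word is term."""
    index = 0 if position == "left" else -1
    return {gram: stats for gram, stats in table.items() if gram[index] == term}


corpus = {
    "a.txt": "ban plastic bags now",
    "b.txt": "no more plastic bags please ban plastic",
}
table = ngram_table(corpus, 2, 2)
print(clusters(table, "plastic", "right"))
# {('ban', 'plastic'): (2, 2), ('more', 'plastic'): (1, 1)}
```

Here "ban plastic" occurs twice and in both files, so its frequency and range are both 2, while "more plastic" appears once in a single file.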


Counting Hashtags

You can use the Clusters/N-Grams tab to count terms that have some structure, character or word in common, such as #hashtags and @usernames on some social media platforms.

1. Open your file(s). Don’t forget to configure the settings as explained in the tutorial on files and settings.

2. Generate a Word List.

3. To generate a list of hashtags, for example, go to the Clusters/N-Grams tab and use the Search Term ‘#*’, with the Cluster Size options set to a minimum and maximum of 1.


Troubleshooting: counting hashtags didn’t work? AntConc is probably still using the default settings, in which the ‘#’ symbol is a wildcard. Check the settings!
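If you want to cross-check the hashtag list from step 3 outside AntConc, a short Python sketch like the one below can help. It rests on a few assumptions: the tweets are already loaded as a list of strings, a hashtag is defined as ‘#’ followed by letters, digits or underscores, and tags are lowercased so that #Ocean and #ocean count together. The function name `count_hashtags` is hypothetical.

```python
import re
from collections import Counter

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#\w+")


def count_hashtags(tweets):
    """Count how often each hashtag appears in a list of tweet strings."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase so that '#Ocean' and '#ocean' are counted as one tag.
        counts.update(tag.lower() for tag in HASHTAG.findall(tweet))
    return counts


tweets = [
    "Say no to #plastic straws #Ocean",
    "#plastic is everywhere",
    "Thanks @user for the #ocean cleanup",
]
for tag, count in count_hashtags(tweets).most_common():
    print(tag, count)
```

Replacing ‘#’ with ‘@’ in the pattern would count @usernames in the same way. Small differences from AntConc's counts are to be expected, since its token definition depends on your settings.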