collocations | Tarcízio Silva

Collocation refers to how words regularly occur together in texts or corpora. Searching for the collocates of a specific term can point to other words and expressions that are important in the documents.

This post is part of a series of tutorials:

Exploring Collocation

1. As usual, open a file or set of files (don’t forget to configure the settings). In this tutorial, we are going to use the file plastic_19k_tweets_june_2018.txt available in our datasets folder.

2. Generate a Word List.

3. Go to the Collocates tab and search for a term such as ‘plastic’. The resulting list ranks the most relevant collocates.

As you can see, most of the collocates are related to “plastic surgery”, not the material plastic.

4. A frequent problem is the listing of words that appear only once or a few times in the file(s). To filter them out, increase the Minimum Collocate Frequency.

5. Co-occurrences are counted within a Window Span around the search term. You can increase or decrease the span to the left and to the right of the search term.

6. Since Twitter texts are very short, we recommend decreasing the span. The results might be more precise with a narrower window.

With these results, you can explore the collocates to find and understand meaningful words related to your keywords of interest.
