Twitter Network Analysis and Visualisation with Netlytic and Gephi 0.9.1

Data visualisation has always been important in the social sciences. Visualising data has several advantages, not least that it makes complex issues easier to understand and more appealing to look at. This post is a tutorial for researchers and enthusiasts around the world who are interested in making sense of Twitter data and visualising social networks in general.

For this step-by-step tutorial, we will use Netlytic and Gephi.

Netlytic is a free, web-based platform for collecting data from the web and running basic analyses on it. It works with several popular social network platforms such as Twitter, Facebook, YouTube and Instagram, and it can also analyse data from text files and spreadsheets. Data obtained via Netlytic can be used in most other network analysis platforms.

Gephi is an open-source application for Windows, Linux and Mac OS used for the statistical analysis and visualisation of networks. It also has many tools to import and export data in various formats usable in other platforms.

Step 1: Register for Netlytic, Download and Install Gephi

The descriptions above include links to register for Netlytic and download Gephi.

Registration for Netlytic is free and simple. Netlytic has three ‘tiers’.

Right after you register, you have a Tier 1 account. You can apply for a Tier 2 account for free on your account settings page. A Tier 2 account lets you retrieve up to 10,000 records per data set and 50,000 records in total.

Data collection limits differ from platform to platform, and Netlytic explains these limitations clearly.


Once you have registered and obtained a Tier 2 account, you are good to go. Having a Tier 2 account is important. Most trending topics on Twitter start with 50,000 tweets, and top trending topics usually have a few hundred thousand tweets. A data set of 10,000–50,000 tweets on a hot topic is statistically ‘representative enough’ (at least for network analysis). You might be able to achieve most of what you are planning with 2,500 tweets, but 10,000 is better (that is, more representative) than 2,500, isn’t it?

Downloading Gephi is simple for Windows and Mac OS users. Linux users have to use the terminal and need a basic understanding of installing applications distributed as tar.gz archives. Note that you must have Java installed on your system. At the time of writing, the latest version of Gephi is 0.9.1. Gephi has been around for years, but it is still in beta. While it works like a charm most of the time, it still has issues on some (mostly older) systems and builds. So don’t forget to save your projects regularly, so that you can recover them if a problem occurs.

Important: Once Gephi is installed, go to ‘tools > plugins’ and download all the available plugins. Check for updates and update your application if necessary.

Step 2: Configure Gephi Memory Settings

On Windows, you will find the gephi.conf file in the ‘etc’ folder of the Gephi-0.9.1 directory under Program Files. You can edit this file with Notepad or any other text editor. Here is what you should see:

Look at the parts Xms256m and Xmx1024m in the text file. They mean that Gephi starts with 256 megabytes of RAM (Xms, the initial Java heap size) and is allowed to use up to 1024 megabytes (Xmx, the maximum heap size). If you set the memory too low, Gephi will crash and you will lose your work. If you set it higher than the available system memory, Java will crash and you will lose your work. A popular convention is to set the initial memory to 256m and the maximum memory to half of the available system memory (for example, 2 GB for Gephi if you have 4 GB of RAM).
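The Python sketch below makes the rule of thumb concrete. It halves the installed RAM and rewrites the Xms/Xmx values in the text of gephi.conf. It assumes the values appear as `Xms<number>m` and `Xmx<number>m` somewhere in the file. The sample line, including `default_options` and the `-J-` prefixes, is a hypothetical excerpt, so compare it with your own file before editing. The helper names are illustrative, and you should back up the original file before writing any changes.

```python
import re

def suggest_max_heap_mb(total_ram_mb: int) -> int:
    """Rule of thumb from the text: give Gephi half of the system RAM."""
    return total_ram_mb // 2

def set_heap_options(conf_text: str, xms_mb: int, xmx_mb: int) -> str:
    """Replace the Xms/Xmx values (in megabytes) inside gephi.conf text."""
    # Matches e.g. 'Xms256m' or 'Xmx2g'; any '-J-' prefix is left untouched.
    conf_text = re.sub(r"Xms\d+[mMgG]", f"Xms{xms_mb}m", conf_text)
    conf_text = re.sub(r"Xmx\d+[mMgG]", f"Xmx{xmx_mb}m", conf_text)
    return conf_text

# Hypothetical excerpt; your gephi.conf may contain more options on this line.
sample = 'default_options="--branding gephi -J-Xms256m -J-Xmx1024m"'
updated = set_heap_options(sample, 256, suggest_max_heap_mb(4096))
print(updated)  # default_options="--branding gephi -J-Xms256m -J-Xmx2048m"
```

If Gephi refuses to start afterwards, lower the `xmx_mb` value and try again.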

Sometimes, even if your system has enough memory to run Gephi, it just won’t start. For example, my system has 12 GB of RAM, but Gephi won’t start if I set the maximum memory to 2 GB. This is just a problem with my specific configuration. If Gephi doesn’t start after you double-click its icon, try changing the memory allocation.

Step 3: Collect Tweets

There are many ways to download tweets in different file formats. Netlytic is free and easier to use for the many people who don’t need more than 50,000 tweets for their research. To collect tweets that match your specific requirements, we will use Twitter’s advanced search page.

This page lets you combine several criteria to narrow down your search. For this tutorial, we will search for the word ‘refugee’ and collect tweets posted between 1 and 13 April.

Here, you can see all the options and details you have to create your custom search.

When you click on the ‘search’ button, you will see your search query in a standardised format.
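The query Twitter produces is plain text built from search operators. The rough Python sketch below assembles such a string from keywords and a date range, using the `since:` and `until:` operators plus an optional `lang:`. The function name and the year 2016 are assumptions for illustration. Always copy the exact string Twitter shows you, and check in its preview whether the `until:` date itself is included.

```python
from datetime import date

def build_search_query(keywords, since, until, lang=None):
    """Assemble a Twitter advanced-search style query string."""
    parts = list(keywords)
    parts.append(f"since:{since.isoformat()}")  # start date, YYYY-MM-DD
    parts.append(f"until:{until.isoformat()}")  # end date, YYYY-MM-DD
    if lang:
        parts.append(f"lang:{lang}")  # optional language restriction
    return " ".join(parts)

query = build_search_query(["refugee"], date(2016, 4, 1), date(2016, 4, 13))
print(query)  # refugee since:2016-04-01 until:2016-04-13
```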


Now that you have your string to collect your tweets, go to Netlytic to create a data set.

Click on ‘New Dataset’

Copy and paste your search query into the Twitter section on Netlytic.

If there are enough tweets, Netlytic will initially collect 1,000 records. If you selected the option to collect tweets over an extended period (up to 31 days), it will keep collecting tweets until you reach the maximum number of records allowed or the collection period ends. Eventually, Netlytic will create a CSV file that you can use for other purposes. These files contain information such as the date and time of posting, the text of the tweets, and even locations if the tweets are geotagged.

Step 4: Run an Initial Analysis on Netlytic

The following is just an example of what you can do solely on Netlytic.
Click on ‘1000 remaining posts’ to see information about all the words used in the 1,000 tweets containing the word ‘refugee’.
With this particular search, Netlytic found 10,943 unique words. If you plan to do a completely different kind of research on the content, you can export these words (and their counts) in CSV format. Alternatively, you can click on ‘words cloud’ to see the most popular words.
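You may want to reproduce a similar word count on tweet texts yourself, for example from the exported CSV. A minimal Python sketch could look like this. The tokenisation rule is an assumption and will not match Netlytic’s own counts exactly: it lower-cases the text and keeps hashtags and @mentions as words. The output mirrors a two-column word/count CSV.

```python
import csv
import io
import re
from collections import Counter

# Words, hashtags (#refugees) and mentions (@unhcr) count as tokens.
TOKEN = re.compile(r"[#@]?\w+")

def count_words(texts, stopwords=frozenset()):
    """Count lower-cased tokens across all tweet texts."""
    counts = Counter()
    for text in texts:
        for token in TOKEN.findall(text.lower()):
            if token not in stopwords:
                counts[token] += 1
    return counts

def counts_to_csv(counts):
    """Return the counts as CSV text, most frequent words first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["word", "count"])
    for word, n in counts.most_common():
        writer.writerow([word, n])
    return buffer.getvalue()

tweets = ["Refugee crisis #refugees", "refugee camps", "@unhcr refugee"]
counts = count_words(tweets)
print(len(counts), counts.most_common(1))  # 5 [('refugee', 3)]
```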

Let’s move on. If you go to the ‘network analysis’ tab on Netlytic and run the analysis, you end up with a graph! Graphs are cool. They contain information on how units in a network are connected.

Netlytic shows general information about how many people are involved in these networks and how many ties they have with one another.
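A network like this is usually built from ‘who mentions whom’: each tweet adds a directed tie from its author to every account it mentions. The Python sketch below builds such weighted ties from (author, text) pairs. It then computes the kind of summary figures Netlytic reports, using the standard definitions of density and reciprocity. The mention pattern is simplified. Whether Netlytic handles retweets and self-mentions this way is an assumption, so treat this as an illustration rather than Netlytic’s exact method.

```python
import re
from collections import Counter

# Simplified pattern: '@' followed by word characters (letters, digits, underscore).
MENTION = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """Build weighted 'who mentions whom' ties from (author, text) pairs."""
    edges = Counter()
    for author, text in tweets:
        source = author.lower()
        for name in MENTION.findall(text):
            target = name.lower()
            if target != source:  # ignore self-mentions
                edges[(source, target)] += 1
    return edges

def network_summary(edges):
    """Node count, tie count, density and reciprocity of a directed network."""
    nodes = {n for pair in edges for n in pair}
    n = len(nodes)
    ties = len(edges)
    density = ties / (n * (n - 1)) if n > 1 else 0.0  # share of possible ties
    mutual = sum(1 for (s, t) in edges if (t, s) in edges)
    reciprocity = mutual / ties if ties else 0.0  # share of ties returned
    return {"nodes": n, "ties": ties, "density": density, "reciprocity": reciprocity}

tweets = [("alice", "@bob @carol refugee"), ("bob", "RT @alice hi"), ("carol", "@bob @bob")]
edges = mention_edges(tweets)
summary = network_summary(edges)
print(summary)
```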
If you click on ‘visualize’, you will see a basic representation of your network, with several background and colour choices, five automatically created modularity classes and three different layouts. One of the biggest advantages of Netlytic is that it lets you see how smaller networks within a bigger network are connected. This is an example of how a large Twitter network looks on Netlytic: a combination of smaller networks formed around leading users.

This initial network analysis can help you notice the ‘elephants in the room’ and obtain basic information about the network.

Step 5: Export Your Graph

Now that we have collected our tweets, we will export our graph for use in Gephi. Netlytic supports several file formats for export. Click on ‘export’ on the following screen and choose GraphML. This file format works in Netlytic, Gephi, NodeXL and many other platforms. Netlytic will e-mail you a link to the file when it is ready.
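GraphML is an XML format, which is why so many tools can read it. The Python sketch below shows what such a file contains. It writes a minimal directed GraphML document from a weighted edge list using the standard library’s ElementTree. The `weight` attribute and the function name are illustrative assumptions. Netlytic’s own export may include additional node and edge attributes.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", NS)  # write GraphML elements without a prefix

def to_graphml(edges):
    """edges: {(source, target): weight} -> GraphML text for a directed graph."""
    root = ET.Element(f"{{{NS}}}graphml")
    # Declare the edge attribute before the graph element.
    ET.SubElement(root, f"{{{NS}}}key", {
        "id": "weight", "for": "edge", "attr.name": "weight", "attr.type": "double",
    })
    graph = ET.SubElement(root, f"{{{NS}}}graph", {"id": "G", "edgedefault": "directed"})
    for node in sorted({n for pair in edges for n in pair}):
        ET.SubElement(graph, f"{{{NS}}}node", {"id": node})
    for (source, target), weight in sorted(edges.items()):
        edge = ET.SubElement(graph, f"{{{NS}}}edge", {"source": source, "target": target})
        ET.SubElement(edge, f"{{{NS}}}data", {"key": "weight"}).text = str(weight)
    return ET.tostring(root, encoding="unicode")

xml_text = to_graphml({("alice", "bob"): 1, ("carol", "bob"): 2})
print(xml_text)
```

Saved with a .graphml extension, a file like this should open in Gephi as a directed graph.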


Step 6: Open Your Graph File in Gephi and Run the Essential Statistical Tests

After we open our graph file, we will see the following initial screen. (Gephi should automatically detect the graph as a directed one; if it doesn’t for any reason, just select ‘directed’.)


At this stage, the graph is a mess. We will have a chance to make it look much better, but first we have to run the statistical tests in the ‘Statistics’ panel on the right side of the Gephi window.

Basically, run everything you see in this section.


After you have run all the necessary tests, the ‘Data Laboratory’ tab at the top will give you all the statistical information you might use, in Gephi or elsewhere, since these values can be exported.

Let’s go back to the ‘Overview’ tab, because our graph is still a huge mass.

Step 7: Make the Graph Look Better

This part depends on which statistic you choose to compare nodes by, your artistic preferences and what you want to show.

The importance or influence of units (nodes) in a graph is measured by their centrality. Here are some basic ways of understanding centrality (in very, very simple terms):

Degree

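The number of connections a node has. In a directed graph (like a Twitter mention network), there are two related measures: indegree (incoming connections, such as how often a user is mentioned) and outdegree (outgoing connections, such as how many users a user mentions).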

Closeness Centrality

Measures how close a node is to all the other nodes. It is based on the total length of the node’s shortest paths to the other nodes: the shorter the total, the higher the closeness.

Betweenness Centrality

Measures how often a node lies on the shortest paths between other pairs of nodes, that is, how much it acts as a bridge between them.

Eigenvector Centrality

A node’s importance depends on the importance of the nodes it is connected to, so connections to well-connected nodes count for more.
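The first three definitions can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Gephi’s implementation:

- Degrees are counted on the directed edge list.
- Closeness is the number of reachable nodes minus one, divided by the total shortest-path distance. It is computed on the undirected version of the graph to keep the code short.
- Betweenness uses Brandes’ algorithm, unnormalised, also on the undirected graph.

Gephi can also compute directed variants and may normalise the values differently.

```python
from collections import deque

def degrees(edges):
    """In- and out-degree for a directed edge list [(source, target), ...]."""
    indeg, outdeg = {}, {}
    for s, t in edges:
        outdeg[s] = outdeg.get(s, 0) + 1
        indeg[t] = indeg.get(t, 0) + 1
        indeg.setdefault(s, 0)  # make sure every node appears in both dicts
        outdeg.setdefault(t, 0)
    return indeg, outdeg

def undirected_adjacency(edges):
    adj = {}
    for s, t in edges:
        adj.setdefault(s, set()).add(t)
        adj.setdefault(t, set()).add(s)
    return adj

def closeness(adj):
    """(reachable nodes - 1) / total shortest-path distance, via BFS."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness(adj):
    """Brandes' algorithm, unweighted and undirected, unnormalised."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # visit nodes in order of decreasing distance from s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each end.
    return {v: b / 2 for v, b in bc.items()}

edges = [("a", "b"), ("b", "c")]
indeg, outdeg = degrees(edges)
adj = undirected_adjacency(edges)
print(closeness(adj)["b"], betweenness(adj)["b"])  # 1.0 1.0
```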

For this example, I will use Eigenvector Centrality (EC) to measure the importance of users in the network.
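Eigenvector centrality can be approximated by power iteration. Start every node at the same score, repeatedly replace each score with the sum of its neighbours’ scores, and rescale. The Python sketch below works on an undirected graph. At every step it also adds each node’s own score (iterating with A + I rather than A). This is a common trick that stops the iteration from oscillating on bipartite graphs such as stars and chains. The parameter names and the tolerance are assumptions, and Gephi’s own implementation and its scaling of the results may differ.

```python
import math

def eigenvector_centrality(adj, max_iter=100, tol=1e-9):
    """Power iteration on an undirected graph {node: set_of_neighbours}.

    Scores are scaled to unit Euclidean length; rank nodes by them.
    """
    x = dict.fromkeys(adj, 1.0)
    for _ in range(max_iter):
        # (A + I) x: own score plus the sum of the neighbours' scores.
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(s * s for s in new.values())) or 1.0
        new = {v: s / norm for v, s in new.items()}
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x

chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = eigenvector_centrality(chain)
print(max(scores, key=scores.get))  # b
```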

Let’s shape up our graph, starting with the colours.

Here, I chose red for the nodes with the lowest EC, green for the nodes with the highest EC and yellow for the ones in the middle. I didn’t have to choose a middle colour, but a blend of bright red and bright green makes it harder to pick out the nodes in the middle. So…
This is what we end up with after clicking on ‘apply’. Greener = more important.
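A three-colour ranking like this amounts to interpolating between red, yellow and green according to where each node sits in the EC range. The Python sketch below does this with plain linear RGB interpolation and yellow at the midpoint. These choices are assumptions, and Gephi’s palette interpolation may produce slightly different shades.

```python
RED, YELLOW, GREEN = (255, 0, 0), (255, 255, 0), (0, 255, 0)

def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

def colour_for(value, vmin, vmax):
    """Red for vmin, yellow halfway, green for vmax, as an (r, g, b) tuple."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    if t <= 0.5:
        start, end, local = RED, YELLOW, t / 0.5
    else:
        start, end, local = YELLOW, GREEN, (t - 0.5) / 0.5
    return tuple(round(lerp(s, e, local)) for s, e in zip(start, end))

def to_hex(rgb):
    return "#{:02x}{:02x}{:02x}".format(*rgb)

print(to_hex(colour_for(0.0, 0.0, 1.0)), to_hex(colour_for(1.0, 0.0, 1.0)))  # #ff0000 #00ff00
```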

Let’s adjust the sizes.

Again, based on EC, I chose the interval for size.
This is how it looks for now. It is still messy.
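Size ranking works the same way: each node’s EC is mapped linearly onto a size interval. The sketch below assumes the 10–100 interval used in this example and linear interpolation; Gephi can also apply a non-linear spline to the ranking. The function names are illustrative.

```python
def size_for(value, vmin, vmax, min_size=10.0, max_size=100.0):
    """Linearly map a centrality value onto [min_size, max_size]."""
    if vmax == vmin:
        return min_size
    t = (value - vmin) / (vmax - vmin)
    return min_size + t * (max_size - min_size)

def sizes_for(scores, min_size=10.0, max_size=100.0):
    """Apply size_for to every node, using the observed min and max score."""
    lo, hi = min(scores.values()), max(scores.values())
    return {n: size_for(v, lo, hi, min_size, max_size) for n, v in scores.items()}

sizes = sizes_for({"a": 0.0, "b": 0.5, "c": 1.0})
print(sizes)  # {'a': 10.0, 'b': 55.0, 'c': 100.0}
```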

This is where layouts and plugins come in handy. Gephi offers various layouts for presenting your graph and a few plugins to make it look better. For this example, I will use one of the most popular layouts: Fruchterman-Reingold. It is a force-directed layout algorithm, and it will give us a clear view of the connections in our graph. Note that we can also change each algorithm’s parameters before running it.
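The minimal Python sketch below shows what ‘force-directed’ means in the Fruchterman-Reingold idea:

- Every pair of nodes repels with force k²/d.
- Connected nodes attract with force d²/k.
- A ‘temperature’ caps how far a node may move and shrinks at each iteration.

The frame size, iteration count and random seed are assumed values. Gephi’s version adds its own parameters (such as area, gravity and speed), so this only illustrates the principle.

```python
import math
import random

def fruchterman_reingold(nodes, edges, width=1000.0, height=1000.0,
                         iterations=50, seed=0):
    """Minimal Fruchterman-Reingold layout returning {node: (x, y)}."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / max(len(nodes), 1))  # ideal edge length
    start_temp = width / 10  # largest step a node may take at first
    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: f_r(d) = k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 0.01
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along edges: f_a(d) = d^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 0.01
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node, capped by the current temperature (linear cooling).
        temp = start_temp * (1 - step / iterations)
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy)
            if d > 0:
                pos[n][0] += dx / d * min(d, temp)
                pos[n][1] += dy / d * min(d, temp)
            # Keep nodes inside the drawing frame.
            pos[n][0] = min(width, max(0.0, pos[n][0]))
            pos[n][1] = min(height, max(0.0, pos[n][1]))
    return {n: (x, y) for n, (x, y) in pos.items()}

layout = fruchterman_reingold(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
print(layout)
```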

This is how to set the parameters and run the layout algorithm.
This is how the graph looks in the middle of the process. Remember: some layouts reach an equilibrium and stop on their own when they are fully formed. The F-R layout should be stopped when the graph forms the shape you want.
This is how the graph looks after I obtained a desirable shape and stopped the algorithm. It still looks cluttered with all those small nodes, though. I will show you how to filter out the information you don’t want to see on your graph.

I will eliminate all nodes with only one connection each and run the algorithm again. This will clear up my graph (while not affecting the statistics).

On the right side of the window, click on the ‘Filters’ tab, and under range, double click on ‘Degree’, or drag and drop it below ‘Queries’.
Now we have a refined graph, along with information on how much of the entire graph is shown on screen. You can adjust the range to suit your needs.
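The degree range filter keeps only the nodes whose degree falls within the chosen range, together with the edges between them. The Python sketch below assumes an undirected degree count and a minimum degree of 2, to mirror ‘eliminate all nodes with only one connection’. The function name is hypothetical. Like the filter, it does not recompute any statistics.

```python
def filter_by_degree(edges, min_degree=2, max_degree=None):
    """Keep nodes whose degree lies in [min_degree, max_degree] and the edges between them."""
    degree = {}
    for s, t in edges:
        degree[s] = degree.get(s, 0) + 1
        degree[t] = degree.get(t, 0) + 1
    keep = {
        n for n, d in degree.items()
        if d >= min_degree and (max_degree is None or d <= max_degree)
    }
    kept_edges = [(s, t) for s, t in edges if s in keep and t in keep]
    return keep, kept_edges

edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d")]
nodes, visible = filter_by_degree(edges)
share = len(nodes) / len({n for e in edges for n in e})
print(sorted(nodes), visible, f"{share:.0%} of nodes visible")  # ['c', 'hub'] [('hub', 'c')] 40% of nodes visible
```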

Let’s run the algorithm again.


Let’s add labels (usernames). Click on the text symbol (bold ‘T’ in black) in the bottom-left corner of the main window.

The refined graph with labels.

You might have noticed that some nodes and labels overlap and it makes them harder to view. Gephi has a solution for this.

Apply ‘Noverlap’, which adjusts the nodes to fix overlapping.
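Conceptually, overlap removal treats each node as a circle and pushes overlapping pairs apart until none overlap. The Python sketch below is a simplified illustration of that idea, not the Noverlap plugin’s actual algorithm. The radii, the margin and the round limit are assumptions.

```python
import math

def remove_overlaps(pos, radius, margin=1.0, max_rounds=100):
    """Push overlapping circles apart; pos {node: (x, y)}, radius {node: r}."""
    pos = {n: list(p) for n, p in pos.items()}
    nodes = list(pos)
    for _ in range(max_rounds):
        moved = False
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy)
                needed = radius[u] + radius[v] + margin
                if d < needed:
                    # Unit vector from u to v (arbitrary if they coincide).
                    ux, uy = (dx / d, dy / d) if d > 0 else (1.0, 0.0)
                    push = (needed - d) / 2  # each node moves half the gap
                    pos[u][0] -= ux * push
                    pos[u][1] -= uy * push
                    pos[v][0] += ux * push
                    pos[v][1] += uy * push
                    moved = True
        if not moved:  # no overlaps left
            break
    return {n: tuple(p) for n, p in pos.items()}

layout = remove_overlaps({"a": (0, 0), "b": (1, 0), "c": (100, 100)},
                         radius={"a": 5, "b": 5, "c": 5})
print(layout)
```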
Apply ‘Label Adjust’, which moves the nodes so that their labels no longer overlap and are readable.
This is basically ‘how we roll’. However, this isn’t what we want our graph to look like in presentations.

Step 8: Final Touches and Exporting the Graph

By this last step of our network visualisation, we should have been saving our project regularly to avoid losing data. As I mentioned earlier, Gephi can still be unpredictable and unstable, and you might experience crashes, freezing, hanging, etc. Fortunately, I didn’t experience any of these while preparing this tutorial, but you might. Save your work!

To create our final product, let’s click on the ‘Preview’ tab at the top of the window.

After clicking ‘refresh’ in this window, our graph should appear as it will look in its final form. In this example, the node sizes (10–100) were too big, which made the labels appear too big as well. This isn’t a problem. The left pane has several options for editing the final appearance of your graph. I will start with the font size.
After making the labels smaller and adjusting the edges and curves, this is how our almost-final product looks.

The rest, truly, is cosmetics. You might notice some overlapping labels. Sometimes, adjusting the sizes fixes the issue. Sometimes, you have to go back and manually adjust the positions of your nodes to get the best result. I won’t do any of these here.

Once you’ve decided on the final appearance of your graph, you are ready to export it. These images can be exported as PNG, SVG or PDF files. The entire project can be exported as graph files and text files.

Let’s click on file > export > SVG/PDF/PNG and export our image.

You will notice the options button. This will allow us to edit the resolution and other properties of our final exported image.
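If you ever need a vector image outside Gephi, an SVG file is just text. The Python sketch below writes edges as lines, nodes as circles and node names as labels from node positions, sizes and colours. Three details are assumptions: the file layout, treating size as a diameter, and the grey edge colour. SVG, like PDF, is a vector format and scales without losing quality, so the resolution option matters mainly for PNG.

```python
from xml.sax.saxutils import escape

def to_svg(pos, edges, sizes, colours, width=1000, height=1000):
    """Write edges as lines, nodes as circles and node names as text labels."""
    out = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for s, t in edges:  # edges first, so nodes are drawn on top of them
        (x1, y1), (x2, y2) = pos[s], pos[t]
        out.append(f'<line x1="{x1:.1f}" y1="{y1:.1f}" x2="{x2:.1f}" y2="{y2:.1f}" stroke="#999999"/>')
    for n, (x, y) in pos.items():
        r, g, b = colours[n]
        out.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{sizes[n] / 2:.1f}" fill="#{r:02x}{g:02x}{b:02x}"/>')
        out.append(f'<text x="{x:.1f}" y="{y:.1f}" font-size="12">{escape(str(n))}</text>')
    out.append("</svg>")
    return "\n".join(out)

svg = to_svg(
    pos={"alice": (100.0, 100.0), "bob": (300.0, 200.0)},
    edges=[("alice", "bob")],
    sizes={"alice": 10, "bob": 100},
    colours={"alice": (255, 0, 0), "bob": (0, 255, 0)},
)
print(svg)
```

Writing the returned string to a file with a .svg extension gives an image that most browsers and vector editors can open.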


At this point, you are done! Don’t forget to save your project file before quitting the application, so that you can use all the statistical information later. You can also export your graph in other formats for use in other platforms.

Cheers!

If you have questions or anything to say, leave a comment!



Comments

2 responses to “Twitter Network Analysis and Visualisation with Netlytic and Gephi 0.9.1”

  1. I am curious 🙂 how do you interpret the concept of ‘betweenness’ or ‘closeness’ in the name network of netlytic? in ‘who mentions whom’, or ‘who replies to whom’.

    1. Yusuf Salman

      I don’t think you can interpret these things in Netlytic. It just provides basic information on things like density, reciprocity, centralisation and modularity. This is why I usually use Netlytic to collect data instead of analysing it. Once you collect the data and download it in a format that is suitable for programs like Gephi (which I use), it is easier to do further analyses and reach more detailed results.
