Social Network Analysis of Political Communities


About

Project introduction

We investigate whether social network analysis techniques can be applied automatically to the Dutch parliamentary data and whether they provide additional insight beyond the raw data. This data is available in XML, PDF and HTML.

A separate data extraction project parsed the raw data files and created listings of names of persons and their relations.

In this project, we did the following:

The project was carried out by Arthur Suermondt under supervision of Maarten Marx, at the University of Amsterdam.

Use Case

We would like to evaluate the results of this project using a use case scenario based on the recent Dutch House of Representatives election (June 9, 2010) and cabinet formation process. Please use the results section and charts as well as the interactive visualizations.

Context

The results of the Dutch House of Representatives election are just in. Due to the fragmented results, the cabinet formation process has become very difficult. You are in the first phase of the formation process, in which the different coalition possibilities are being explored. A coalition needs more than 75 seats in the House of Representatives. The following coalitions would have enough seats (NOS.nl):

It does not matter if you cannot solve the problem at hand. This use case evaluates the added value of this project for answering complex political problems, as well as the usability of its interface.

Goal

Part 1

Based on the results provided on this page, advise the newly appointed 'informateur' which of the parties listed above are known to cooperate well. Also suggest some cabinet candidates (minister or state secretary) to make sure the new cabinet consists of a team of well-cooperating politicians.

Part 2

During the first and second phases of the formation process, negotiation is essential to form a successful coalition. Your goal is to find the politicians with the best connections to politicians of other parties. These politicians can play a key role when negotiating the different aspects of the coalition agreement. It is important to find links between each of the parties in a potential coalition. That way it is possible to successfully negotiate the topics important to each party.

Feedback

We very much appreciate your feedback as it helps to improve the results and the usability of this interactive page.
Please send your feedback to Arthur Suermondt and/or Maarten Marx.

Data description

Sources

The following data sources were used to establish the social networks.

Dataset     Years available  Documents  Source
Motions     1969-2010        32,950     PoliDocs XML
Questions   1995-2010        61,494     Preprocessed datasource
Amendments  1995-2009        3,306      Preprocessed datasource

Periods

The table below shows the periods used to filter the data.

Period                  Start date  End date
Kabinet Kok I           22-8-1994   3-8-1998
Kabinet Kok II          3-8-1998    22-7-2002
Kabinet Balkenende I    22-7-2002   27-5-2003
Kabinet Balkenende II   27-5-2003   7-7-2006
Kabinet Balkenende III  7-7-2006    22-2-2007
Kabinet Balkenende IV   22-2-2007   now
Source: rijksoverheid.nl
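
As an illustration of how the periods above could be used to filter documents by date, the sketch below maps a date to a cabinet period. The function name period_for is hypothetical, and treating each period as a half-open interval (start date included, end date excluded) is an assumption; the source does not say how documents dated on a boundary day were assigned.

```python
from datetime import date

# Cabinet periods copied from the table above. None marks the open-ended
# current period ("now"). Half-open intervals [start, end) are an assumption.
PERIODS = [
    ("Kabinet Kok I", date(1994, 8, 22), date(1998, 8, 3)),
    ("Kabinet Kok II", date(1998, 8, 3), date(2002, 7, 22)),
    ("Kabinet Balkenende I", date(2002, 7, 22), date(2003, 5, 27)),
    ("Kabinet Balkenende II", date(2003, 5, 27), date(2006, 7, 7)),
    ("Kabinet Balkenende III", date(2006, 7, 7), date(2007, 2, 22)),
    ("Kabinet Balkenende IV", date(2007, 2, 22), None),
]


def period_for(day):
    """Return the name of the period containing `day`, or None if none matches."""
    for name, start, end in PERIODS:
        if day >= start and (end is None or day < end):
            return name
    return None


print(period_for(date(2003, 1, 15)))  # prints "Kabinet Balkenende I"
```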

Relations

Social network relations were constructed based on the following connection criteria.
The shorthand for these relations, used in the results below, is given in parentheses.

Motions
  All Members of Parliament co-submitting a motion ('MPs - submitting')
  All parties with members co-submitting a motion ('Parties - submitting')
  All parties voting in favor of a motion ('Parties - voted pro')
  All parties voting against a motion ('Parties - voted against')
Questions
  Members of Parliament co-submitting a parliamentary question ('MPs - submitting')
Amendments
  Members of Parliament co-submitting an amendment ('MPs - submitting')

The table below summarizes which focus levels are available for each data source.

Data source  MPs - submitting  Parties - submitting  Parties - voted pro  Parties - voted against
Motions      Yes               Yes                   Yes                  Yes
Questions    Yes               No                    No                   No
Amendments   Yes               No                    No                   No
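
The co-submission relations described above can be pictured as a pairwise count: every pair of actors named on the same document gets an edge. The sketch below is only an illustration in Python; the project itself generated its edges with XQuery. Using the number of shared documents as the edge weight is an assumption, and the function name and sample names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_submission_edges(documents):
    """Count, for every pair of submitters, the documents they submitted together."""
    weights = Counter()
    for submitters in documents:
        # Sort and de-duplicate so (a, b) and (b, a) count as the same edge.
        for pair in combinations(sorted(set(submitters)), 2):
            weights[pair] += 1
    return weights


# Hypothetical input: one list of submitter names per motion.
motions = [
    ["MP A", "MP B", "MP C"],
    ["MP A", "MP B"],
    ["MP C"],
]
for (source, target), weight in sorted(co_submission_edges(motions).items()):
    print(source, target, weight)
```

A document with a single submitter produces no edge, so such actors only enter the network through other documents.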

Social network analysis

Select the data source, period, focus level and type of results you would like to view.
Compare up to six resultsets at the same time.










Charts

Select the data source and focus level you would like to view.


Download all chart files.

Visualizations

Interactive visualizations

Download

An interactive visualization is available as a Java application, view it as a:

Please note that this application requires Java 6 or newer, available as a free download.

Usage

Controls

Select: Click a node to center it.
Open personal info: Double-click a node to open the personal page for this person or party on pentapolitica.nl. (Only available for some nodes in the 'motions' datasets. Java 6 only.)
Connected nodes: Hover over a node to see its connected nodes (in green).
Drag: Left-click and drag a node to move it around.
Pan: Left-click and drag the background to pan the display view.
Zoom: Right-click and drag the mouse up or down, or use the scroll wheel, to zoom the display view.
Zoom-To-Fit: Right-click once to zoom the display to fit the whole graph.

Screenshots

Left: Data selection menu showing six cabinet periods.
Bottom: Bottom bar showing the active period and focus level, currently selected node and the search box.


Color coding

The table below shows the color coding used for the interactive visualizations. Nodes in the 'questions' and 'amendments' data sources are colored as 'unknown party', because party information was unavailable in these sets.

Left-wing party | Right-wing party | Coalition party | Connected nodes / Search results | Unknown party

Limitations

Graphs were preprocessed using the analysis.py script to filter nodes, edges and components. The table below shows the cut-off point used for each dataset at the Member of Parliament focus level. All nodes having fewer edges than the cut-off point were removed from the graph to improve readability. For example, if a node has 8 edges and the cut-off point is 9, it is removed from the graph. Cut-off points were selected to obtain a remaining number of nodes in the range of 80-105.
To further improve readability, only the largest component was used for the visualization. The 'motions' datasets at the 'parties' focus level were not cut off: these datasets are considerably smaller and can therefore be used directly in a visualization.

Dataset at MPs    Cut-off point                    Nodes remaining                  Edges remaining
submitting level  Motions  Questions  Amendments   Motions  Questions  Amendments   Motions  Questions  Amendments
Kok I             8        6          4            97       89         102          271      177        285
Kok II            12       3          5            104      87         91           293      170        224
Balkenende I      4        2          1            96       114        78           391      257        222
Balkenende II     12       5          3            100      84         74           324      194        180
Balkenende III    3        2          1            86       111        95           333      224        251
Balkenende IV     15       7          1            93       85         103          196      159        242
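
The two filtering steps described above (removing nodes below the cut-off point, then keeping the largest component) can be sketched in plain Python as follows. This is not the code from analysis.py, which used igraph; the function names are hypothetical. Two assumptions are made: the cut-off is applied in a single pass, and a node's number of edges is counted as its number of distinct neighbours.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Map every node to the set of its neighbours."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def apply_cutoff(edges, cutoff):
    """Drop every node with fewer than `cutoff` edges, in a single pass."""
    adjacency = build_adjacency(edges)
    kept = {node for node, nbrs in adjacency.items() if len(nbrs) >= cutoff}
    return [(a, b) for a, b in edges if a in kept and b in kept]


def largest_component(edges):
    """Return the edges of the connected component with the most nodes."""
    adjacency = build_adjacency(edges)
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one component at a time.
        component, queue = {start}, deque([start])
        while queue:
            for neighbour in adjacency[queue.popleft()]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return [(a, b) for a, b in edges if a in best]


# Hypothetical toy graph: a triangle a-b-c with a pendant node d, plus a pair e-f.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("e", "f")]
print(largest_component(apply_cutoff(edges, 2)))
```

With a cut-off of 2, nodes d, e and f (one edge each) are removed and only the triangle remains.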

Static visualizations

Below are some examples of static visualizations rendered by igraph. Static visualizations allow more precise and more extensive customization, at the cost of speed and interactivity.

Graph of the 'Balkenende 2' network at party focus level, customized to improve readability. Nodes were removed and repositioned to show the brokerage role of the 'PvdA' node.

Graph of the 'Balkenende 4' network at Members of Parliament focus level, rendered as an ego network graph centered on a randomly chosen node.
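
For readers unfamiliar with the term, an ego network consists of one central node (the ego), its direct neighbours, and the edges among them. The sketch below extracts such a subgraph from an edge list in plain Python. It is an illustration only: the figure above was rendered with igraph, and both the radius of one step and the function name are assumptions.

```python
import random


def ego_network(edges, ego):
    """Return the edges among `ego` and its direct neighbours."""
    neighbours = {b for a, b in edges if a == ego} | {a for a, b in edges if b == ego}
    members = neighbours | {ego}
    return [(a, b) for a, b in edges if a in members and b in members]


# Hypothetical toy graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]

# Pick a random node as the ego.
nodes = sorted({node for edge in edges for node in edge})
ego = random.choice(nodes)
print(ego, ego_network(edges, ego))
```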

Evaluation

Goal evaluation

One of the goals of this project was to determine whether social network analysis techniques could provide additional insight beyond the raw data. To evaluate this goal we formulated several questions that are currently very difficult or even impossible to answer using the raw data.

It turns out that some of these questions can be answered using the results presented on this page. Several others remained hard to answer using the results provided, but could be answered by directly querying the GraphML network source files produced as part of this project. Finally, some examples of questions that remain unanswerable using the results of this project are included.

Questions answerable using the provided results

The questions below can each be answered using the results provided on this page. Each question requires different kinds of results to answer it. The type of result that can be used to answer each question is given in parentheses. Some questions can only be answered using the visualizations provided on this page. Because visualizations can be read in more than one way, a definite answer to those questions requires the GraphML network source files.

* = this result type gives an indication of the answer, other result types are necessary to support the conclusions.

Questions answerable using the GraphML source data

The questions below are hard to answer using just the results provided on this page. To answer them the GraphML network source files should be queried directly.

Questions remaining unanswerable

Below are some examples of questions remaining unanswerable using the results of this project. This exemplifies the differences between social network analysis related questions and other questions related to the data sources.

Technical description

Settings and execution

A short overview of the data processing flow is described here. A full description of the technologies used is included below.

1. Source data to graph files

Source data was processed using XQueries to generate the nodes and edges for the graph. The Java VM was assigned a maximum of 1024 MB of memory to make sure it could handle the complete dataset.

java -Xmx1024m net.sf.saxon.Query -q:nodes.xq > nodes.xml
java -Xmx1024m net.sf.saxon.Query -q:edges.xq > graphml.xml

2. Analysis of the graph files

GraphML network files were processed using the igraph Python module.

python analysis.py graphml.xml

3. Write analysis results to database

The analysis results produced by igraph were stored in a MySQL database, used by the website.

python analysis.py -sd -n m_balkenende1_s graphml.xml

4. Generate static visualizations

Static visualizations were generated using the igraph Python module and the Cairo drawing package.

python analysis.py -sv graphml.xml

5. Preprocess the graph file for use in interactive visualizations

The graphs were preprocessed using the analysis.py script. Excess nodes and edges were stripped to improve the readability of the visualizations. Labels and colors were assigned to the correct attributes.

python analysis.py -spr -c 10 graphml.xml

Platform

Platform used for all computations:

XQuery data processing

Saxon-HE 9.2.0.6J was used to process the XQueries. Two XQueries were used to generate the GraphML files: one to generate the node list, and one to calculate and weight the edges. All XQueries are available for download.

GraphML network file

GraphML is an XML-based language for describing the structural properties of a graph. GraphML uses a list of nodes and edges to describe the structure of the graph. Nodes represent the actors in the social network, while edges represent the connections between them. Additional attributes such as labels, weights and other metadata can be added to node and edge elements.
All GraphML network files are available for download.
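
To make the structure described above concrete, the sketch below builds a minimal GraphML document with a labelled node list and one weighted edge, using only the Python standard library. It is not the project's XQuery output: the attribute ids 'label' and 'weight', the node ids and the function names are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def tag(name):
    """Qualify an element name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{name}"


def build_graphml(nodes, edges):
    """Build a GraphML tree from {node_id: label} and (source, target, weight) tuples."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(tag("graphml"))
    # <key> elements declare the attributes used on nodes and edges.
    ET.SubElement(root, tag("key"), {"id": "label", "for": "node",
                                     "attr.name": "label", "attr.type": "string"})
    ET.SubElement(root, tag("key"), {"id": "weight", "for": "edge",
                                     "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, tag("graph"), {"id": "G", "edgedefault": "undirected"})
    for node_id, label in nodes.items():
        node = ET.SubElement(graph, tag("node"), {"id": node_id})
        ET.SubElement(node, tag("data"), {"key": "label"}).text = label
    for source, target, weight in edges:
        edge = ET.SubElement(graph, tag("edge"), {"source": source, "target": target})
        ET.SubElement(edge, tag("data"), {"key": "weight"}).text = str(weight)
    return root


# Hypothetical two-node network with one edge of weight 2.
root = build_graphml({"n0": "MP A", "n1": "MP B"}, [("n0", "n1", 2)])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```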

Social Network Analysis software

The igraph 0.5.3 Python module was used to analyze the graph data using Social Network Analysis techniques. A custom Python script called analysis.py was created for this project. Results were stored in a MySQL database. See the SNA section for the results of this analysis.

Command line usage:
python analysis.py [options] <graphmlfile>

Returns the Social Network Analysis results for a specified GraphML file.
Optionally writes the results to a database.

Options:
  -h, --help        This message

  -d, --database    Write results to database
                    Requires [name] to be supplied
  -n, --name ...    The resultset name to use when writing to database
  -s, --silent      Don't show output in command line
  -v, --visualize   Generate graph visualizations
                    Outputs a PNG graphic
  -r, --remove      Remove unconnected nodes
  -p, --preprocess  Preprocess the graph for use in an interactive visualization
                    Optionally takes [cutoff] value instead of the default cutoff point
                    Outputs a graphml network file
  -c, --cutoff ...  Optional cut-off value used to determine the minimum number of edges
                    for a node when preprocessing
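
The options above can be mirrored with Python's argparse module, as in the sketch below. How analysis.py actually parses its arguments is not documented here, so this is an assumed reconstruction of the interface only; the function names build_parser and parse_args are hypothetical. The -h/--help option is provided by argparse automatically.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options listed in the usage text."""
    parser = argparse.ArgumentParser(
        prog="analysis.py",
        description="Returns the Social Network Analysis results for a specified GraphML file.",
    )
    parser.add_argument("graphmlfile", help="GraphML network file to analyze")
    parser.add_argument("-d", "--database", action="store_true",
                        help="Write results to database (requires --name)")
    parser.add_argument("-n", "--name",
                        help="The resultset name to use when writing to database")
    parser.add_argument("-s", "--silent", action="store_true",
                        help="Don't show output in command line")
    parser.add_argument("-v", "--visualize", action="store_true",
                        help="Generate graph visualizations")
    parser.add_argument("-r", "--remove", action="store_true",
                        help="Remove unconnected nodes")
    parser.add_argument("-p", "--preprocess", action="store_true",
                        help="Preprocess the graph for use in an interactive visualization")
    parser.add_argument("-c", "--cutoff", type=int, default=None,
                        help="Minimum number of edges for a node when preprocessing")
    return parser


def parse_args(argv):
    """Parse `argv` and enforce that --database is accompanied by --name."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.database and not args.name:
        parser.error("--database requires --name")
    return args


# The preprocessing call from the overview: combined short flags plus a cut-off.
args = parse_args(["-spr", "-c", "10", "graphml.xml"])
print(args.silent, args.preprocess, args.remove, args.cutoff, args.graphmlfile)
```

Combined short flags such as -sd or -spr are split by argparse into their individual options, which matches the command lines shown in the overview.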

Static visualizations were rendered by the igraph module, using Cairo 1.8.8_0 as a drawing package. The igraph module was also used to preprocess the data for visualizations, filtering nodes and edges and setting attributes.

Graph Visualization software

The Prefuse (release 2007.10.21) toolkit was used as a framework to build the interactive visualizations. The GraphML files were processed using igraph and exported as new GraphML files, used as the input data for the interactive visualizations. The Java source file is available for download.

Creative Commons License Copyright © 2010 Arthur Suermondt - This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 Netherlands License.