We investigate whether social network analysis techniques can be applied automatically to Dutch parliamentary data, and whether they provide insight beyond what the raw data offers. This data is available in XML, PDF and HTML.
A separate data extraction project parsed the raw data files and created listings of names of persons and their relations.
In this project, we did the following:
The project was carried out by Arthur Suermondt under supervision of Maarten Marx, at the University of Amsterdam.
We would like to evaluate the results of this project using a use case scenario based on the recent Dutch House of Representatives election (June 9, 2010) and cabinet formation process. Please use the results section and charts as well as the interactive visualizations.
The results of the Dutch House of Representatives election are just in. Due to the fragmented results the cabinet formation process has become very difficult. You are in the first phase of the formation process in which the different coalition possibilities are being explored. A coalition needs more than 75 seats in the House of Representatives. The following coalitions would have enough seats (NOS.nl):
Based on the results provided on this page, advise the newly appointed 'informateur' on which of the parties listed above are known to cooperate well. Also provide some suggestions for cabinet candidates (minister or state secretary) to make sure the new cabinet consists of a team of well-cooperating politicians.
Part 2
During the first and second phases of the formation process, negotiation is essential to form a successful coalition. Your goal is to find the politicians with the best connections to politicians of other parties. These politicians can play a key role when negotiating the different aspects of the coalition agreement. It is important to find links between each of the parties in a potential coalition. That way it is possible to successfully negotiate the topics important to each party.
We very much appreciate your feedback as it helps to improve the results and the usability of this interactive page.
Please send your feedback to Arthur Suermondt and/or Maarten Marx.
The following data sources were used to establish the social networks.
| Dataset | Years available | Documents | Source |
|---|---|---|---|
| Motions | 1969 - 2010 | 32,950 | PoliDocs XML |
| Questions | 1995 - 2010 | 61,494 | Preprocessed data source |
| Amendments | 1995 - 2009 | 3,306 | Preprocessed data source |
The table below shows the periods used to filter the data.
| Period | Start date | End date |
|---|---|---|
| Kabinet Kok I | 22-8-1994 | 3-8-1998 |
| Kabinet Kok II | 3-8-1998 | 22-7-2002 |
| Kabinet Balkenende I | 22-7-2002 | 27-5-2003 |
| Kabinet Balkenende II | 27-5-2003 | 7-7-2006 |
| Kabinet Balkenende III | 7-7-2006 | 22-2-2007 |
| Kabinet Balkenende IV | 22-2-2007 | now |
| Source: rijksoverheid.nl |
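The period filter can be illustrated with a short Python sketch. It is not the project's own code: the function name `period_for` is hypothetical, the open-ended "now" is represented by `None`, and because each end date in the table equals the next start date, the sketch assumes half-open intervals in which a boundary date belongs to the incoming cabinet.

```python
from datetime import date

# Cabinet periods from the table above; None stands for the open-ended "now".
PERIODS = [
    ("Kabinet Kok I", date(1994, 8, 22), date(1998, 8, 3)),
    ("Kabinet Kok II", date(1998, 8, 3), date(2002, 7, 22)),
    ("Kabinet Balkenende I", date(2002, 7, 22), date(2003, 5, 27)),
    ("Kabinet Balkenende II", date(2003, 5, 27), date(2006, 7, 7)),
    ("Kabinet Balkenende III", date(2006, 7, 7), date(2007, 2, 22)),
    ("Kabinet Balkenende IV", date(2007, 2, 22), None),
]


def period_for(day):
    """Return the cabinet period containing `day`, or None if it predates Kok I.

    Assumption: intervals are half-open [start, end), so a document dated
    on a boundary day is assigned to the incoming cabinet.
    """
    for name, start, end in PERIODS:
        if day >= start and (end is None or day < end):
            return name
    return None
```

Under these assumptions, a document dated on the June 9, 2010 election day falls in the Kabinet Balkenende IV period.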
Social network relations were constructed based on the following connection criteria.
The shorthand for these relations, used in the results below, is given in parentheses.
| Data source | Connection criterion (shorthand) |
|---|---|
| Motions | All Members of Parliament co-submitting a motion ('MPs - submitting') |
| Motions | All parties with members co-submitting a motion ('Parties - submitting') |
| Motions | All parties voting in favor of a motion ('Parties - voted pro') |
| Motions | All parties voting against a motion ('Parties - voted against') |
| Questions | Members of Parliament co-submitting a parliamentary question ('MPs - submitting') |
| Amendments | Members of Parliament co-submitting an amendment ('MPs - submitting') |
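As an illustration of the co-submission criterion, the sketch below turns a list of documents into weighted edges between submitters. The project generated its edges with XQuery, so this Python version is only a stand-in. The function name `cosubmission_edges`, the sample names, and the choice to weight an edge by the number of shared documents are all assumptions.

```python
from collections import Counter
from itertools import combinations


def cosubmission_edges(documents):
    """Count, for each unordered pair of submitters, how many documents they share.

    `documents` is an iterable of submitter-name lists, one list per motion,
    question or amendment.
    """
    weights = Counter()
    for submitters in documents:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(submitters)), 2):
            weights[pair] += 1
    return weights


# Hypothetical sample data, not taken from the parliamentary records.
motions = [["MP A", "MP B", "MP C"], ["MP A", "MP B"], ["MP C"]]
edges = cosubmission_edges(motions)
```

A document with a single submitter contributes no edges, which is why isolated nodes can appear in the resulting network.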
The table below summarizes which focus levels are available for each data source.
| Data source | MPs - submitting | Parties - submitting | Parties - voted pro | Parties - voted against |
|---|---|---|---|---|
| Motions | Yes | Yes | Yes | Yes |
| Questions | Yes | No | No | No |
| Amendments | Yes | No | No | No |
Select the data source, period, focus level and type of results you would like to view.
Compare up to six resultsets at the same time.
An interactive visualization is available as a Java application, view it as a:
| Action | Description |
|---|---|
| Select | Click a node to center it. |
| Open personal info | Double-click a node to open the personal page for this person or party on pentapolitica.nl. (Only available for some nodes in the 'motions' datasets. Java 6 only.) |
| Connected nodes | Hover over a node to see its connected nodes (in green). |
| Drag | Left-click and drag a node to move it around. |
| Pan | Left-click and drag the background to pan the display view. |
| Zoom | Right-click and drag the mouse up or down or use the scroll wheel to zoom the display view. |
| Zoom-To-Fit | Right-click once to zoom the display to fit the whole graph. |

Left: Data selection menu showing six cabinet periods.
Bottom: Bottom bar showing the active period and focus level, currently
selected node and the search box.
The table below shows the color coding used for the interactive visualizations. Nodes in the 'questions' and 'amendments' data sources are colored as 'unknown party', because party information was unavailable in these sets.
| Left-wing party | Right-wing party | Coalition party | Connected nodes / Search results | Unknown party |
Graphs were preprocessed using the analysis.py script to filter nodes, edges and components. The table below shows the cut-off point used for each dataset at the Member of Parliament focus level. All nodes with fewer edges than the cut-off point were removed from the graph to improve readability. For example, if a node has 8 edges and the cut-off point is 9, it is removed from the graph. Cut-off points were selected to leave roughly 80-105 nodes.
To further improve readability, only the largest component was used for the visualization. No cut-off was applied to the 'motions' datasets at the 'parties' focus levels; these graphs are considerably smaller and can be used directly in a visualization.
| Period (MPs - submitting level) | Cut-off: Motions | Cut-off: Questions | Cut-off: Amendments | Nodes remaining: Motions | Nodes remaining: Questions | Nodes remaining: Amendments | Edges remaining: Motions | Edges remaining: Questions | Edges remaining: Amendments |
|---|---|---|---|---|---|---|---|---|---|
| Kok I | 8 | 6 | 4 | 97 | 89 | 102 | 271 | 177 | 285 |
| Kok II | 12 | 3 | 5 | 104 | 87 | 91 | 293 | 170 | 224 |
| Balkenende I | 4 | 2 | 1 | 96 | 114 | 78 | 391 | 257 | 222 |
| Balkenende II | 12 | 5 | 3 | 100 | 84 | 74 | 324 | 194 | 180 |
| Balkenende III | 3 | 2 | 1 | 86 | 111 | 95 | 333 | 224 | 251 |
| Balkenende IV | 15 | 7 | 1 | 93 | 85 | 103 | 196 | 159 | 242 |
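The two preprocessing steps, the degree cut-off and the largest-component selection, can be sketched in plain Python. The project performed them with igraph inside analysis.py, so the standard-library version below is a stand-in and not the project's code. The function names are hypothetical, and the sketch assumes the cut-off is applied in a single pass, without re-checking degrees after nodes are removed.

```python
from collections import defaultdict


def apply_cutoff(edges, cutoff):
    """Drop every node with fewer than `cutoff` edges, in a single pass.

    `edges` is a list of (node, node) pairs of an undirected graph.
    """
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    keep = {n for n, d in degree.items() if d >= cutoff}
    # An edge survives only if both of its endpoints survive.
    return [(a, b) for a, b in edges if a in keep and b in keep]


def largest_component(edges):
    """Return the edges of the connected component with the most nodes."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited node.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return [(a, b) for a, b in edges if a in best and b in best]
```

With a cut-off of 9, a node with 8 edges is dropped and a node with exactly 9 edges is kept, matching the example given above.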
Below are some examples of static visualizations rendered by igraph. Static visualizations allow more precise control and greater customizability, at the cost of speed and interactivity.
One of the goals of this project was to determine whether social network analysis techniques could provide insight beyond what the raw data offers. To evaluate this goal we formulated several questions that are currently very difficult or even impossible to answer using the raw data.
It turns out that some of these questions can be answered using the results presented on this page. Several others remained hard to answer using the results provided, but could be answered by directly querying the GraphML network source files produced as part of this project. Finally, some examples of questions that remain unanswerable using the results of this project are included.
The questions below can each be answered using the results provided on this page. Each question requires different kinds of results to answer it. The type of result that can be used to answer each question is given in parentheses. Some questions can only be answered using the visualizations provided on this page. Because visualizations can be read in more than one way, a definite answer to those questions requires the GraphML network source files.
* = this result type gives an indication of the answer, other result types are necessary to support the conclusions.
The questions below are hard to answer using just the results provided on this page. To answer them the GraphML network source files should be queried directly.
Below are some examples of questions remaining unanswerable using the results of this project. This exemplifies the differences between social network analysis related questions and other questions related to the data sources.
A short overview of the data processing flow is given here. A full description of the technologies used is included below.
Source data was processed using XQueries to generate the nodes and edges for the graph. The Java VM was assigned a maximum of 1024 MB of memory to make sure it could handle the complete dataset.
java -Xmx1024m net.sf.saxon.Query -q:nodes.xq > nodes.xml
java -Xmx1024m net.sf.saxon.Query -q:edges.xq > graphml.xml
GraphML network files were processed using the igraph python module.
python analysis.py graphml.xml
The analysis results produced by igraph were stored in a MySQL database, used by the website.
python analysis.py -sd -n m_balkenende1_s graphml.xml
Static visualizations were generated using the igraph python module and the cairo drawing package.
python analysis.py -sv graphml.xml
The graphs were preprocessed using the analysis.py script. Excess nodes and edges were stripped to improve the readability of the visualizations. Labels and colors were assigned to the correct attributes.
python analysis.py -spr -c 10 graphml.xml
Platform used for all computations:
Saxon-HE 9.2.0.6J was used to process the XQueries. Two XQueries were used to generate the GraphML files: one to generate the node list, and one to calculate and weight the edges. All XQueries are available for download.
GraphML is an XML-based language to describe the structural properties of a graph. GraphML uses a list of nodes and edges to describe the structure of the graph. Nodes represent the actors in the social network, while edges represent the connections between them. Additional attributes such as labels, weights and other metadata can be added to node and edge elements.
All GraphML network files are available for download.
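To make the GraphML structure concrete, the sketch below embeds a tiny hypothetical network and reads it with Python's standard `xml.etree.ElementTree` module. The two nodes, the single edge, and the attribute keys `label` and `weight` are illustrative assumptions, not necessarily the names used in the project's files. The function `read_graphml` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace, bound to the prefix "g" for path queries.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical two-node network; the keys "label" and "weight" are assumptions.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="label" for="node" attr.name="label" attr.type="string"/>
  <key id="weight" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="undirected">
    <node id="n0"><data key="label">MP A</data></node>
    <node id="n1"><data key="label">MP B</data></node>
    <edge source="n0" target="n1"><data key="weight">3</data></edge>
  </graph>
</graphml>
"""


def read_graphml(text):
    """Return (nodes, edges) where attribute values are kept as strings.

    nodes maps node id -> {key: value}; edges is a list of
    (source id, target id, {key: value}) tuples.
    """
    root = ET.fromstring(text)
    nodes = {}
    for node in root.iterfind(".//g:node", NS):
        nodes[node.get("id")] = {
            d.get("key"): d.text for d in node.iterfind("g:data", NS)
        }
    edges = []
    for edge in root.iterfind(".//g:edge", NS):
        data = {d.get("key"): d.text for d in edge.iterfind("g:data", NS)}
        edges.append((edge.get("source"), edge.get("target"), data))
    return nodes, edges


nodes, edges = read_graphml(SAMPLE)
```

Reading the files this way is one option for querying the network directly when the precomputed results on this page are not sufficient.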
The igraph 0.5.3 Python module was used to analyze the graph data using Social Network Analysis techniques. A custom Python script called analysis.py was created for this project. Results were stored in a MySQL database. See the SNA section for the results of this analysis.
Command line usage:
python analysis.py [options] <graphmlfile>
Returns the Social Network Analysis results for a specified GraphML file.
Optionally writes the results to a database.
Options:
-h, --help          This message
-d, --database      Write results to database
                    Requires [name] to be supplied
-n, --name ...      The resultset name to use when writing to database
-s, --silent        Don't show output in command line
-v, --visualize     Generate graph visualizations
                    Outputs a PNG graphic
-r, --remove        Remove unconnected nodes
-p, --preprocess    Preprocess the graph for use in an interactive visualization
                    Optionally takes [cutoff] value instead of the default cutoff point
                    Outputs a graphml network file
-c, --cutoff ...    Optional cut-off value used to determine the minimum number of edges
                    for a node when preprocessing
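The option list above can be mirrored with Python's standard `argparse` module, as sketched below. How analysis.py actually parses its arguments is not documented here, so this is a hypothetical reconstruction. The function names and the integer type for the cut-off are assumptions. The built-in `-h, --help` option is supplied by `argparse` itself.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options listed in the usage text."""
    parser = argparse.ArgumentParser(
        prog="analysis.py",
        description="Returns the Social Network Analysis results "
                    "for a specified GraphML file.",
    )
    parser.add_argument("graphmlfile", help="GraphML network file to analyze")
    parser.add_argument("-d", "--database", action="store_true",
                        help="write results to database (requires --name)")
    parser.add_argument("-n", "--name",
                        help="resultset name to use when writing to database")
    parser.add_argument("-s", "--silent", action="store_true",
                        help="don't show output in command line")
    parser.add_argument("-v", "--visualize", action="store_true",
                        help="generate graph visualizations (PNG)")
    parser.add_argument("-r", "--remove", action="store_true",
                        help="remove unconnected nodes")
    parser.add_argument("-p", "--preprocess", action="store_true",
                        help="preprocess the graph for an interactive visualization")
    # Assumption: the cut-off is a whole number of edges.
    parser.add_argument("-c", "--cutoff", type=int,
                        help="minimum number of edges for a node when preprocessing")
    return parser


def parse_args(argv):
    """Parse `argv` and enforce that --database comes with --name."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.database and not args.name:
        parser.error("--database requires --name")
    return args
```

Combined short flags such as `-sd` or `-spr`, as used in the commands earlier on this page, are accepted by `argparse` without extra configuration.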
Static visualizations were rendered by the igraph module, using Cairo 1.8.8_0 as a drawing package. The igraph module was also used to preprocess the data for visualizations, filtering nodes and edges and setting attributes.
The Prefuse (release 2007.10.21) toolkit was used as a framework to build the interactive visualizations. The GraphML files were processed using igraph and exported as new GraphML files, which served as the input data for the interactive visualizations. The Java source file is available for download.
Copyright © 2010 Arthur Suermondt - This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 Netherlands License.