Utilities that take a Wikipedia XML dump and perform some analysis on it. These are not production-grade tools; they are basically just experiments.
This software is very RAM-intensive! On a machine with 32 GB of RAM, I found it using all of it at some points.
This repo contains three applications:
wiki2graph
: Takes an XML file and converts it to a collection of nodes and edges (a graph), which is saved as a file.
explorer
: Takes a graph and the name of a source article, and lists all links from that article. This is primarily used for debugging.
longestpath
: Takes a graph and finds the longest path from an article to a target article. The target article is set in the source code; the default is "Vacuum Tube".
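
To make the three steps above concrete, here is a minimal Python sketch. It is not the code in this repo, and every name in it (`build_graph`, `links_from`, `farthest_article`, `SAMPLE`) is hypothetical. It assumes a tiny dump without the XML namespace that real Wikipedia dumps declare, it extracts links with a simple regular expression rather than a full wikitext parser, and it reads "longest path" as "the article whose shortest chain of links to the target is longest", which is only one possible interpretation.

```python
import re
import xml.etree.ElementTree as ET
from collections import deque

# Matches [[Target]], [[Target|label]] and [[Target#Section]];
# captures only the target title.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)")


def build_graph(xml_text):
    """wiki2graph idea: map each page title to the set of titles it links to."""
    graph = {}
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        title = page.findtext("title")
        text = page.findtext("revision/text") or ""
        graph[title] = {m.strip() for m in LINK_RE.findall(text) if m.strip()}
    return graph


def links_from(graph, title):
    """explorer idea: list all links from one source article."""
    return sorted(graph.get(title, ()))


def farthest_article(graph, target):
    """longestpath idea (assumed meaning): breadth-first search backwards
    from the target, returning the path from the most distant article."""
    # Reverse the edges so we can walk from the target towards its sources.
    reverse = {}
    for source, targets in graph.items():
        for linked in targets:
            reverse.setdefault(linked, set()).add(source)

    next_hop = {target: None}  # article -> next article on the way to target
    queue = deque([target])
    last = target
    while queue:
        last = queue.popleft()  # the final node dequeued is the farthest one
        for source in sorted(reverse.get(last, ())):
            if source not in next_hop:
                next_hop[source] = last
                queue.append(source)

    path = []
    node = last
    while node is not None:
        path.append(node)
        node = next_hop[node]
    return path


# Hypothetical four-article dump, used only for illustration.
SAMPLE = """<mediawiki>
  <page><title>Radio</title><revision><text>Uses an [[Amplifier]].</text></revision></page>
  <page><title>Amplifier</title><revision><text>Built from a [[Vacuum Tube|tube]] or [[Transistor]].</text></revision></page>
  <page><title>Transistor</title><revision><text>Replaced the [[Vacuum Tube]].</text></revision></page>
  <page><title>Vacuum Tube</title><revision><text>No links.</text></revision></page>
</mediawiki>"""

graph = build_graph(SAMPLE)
print(links_from(graph, "Amplifier"))          # ['Transistor', 'Vacuum Tube']
print(farthest_article(graph, "Vacuum Tube"))  # ['Radio', 'Amplifier', 'Vacuum Tube']
```

Holding every title and link set in memory like this is also a plausible reason for the heavy RAM use on a full dump; a streaming parser such as `xml.etree.ElementTree.iterparse` would at least avoid loading the whole XML file at once.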