Stephen 15748dc9f4 First commit 1 month ago

Wikigraph

Some utilities designed to take a Wikipedia XML dump and perform some analysis. These are not production-grade tools; they are basically just experiments.

This software is very RAM-intensive! I have 32 GB of RAM, and at some points I found myself using all of it.

Applications

This repo contains 3 applications:

  • wiki2graph: Takes an XML file and converts it to a collection of nodes and edges (a graph), which is saved as a file.

  • explorer: Takes in a graph and allows seeing all links from a source article, given the name of that source article. This is primarily used for debugging.

  • longestpath: Takes a graph and finds the longest path from an article to a target article. The target article is specified in the source code. The default is “Vacuum Tube”.
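To make the three roles above more concrete, here is a minimal sketch of how such a graph might be held in memory and queried. It is not the code in this repo: the `Graph` type and the `node`, `add_link`, `links_from` and `farthest_from` names are all hypothetical. It also assumes that "longest path" means the article whose shortest chain of links to the target is the longest, found with a breadth-first search over reversed links; the real longestpath tool may define it differently.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical in-memory graph: article titles are mapped to indices,
/// and `links[i]` lists the articles that article `i` links to.
#[derive(Default)]
pub struct Graph {
    titles: Vec<String>,
    index: HashMap<String, usize>,
    links: Vec<Vec<usize>>,
}

impl Graph {
    /// Return the node id for `title`, creating the node if it is new.
    pub fn node(&mut self, title: &str) -> usize {
        if let Some(&id) = self.index.get(title) {
            return id;
        }
        let id = self.titles.len();
        self.titles.push(title.to_string());
        self.index.insert(title.to_string(), id);
        self.links.push(Vec::new());
        id
    }

    /// Record a link (edge) from article `from` to article `to`.
    pub fn add_link(&mut self, from: &str, to: &str) {
        let (a, b) = (self.node(from), self.node(to));
        self.links[a].push(b);
    }

    /// Explorer-style lookup: titles of all articles linked from `title`.
    pub fn links_from(&self, title: &str) -> Vec<&str> {
        match self.index.get(title) {
            Some(&id) => self.links[id]
                .iter()
                .map(|&t| self.titles[t].as_str())
                .collect(),
            None => Vec::new(),
        }
    }

    /// Assumed reading of "longest path": breadth-first search backwards
    /// from `target`, returning the article whose shortest chain of links
    /// to `target` is longest, together with that number of links.
    /// Returns `None` if `target` is not in the graph.
    pub fn farthest_from(&self, target: &str) -> Option<(&str, usize)> {
        let &t = self.index.get(target)?;

        // Reverse every edge so the search can walk "who links to me".
        let mut rev = vec![Vec::new(); self.links.len()];
        for (from, outs) in self.links.iter().enumerate() {
            for &to in outs {
                rev[to].push(from);
            }
        }

        let mut dist: Vec<Option<usize>> = vec![None; self.links.len()];
        dist[t] = Some(0);
        let mut queue = VecDeque::from([t]);
        let mut best = (t, 0);

        while let Some(n) = queue.pop_front() {
            let d = dist[n].unwrap();
            if d > best.1 {
                best = (n, d);
            }
            for &p in &rev[n] {
                if dist[p].is_none() {
                    dist[p] = Some(d + 1);
                    queue.push_back(p);
                }
            }
        }
        Some((self.titles[best.0].as_str(), best.1))
    }
}
```

In this sketch, a wiki2graph-style step would call `add_link` once per link found in the dump, an explorer-style step would call `links_from` with an article name, and a longestpath-style step would call `farthest_from("Vacuum Tube")`. Storing links as integer indices rather than strings is one way to keep memory use down, though a full Wikipedia graph will still be large.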