For managing web proxies across different projects/processes
Stephen db3c7cc5aa Configurable per-website blacklist time, better rng 3 months ago
debian deb file 3 months ago
src Configurable per-website blacklist time, better rng 3 months ago
.gitignore Don't put quotes around proxy response 3 months ago
Cargo.lock Configurable per-website blacklist time, better rng 3 months ago
Cargo.toml Configurable per-website blacklist time, better rng 3 months ago
Jenkinsfile deb file 3 months ago
README.md Change blacklist parameters 3 months ago
config.example.toml Configurable per-website blacklist time, better rng 3 months ago
rustfmt.toml rustfmt 3 months ago

README.md

Proxy manager

Work in progress!

What is it for?

I am currently renting around 100 web proxies for a side project that scrapes a website (“Site A”). There is no reason I couldn’t use the same proxies in other side projects that scrape other sites (“Site B”, “Site C”, etc.). Also, if I scrape “Site A” from several side projects, or from several processes of the same project, I need to be careful that their combined requests don’t hit any rate limits.

This project is my attempt at solving these problems.

How is it used?

First, a TOML file must be written which lists the available proxies, as well as the possible websites to scrape and their rate limits. An example file, config.example.toml, is provided, which lists 3 proxies, and sets up Facebook with a limit of one request per 5 seconds, and Google with no rate limit.

Requesting http://localhost:3030/v1/facebook returns a random proxy address. The manager ensures that no proxy makes more requests to a site than that site's rate limit allows. If necessary, it waits until enough time has passed before returning a proxy address. Rate limits are tracked separately for each site, so requests for Google don’t count against Facebook.
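
The per-site bookkeeping described above could be sketched roughly as follows. This is not the code in src; it is a minimal illustration under a few assumptions: the names `ProxyPool`, `intervals`, `next_free` and `acquire` are hypothetical, the proxy whose slot frees up first is chosen instead of a random one, and the required wait is returned to the caller rather than slept on.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical sketch: tracks, for each (site, proxy) pair, the earliest
/// instant at which that proxy may next be handed out for that site.
struct ProxyPool {
    proxies: Vec<String>,
    /// Minimum spacing between requests per site; a missing entry means no limit.
    intervals: HashMap<String, Duration>,
    next_free: HashMap<(String, String), Instant>,
}

impl ProxyPool {
    fn new(proxies: Vec<String>, intervals: HashMap<String, Duration>) -> Self {
        ProxyPool { proxies, intervals, next_free: HashMap::new() }
    }

    /// Reserves a proxy for `site` and returns it together with how long the
    /// caller must wait before using it. Returns None if the pool is empty.
    fn acquire(&mut self, site: &str, now: Instant) -> Option<(String, Duration)> {
        let interval = self.intervals.get(site).copied().unwrap_or(Duration::ZERO);
        // Assumption: pick the proxy that frees up first (ties go to the
        // first one listed) rather than a random proxy.
        let (proxy, free_at) = self
            .proxies
            .iter()
            .map(|p| {
                let key = (site.to_string(), p.clone());
                let at = self.next_free.get(&key).copied().unwrap_or(now);
                (p.clone(), at.max(now))
            })
            .min_by_key(|(_, at)| *at)?;
        // Book the following slot so later callers cannot exceed the limit.
        self.next_free
            .insert((site.to_string(), proxy.clone()), free_at + interval);
        Some((proxy, free_at - now))
    }
}

fn main() {
    let mut intervals = HashMap::new();
    intervals.insert("facebook".to_string(), Duration::from_secs(5));
    let proxies = vec!["proxy-a:8080".to_string(), "proxy-b:8080".to_string()];
    let mut pool = ProxyPool::new(proxies, intervals);

    let now = Instant::now();
    for site in ["facebook", "facebook", "facebook", "google"] {
        if let Some((proxy, wait)) = pool.acquire(site, now) {
            println!("{site}: use {proxy} after waiting {wait:?}");
        }
    }
}
```

With two proxies and a 5 second limit, the first two Facebook requests are served immediately from different proxies, the third has to wait 5 seconds, and the Google request is unaffected.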

It is also possible to POST to http://localhost:3030/v1/facebook/ with the JSON body { "proxy": "localhost:1234" }, which blacklists that proxy/site combination for 15 minutes. This is normally done when a proxy gets blocked by the site and you need to stop sending requests through it for a while.
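
As a rough illustration, a client could report a blocked proxy using only the standard library, as sketched below. The helper names `blacklist_request` and `report_blocked` are hypothetical, and the `Content-Type` and `Connection` headers are assumptions about what the server accepts. The JSON body is written by hand, which is only safe as long as the proxy address contains no quotes or backslashes.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Builds a raw HTTP/1.1 POST request that reports `proxy` as blocked for `site`.
fn blacklist_request(host: &str, site: &str, proxy: &str) -> String {
    // Hand-written JSON: assumes `proxy` contains no quotes or backslashes.
    let body = format!("{{ \"proxy\": \"{}\" }}", proxy);
    format!(
        "POST /v1/{site}/ HTTP/1.1\r\n\
         Host: {host}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {len}\r\n\
         Connection: close\r\n\
         \r\n\
         {body}",
        site = site,
        host = host,
        len = body.len(),
        body = body
    )
}

/// Sends the request to the proxy manager and returns the raw response text.
fn report_blocked(host: &str, site: &str, proxy: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(host)?;
    stream.write_all(blacklist_request(host, site, proxy).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    // Requires the proxy manager to be listening on localhost:3030.
    match report_blocked("localhost:3030", "facebook", "localhost:1234") {
        Ok(response) => println!("{}", response),
        Err(e) => eprintln!("could not reach the proxy manager: {}", e),
    }
}
```

In a real scraper an HTTP client crate and a JSON serializer would likely be a better fit; the raw socket version only keeps the sketch dependency-free.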