WAN-scraper

Web scraper that works with DNS servers to collect random data around the world.

Repository Details

Repo #634528101
Author	Lorenzo Rottigni
Created At	2023-04-30
Updated At	2023-04-30
Pushed At	2023-04-30
Size	9 MB
Main Language	Python
Star count	0
Default branch	main

README.md

WAN-scraper

Overview

This is a web scraping project that uses Python and Scrapy to collect data from a list of random websites. The goal is to gather information on website structure, content, and other relevant data for analysis.

Requirements

Python 3.x
Scrapy

Installation & Usage

Clone the repository: git clone https://github.com/LorenzoRottigni/WAN-scraper.git
Enter the project directory: cd WAN-scraper
Create a virtual environment named "scraper": python3 -m venv scraper
Activate the virtual environment: source scraper/bin/activate
Install the dependencies: pip3 install -r requirements.txt
Run the Scrapy spider: scrapy crawl wan_scraper
The spider will visit the URLs in the start_urls list in wan_scraper/spiders/wan_scraper.py, collect data, and save it to a CSV file located in the project directory.
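
The visit, extract, save flow described above can be illustrated with a minimal sketch. The repository's spider relies on Scrapy; the example below deliberately uses only Python's standard library (html.parser and csv) on an inline HTML string, so that it runs without dependencies or network access. The class and function names, the sample page, and the CSV columns (url, title, link_count) are assumptions made for illustration and are not taken from wan_scraper.py.

```python
import csv
import io
from html.parser import HTMLParser


class PageSummaryParser(HTMLParser):
    """Collects the <title> text and every <a href="..."> value from one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text found inside <title>...</title> is kept.
        if self._in_title:
            self.title += data


def summarize(url, html):
    """Build one CSV row (hypothetical columns) from a page's HTML."""
    parser = PageSummaryParser()
    parser.feed(html)
    return {"url": url, "title": parser.title.strip(), "link_count": len(parser.links)}


def write_csv(rows, stream):
    """Write the rows with a header line, as a CSV export would."""
    writer = csv.DictWriter(stream, fieldnames=["url", "title", "link_count"])
    writer.writeheader()
    writer.writerows(rows)


# Inline sample page (an assumption) standing in for a downloaded response.
SAMPLE_HTML = (
    "<html><head><title>Example</title></head>"
    "<body><a href='/a'>A</a><a href='/b'>B</a></body></html>"
)

buffer = io.StringIO()
write_csv([summarize("https://example.com", SAMPLE_HTML)], buffer)
print(buffer.getvalue())
```

In a Scrapy project the download step and the CSV export are normally handled by the framework, so the spider itself only needs the extraction part.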

Additional Information

The spider is set to obey the robots.txt file on each website visited, but please use caution and follow ethical scraping practices.
Feel free to modify the spider's behavior to suit your needs by editing the code in wan_scraper/spiders/wan_scraper.py.
For more information on using Scrapy, please refer to the official documentation (https://docs.scrapy.org).
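
As a rough illustration of the robots.txt and ethical scraping points above, the fragment below shows the kind of Scrapy settings that usually control this behavior. The setting names are standard Scrapy settings, but the values, the user agent string, and the output.csv file name are assumptions; the repository's actual settings.py may differ.

```python
# Hypothetical settings.py fragment; all values are illustrative assumptions.

# Respect each site's robots.txt rules before requesting a page.
ROBOTSTXT_OBEY = True

# Wait between requests and limit parallelism per domain to stay polite.
DOWNLOAD_DELAY = 2.0
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy adapt the crawl rate to how quickly each server responds.
AUTOTHROTTLE_ENABLED = True

# Identify the crawler so site owners can see where requests come from.
USER_AGENT = "wan_scraper (+https://github.com/LorenzoRottigni/WAN-scraper)"

# Export scraped items to a CSV file (the file name is an assumption).
FEEDS = {"output.csv": {"format": "csv"}}
```

The same keys could also be placed in a spider's custom_settings dictionary if the changes should apply to that spider only.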