WAN-scraper



Web scraper that works with DNS servers to collect random data around the world.

Repo #634528101
Author	Lorenzo Rottigni
Created on	2023-04-30
Updated on	2023-04-30
Pushed on	2023-04-30
Size	9 MB
Main language	Python
Star count	0
Default branch	main


README.md

WAN-scraper

Overview

This is a web scraping project built with Scrapy, a Python framework, that collects data from a list of random websites. The goal is to gather information on website structure, content, and other relevant data for analysis.

Requirements

Python 3.x
Scrapy

Installation & Usage

Clone the repository: git clone https://github.com/LorenzoRottigni/WAN-scraper.git
Create a virtual environment named "scraper": python3 -m venv scraper
Activate the virtual environment: source scraper/bin/activate
Install the dependencies: pip3 install -r requirements.txt
Run the Scrapy spider: scrapy crawl wan_scraper
The spider will visit the URLs in the start_urls list in wan_scraper/spiders/wan_scraper.py, collect data, and save it to a CSV file located in the project directory.

Additional Information

The spider is set to obey the robots.txt file on each website visited, but please use caution and follow ethical scraping practices.
Feel free to modify the spider's behavior to suit your needs by editing the code in wan_scraper/spiders/wan_scraper.py.
For more information on using Scrapy, please refer to the official documentation.
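
To make the robots.txt note above more concrete, the following is a sketch of the kind of entries a Scrapy settings.py could contain for polite crawling. The setting names are standard Scrapy settings, but the values and the user agent string are assumptions for illustration, not the project's actual configuration.

```python
# settings.py (sketch; values are illustrative assumptions)

# Respect each site's robots.txt rules before fetching pages.
ROBOTSTXT_OBEY = True

# Wait between requests to the same site, in seconds.
DOWNLOAD_DELAY = 1.0

# Let Scrapy adapt the crawl rate to how quickly each server responds.
AUTOTHROTTLE_ENABLED = True

# Limit parallel requests per domain to keep the load on each site low.
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Identify the crawler honestly; this string is a hypothetical example.
USER_AGENT = "wan_scraper (+https://github.com/LorenzoRottigni/WAN-scraper)"
```

Lower concurrency and a non-zero delay trade crawl speed for a lighter footprint on the visited sites, which matters when the URL list is made up of arbitrary websites.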