WAN-scraper


Web scraper that works with DNS servers to collect random data around the world.

Repo #634528101
Author: Lorenzo Rottigni
Created: 2023-04-30
Updated: 2025-05-16
Pushed: 2023-04-30
Size: 9 MB
Primary language: Python
Stars: 0
Default branch: main

README.md

WAN-scraper

Overview

This is a web scraping project built with Scrapy, the Python crawling framework. It collects data from a list of random websites, gathering information on each site's structure, content, and other relevant data for analysis.

Requirements

  • Python 3.x
  • Scrapy

Installation & Usage

  1. Clone the repository and enter it: git clone https://github.com/LorenzoRottigni/WAN-scraper.git && cd WAN-scraper
  2. Create a virtual environment named "scraper": python3 -m venv scraper
  3. Activate the virtual environment: source scraper/bin/activate
  4. Install the dependencies: pip3 install -r requirements.txt
  5. Run the Scrapy spider: scrapy crawl wan_scraper
  6. The spider visits each URL in the start_urls list in wan_scraper/spiders/wan_scraper.py, collects data, and saves it to a CSV file in the project directory.

Additional Information

  • The spider is configured to obey the robots.txt file of each website it visits. Even so, please scrape with caution and follow ethical scraping practices.
  • Feel free to modify the spider's behavior to suit your needs by editing the code in wan_scraper/spiders/wan_scraper.py.
  • For more information on using Scrapy, please refer to the official documentation.
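
To illustrate what obeying robots.txt means in practice, the sketch below checks URLs against a set of rules using Python's standard library. It is independent of Scrapy, which handles this internally when ROBOTSTXT_OBEY is enabled. The helper name is_allowed and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules (assumption): everything is allowed except /private/.
rules = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(rules, "https://example.com/index.html"))    # True
print(is_allowed(rules, "https://example.com/private/data"))  # False
```

A crawler that respects these rules would skip the second URL rather than request it.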