WAN-scraper

Screenshot of the WAN-scraper GitHub repository

Web scraper that works with DNS servers to collect random data around the world.

Repo #634528101
Author: Lorenzo Rottigni
Created: 2023-04-30
Updated: 2025-05-16
Pushed: 2023-04-30
Size: 9 MB
Primary language: Python
Stars: 0
Default branch: main
Languages:
  • Python

README.md

WAN-scraper

Overview

This is a web scraping project that uses Python's Scrapy framework to collect data from a list of random websites. The goal is to gather information on website structure, content, and other data relevant for analysis.
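
As a rough illustration of what such a spider looks like, here is a minimal sketch in the spirit of this project. The actual spider lives in wan_scraper/spiders/wan_scraper.py; the seed URL and the fields collected below are assumptions made for illustration, not the repository's code.

    import scrapy

    class WanScraperSpider(scrapy.Spider):
        name = "wan_scraper"
        # Hypothetical seed URL; the project keeps its own start_urls list.
        start_urls = ["https://example.com"]

        def parse(self, response):
            # Record basic structure/content data for each visited page.
            yield {
                "url": response.url,
                "title": response.css("title::text").get(),
                "link_count": len(response.css("a::attr(href)").getall()),
            }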

Requirements

  • Python 3.x
  • Scrapy

Installation & Usage

  1. Clone the repository: git clone https://github.com/LorenzoRottigni/WAN-scraper.git
  2. Create a virtual environment named "scraper": python3 -m venv scraper
  3. Activate the virtual environment: source scraper/bin/activate
  4. Install dependencies: pip3 install -r requirements.txt
  5. Run Scrapy spider: scrapy crawl wan_scraper
  6. The spider will visit the URLs in the start_urls list in wan_scraper/spiders/wan_scraper.py, collect data, and save it to a CSV file in the project directory (see the export sketch after this list).
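
Step 6 describes a CSV export into the project directory. One generic way to configure that in Scrapy is the FEEDS setting on the spider itself; the snippet below is only a sketch under that assumption (the file name output.csv is hypothetical), not the repository's actual export code. The same result can also be obtained from the command line with scrapy crawl wan_scraper -o output.csv.

    import scrapy

    class WanScraperSpider(scrapy.Spider):
        name = "wan_scraper"
        # Write scraped items to a CSV file in the directory the crawl is run from.
        custom_settings = {
            "FEEDS": {
                "output.csv": {"format": "csv", "overwrite": True},
            }
        }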

Additional Information

  • The spider is set to obey the robots.txt file of each website it visits, but please use caution and follow ethical scraping practices (see the settings sketch after this list).
  • Feel free to modify the spider's behavior to suit your needs by editing the code in wan_scraper/spiders/wan_scraper.py.
  • For more information on using Scrapy, please refer to the official documentation.
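
The robots.txt behaviour mentioned above is controlled through Scrapy's project settings (typically wan_scraper/settings.py). The sketch below shows how that option, together with a few common politeness settings, is usually expressed; apart from ROBOTSTXT_OBEY, the specific values are illustrative assumptions rather than this repository's configuration.

    # Respect each site's robots.txt, as described above.
    ROBOTSTXT_OBEY = True
    # Illustrative politeness settings; tune to your needs.
    DOWNLOAD_DELAY = 1.0                 # pause between requests to the same site
    CONCURRENT_REQUESTS_PER_DOMAIN = 4   # limit parallel requests per domain
    AUTOTHROTTLE_ENABLED = True          # back off automatically under server load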