binux/pyspider

A Powerful Spider (Web Crawler) System in Python. http://docs.pyspider.org/

pyspider

A Powerful Spider (Web Crawler) System in Python. TRY IT NOW!

Tutorial: http://docs.pyspider.org/en/latest/tutorial/
Documentation: http://docs.pyspider.org/
Release notes: https://github.com/binux/pyspider/releases

Sample Code

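from pyspider.libs.base_handler import *

class Handler(BaseHandler):
    # Default self.crawl parameters shared by the whole project (none here).
    crawl_config = {
    }

    # Re-run on_start every 24 hours (24 * 60 minutes).
    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # Pages handled by index_page stay valid for 10 days (age is in seconds),
    # so they are not re-fetched before that period ends.
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # response.doc is a PyQuery object; follow every link whose href starts with "http".
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict becomes the result record for this page.
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }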
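
The sample mixes two time units: `@every` takes minutes, while `age` is given in seconds. Below is a minimal, self-contained sketch (plain Python, no pyspider required) of how the two settings interact. It assumes a simplified rule in which a URL is refetched only once `age` seconds have passed since its last crawl. The names `should_fetch` and `simulate` are hypothetical; this models the behaviour described by the sample's decorators and is not pyspider's scheduler code.

```python
from datetime import timedelta

# Values copied from the sample handler.
EVERY_MINUTES = 24 * 60                 # @every(minutes=24 * 60): on_start fires once a day
INDEX_AGE_SECONDS = 10 * 24 * 60 * 60   # @config(age=...): 864000 s, i.e. 10 days


def should_fetch(last_crawl, now, age):
    """Hypothetical dedupe rule (an assumption, not pyspider's code):
    refetch a URL only once `age` seconds have passed since its last crawl.
    `last_crawl` is None if the URL has never been crawled."""
    return last_crawl is None or now - last_crawl >= age


def simulate(days):
    """Return the days on which the index page would actually be fetched
    under this simplified model, over a window of `days` days."""
    step = EVERY_MINUTES * 60          # seconds between on_start runs
    last_crawl = None
    fetched_days = []
    for run in range(days * 24 * 60 * 60 // step):
        now = run * step
        # on_start re-submits the index URL; the age check decides
        # whether that request is really fetched or dropped.
        if should_fetch(last_crawl, now, INDEX_AGE_SECONDS):
            last_crawl = now
            fetched_days.append(now // 86400)
    return fetched_days


print(timedelta(minutes=EVERY_MINUTES))      # 1 day, 0:00:00
print(timedelta(seconds=INDEX_AGE_SECONDS))  # 10 days, 0:00:00
print(simulate(30))                          # [0, 10, 20]
```

Under this model, the daily on_start run re-submits the index page 30 times in 30 days, but only 3 of those requests are fetched; the others are dropped because the page is still within its 10-day age.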

Installation

Quickstart: http://docs.pyspider.org/en/latest/Quickstart/

Contribute

TODO

v0.4.0

  • a visual scraping interface like Portia

License

Licensed under the Apache License, Version 2.0