C:\scrapy(project folder)\scrapy.cfg — the system-wide configuration file;
~/.config/scrapy.cfg ($XDG_CONFIG_HOME) and ~/.scrapy.cfg ($HOME) — these hold the global (user-wide) settings.
Scrapy can also be controlled through these environment variables:
SCRAPY_SETTINGS_MODULE — the Python path of the settings module to use
SCRAPY_PROJECT — the project to use when scrapy.cfg defines more than one
SCRAPY_PYTHON_SHELL — the interactive shell (e.g. ipython, bpython or python) that the scrapy shell command should use
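For instance, the settings module can be pointed at explicitly before Scrapy starts. A minimal sketch using only the standard library; the module path "scrapy_project.settings" is a hypothetical example, not from the source:

```python
import os

# Point Scrapy at a settings module explicitly; "scrapy_project.settings"
# is a hypothetical module path used for illustration.
os.environ.setdefault("SCRAPY_SETTINGS_MODULE", "scrapy_project.settings")

# Any Scrapy process started from this environment would now read its
# settings from that module.
print(os.environ["SCRAPY_SETTINGS_MODULE"])
```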
scrapy.cfg            - deploy configuration file
project_name/         - project's Python module
    __init__.py
    items.py          - project items file
    pipelines.py      - project pipelines file
    settings.py       - project settings file
    spiders/          - directory where the spiders live
        __init__.py
        spider_name.py
        ...
[settings]
default = [name of the project].settings

[deploy]
#url = http://localhost:6800/
project = [name of the project]
Scrapy X.Y - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  crawl    Starts a spider to crawl data from the given URLs
  fetch    Fetches the response from the given URL
scrapy startproject scrapy_project
cd scrapy_project
scrapy genspider mydomain yiibai.com
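The genspider command above creates a spider file under spiders/. It looks roughly like the sketch below (an approximation of the default template; the try/except stub only lets the sketch run where Scrapy is not installed):

```python
try:
    import scrapy
    Spider = scrapy.Spider
except ImportError:  # stand-in base class so the sketch runs without Scrapy
    class Spider:
        pass

# Roughly what `scrapy genspider mydomain yiibai.com` generates
class MydomainSpider(Spider):
    name = "mydomain"
    allowed_domains = ["yiibai.com"]
    start_urls = ["http://yiibai.com/"]

    def parse(self, response):
        # callback invoked for each downloaded response
        pass
```

Running `scrapy crawl mydomain` from the project directory would then invoke this spider.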
scrapy -h
fetch: fetches the given URL using the Scrapy downloader.
runspider: runs a self-contained spider without creating a project.
settings: shows the project's setting values.
shell: an interactive scraping console for the given URL.
startproject: creates a new Scrapy project.
version: shows the Scrapy version.
view: fetches the URL using the Scrapy downloader and shows its contents in a browser.
crawl: crawls data using a spider.
check: runs contract checks on the project's spiders.
list: lists the available spiders in the project.
edit: opens the given spider in an editor.
parse: parses the given URL with the spider that handles it.
bench: runs a quick benchmark test (the benchmark reports how many pages per minute Scrapy can crawl).
COMMANDS_MODULE = 'mycmd.commands'
from setuptools import setup, find_packages

setup(
    name='scrapy-module_demo',
    packages=find_packages(),
    entry_points={
        'scrapy.commands': [
            'cmd_demo=my_module.commands:CmdDemo',
        ],
    },
)
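The entry point above names a command class. A minimal sketch of what my_module/commands.py might contain — the method names follow Scrapy's command interface, and the try/except stub only lets the sketch run where Scrapy is not installed:

```python
try:
    from scrapy.commands import ScrapyCommand
except ImportError:  # stand-in base class so the sketch runs without Scrapy
    class ScrapyCommand:
        requires_project = False

class CmdDemo(ScrapyCommand):
    # this command does not need to run inside a Scrapy project
    requires_project = False

    def syntax(self):
        return "[options]"

    def short_desc(self):
        return "Demo command (prints a greeting)"

    def run(self, args, opts):
        # body of the custom command; invoked as `scrapy cmd_demo`
        print("Hello from cmd_demo")
```

Once the package is installed, cmd_demo shows up in the output of `scrapy -h` alongside the built-in commands.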