I downloaded a few scrapy projects from GitHub, put them under my own directory, and ran them — they errored out.
It looks like a module called misc is missing, so I tried to install it with pip, but that failed too!
I have no idea what is going on or how to install this thing.
The image above shows where the error occurs.
Ugh, headache...
Windows 7
Python 3.7
scrapy 1.5.1
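A point worth checking first: `misc` is most likely not a PyPI package at all, but a helper directory shipped inside the GitHub repo you downloaded — which would explain why `pip install misc` fails or installs something unrelated. A minimal sketch to check whether Python can see it from your current directory (the name `misc` comes from the traceback; nothing else is assumed):

```python
import importlib.util

# find_spec returns None when the named package is nowhere on sys.path.
# If this prints None, Python cannot locate the repo's misc/ directory --
# clone the whole repository (misc/ included) rather than one sub-project.
spec = importlib.util.find_spec("misc")
print(spec)
```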
// Please paste the code below as text (do not use screenshots in place of code)
settings.py
# Scrapy settings for douyu project
#
# For simplicity, this file contains only the most important settings by
# default. All the other settings are documented here:
#
# http://doc.scrapy.org/en/latest/topics/settings.html
#
import sys
import os
from os.path import dirname
path = dirname(dirname(os.path.abspath(os.path.dirname(__file__))))
sys.path.append(path)
from misc.log import *
BOT_NAME = 'douyu'
SPIDER_MODULES = ['douyu.spiders']
NEWSPIDER_MODULE = 'douyu.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'douyu (+http://www.yourdomain.com)'
DOWNLOADER_MIDDLEWARES = {
    # 'misc.middleware.CustomHttpProxyMiddleware': 400,
    'misc.middleware.CustomUserAgentMiddleware': 401,
}
ITEM_PIPELINES = {
    'douyu.pipelines.JsonWithEncodingPipeline': 300,
    # 'douyu.pipelines.RedisPipeline': 301,
}
LOG_LEVEL = 'INFO'
DOWNLOAD_DELAY = 1
spider.py
import re
import json
try:
    from urllib.parse import urlparse  # Python 3; the module moved here from py2's urlparse
except ImportError:
    from urlparse import urlparse  # Python 2
import urllib
import pdb
from scrapy.selector import Selector
try:
    from scrapy.spiders import Spider
except ImportError:
    from scrapy.spiders import BaseSpider as Spider
from scrapy.utils.response import get_base_url
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor as sle
from douyu.items import *
from misc.log import *
from misc.spider import CommonSpider
class douyuSpider(CommonSpider):
    name = "douyu"
    allowed_domains = ["douyu.com"]
    start_urls = [
        "http://www.douyu.com/directory/all"
    ]
    rules = [
        Rule(sle(allow=("http://www.douyu.com/directory/all")), callback='parse_1', follow=True),
    ]
    list_css_rules = {
        '#live-list-contentbox li': {
            'url': 'a::attr(href)',
            'room_name': 'a::attr(title)',
            'tag': 'span.tag.ellipsis::text',
            'people_count': '.dy-num.fr::text'
        }
    }
    list_css_rules_for_item = {
        '#live-list-contentbox li': {
            '__use': '1',
            '__list': '1',
            'url': 'a::attr(href)',
            'room_name': 'a::attr(title)',
            'tag': 'span.tag.ellipsis::text',
            'people_count': '.dy-num.fr::text'
        }
    }

    def parse_1(self, response):
        info('Parse ' + response.url)
        # x = self.parse_with_rules(response, self.list_css_rules, dict)
        x = self.parse_with_rules(response, self.list_css_rules_for_item, douyuItem)
        print(len(x))
        # print(json.dumps(x, ensure_ascii=False, indent=2))
        # pp.pprint(x)
        # return self.parse_with_rules(response, self.list_css_rules, douyuItem)
        return x
pipelines.py
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
import redis
from scrapy import signals
import json
import codecs
from collections import OrderedDict
class JsonWithEncodingPipeline(object):
    def __init__(self):
        self.file = codecs.open('data_utf8.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        line = json.dumps(OrderedDict(item), ensure_ascii=False, sort_keys=False) + "\n"
        self.file.write(line)
        return item

    def close_spider(self, spider):
        self.file.close()


class RedisPipeline(object):
    def __init__(self):
        self.r = redis.StrictRedis(host='localhost', port=6379)

    def process_item(self, item, spider):
        if not item['id']:
            print('no id item!!')  # print is a function on Python 3
        str_recorded_item = self.r.get(item['id'])
        final_item = None
        if str_recorded_item is None:
            final_item = item
        else:
            ritem = eval(self.r.get(item['id']))
            # dict(item.items() + ritem.items()) is Python 2 only;
            # merge with unpacking instead (ritem's keys win, as before)
            final_item = {**dict(item), **ritem}
        self.r.set(item['id'], final_item)
        return item

    def close_spider(self, spider):
        return
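The merge step in `RedisPipeline` combines a freshly scraped item with the previously recorded one, with the recorded values taking precedence. In Python 3 the idiomatic form is dict unpacking, where later entries win on key collisions — a small standalone sketch (sample dicts, not real scraped data):

```python
# Python 2's dict(item.items() + ritem.items()) concatenated two lists of
# pairs; on Python 3, dict_items objects cannot be added, so unpacking is
# the replacement. Later entries win, so ritem overrides item here,
# matching the original Python 2 behaviour.
item = {'id': 1, 'people_count': '12'}   # fresh item (sample values)
ritem = {'id': 1, 'room_name': 'old'}    # previously recorded item
merged = {**item, **ritem}
print(merged)
```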
items.py
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html
from scrapy.item import Item, Field
class douyuItem(Item):
    # define the fields for your item here like:
    url = Field()
    room_name = Field()
    people_count = Field()
    tag = Field()
Hoping someone can give a definitive answer.