How can I scrape WeChat articles with Python?

At the moment I find WeChat articles through Sogou's WeChat search. The code is as follows:

import requests


# Fetch a Sogou WeChat search results page using browser-like headers.
def get_html(url):
    headers = {
        "Host": "weixin.sogou.com",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
        "Accept-Encoding": "gzip, deflate",
        "Referer": "http://weixin.sogou.com/weixin?type=2&ie=utf8&query=%E6%A2%85%E8%A5%BF&tsn=2&ft=&et=&interation=&wxid=&usip=",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache"
    }
    r = requests.get(url=url, headers=headers)
    html = r.text
    return html

url = 'http://weixin.sogou.com/weixin?type=2&ie=utf8&query=%E6%A2%85%E8%A5%BF&tsn=5&ft=2018-05-05&et=2018-06-05&interation=&wxid=&usip='
print(get_html(url=url))

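However, when I run the code above, the request gets a 302 redirect to http://weixin.sogou.com/
On a second attempt I sent the request with a cookie, and the result was the same.
This is how I obtain the cookie: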

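import random

import requests

# USER_AGENTS is not shown in the original post; this one-entry list is a
# placeholder that reuses the User-Agent from get_html above.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
]


def get_cookie(timeout=30):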
    url = "http://weixin.sogou.com/weixin?type=2&s_from=input&query=%E5%BE%AE%E4%BF%A1%E5%85%AC%E4%BC%97%E5%8F%B7&ie=utf8&_sug_=y&_sug_type_=&w=01019900&sut=13322&sst0=1524114483262&lkt=13%2C1524114471294%2C1524114483150"
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.8",
        "Accept-Encoding": "gzip, deflate, sdch",
        "DNT": "1",
        "Connection": "keep-alive",
    }
    r = requests.get(url=url, headers=headers, timeout=timeout)
    # Return the raw Set-Cookie header, or an empty string if none was sent.
    return r.headers.get('set-cookie', '')
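
For reference, one way to see what the server is doing is to turn off automatic redirect handling and look at the status code and `Location` header, while letting a `requests.Session` carry cookies between requests instead of copying the raw `Set-Cookie` header by hand. The following is only a minimal sketch under those assumptions: the names `describe_response` and `fetch_with_session` are hypothetical, and nothing here guarantees that Sogou will accept the session cookies.

```python
import requests

SEARCH_HOME = "http://weixin.sogou.com/"


def describe_response(status_code, location):
    """Summarise a response: the redirect target for a 3xx, otherwise 'ok'."""
    if 300 <= status_code < 400:
        return "redirected to {}".format(location or "<no Location header>")
    return "ok"


def fetch_with_session(url, user_agent, timeout=30):
    """Fetch url with a Session so cookies from the first request are reused."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    # Visit the home page first; any cookies it sets are stored on the session.
    session.get(SEARCH_HOME, timeout=timeout)
    # Do not follow redirects, so a 302 stays visible instead of being followed.
    r = session.get(url, timeout=timeout, allow_redirects=False)
    return r.status_code, r.headers.get("Location"), r.text
```

Calling `describe_response(status, location)` on the first two values returned by `fetch_with_session` shows whether the search URL was redirected and where to.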
Answer
笨尐豬

Change the User-Agent to the one WeChat uses.
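
A minimal sketch of what that could look like is below. WeChat's built-in browser identifies itself with a `MicroMessenger/...` token in its User-Agent; the exact string used here (device, OS and version numbers) is an illustrative assumption, and `build_wechat_headers` is a hypothetical helper name, not part of any library.

```python
# Illustrative WeChat-style User-Agent. The device and version numbers are
# assumptions; the "MicroMessenger" token is what marks a WeChat client.
WECHAT_USER_AGENT = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 11_3 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E217 "
    "MicroMessenger/6.6.6 NetType/WIFI Language/zh_CN"
)


def build_wechat_headers(referer=None):
    """Build request headers that present the client as WeChat's browser."""
    headers = {
        "User-Agent": WECHAT_USER_AGENT,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.8",
    }
    if referer:
        # Only send a Referer when the caller supplies one.
        headers["Referer"] = referer
    return headers
```

The returned dict can be passed as `headers=` to `requests.get` in place of the browser headers from the question. Whether this is enough to avoid the 302 has not been verified here.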

May 8, 2018, 13:33