
$("input[type='checkbox']").is(':checked')

鹿惑 answered

The page most likely has anti-scraping measures: the relevant data is commented out in the html source. Strip the comment markers first, then parse.

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.basketball-reference.com/teams/MIN/2018.html#all_per_game')
# Strip the html comment markers, then parse
soup = BeautifulSoup(r.text.replace('<!--','').replace('-->',''),'lxml')
trs = soup.select('#per_game > tbody > tr')
print(trs[0])
喵小咪 answered

Web-page parsing in Python generally comes down to one of the following:
1. String methods
2. Regular expressions
3. An html/xml parsing library (e.g. the well-known BeautifulSoup)
For the example you gave, suppose:

>>> s = '<tr><td><b><a href=".././statistics/power" title="Exponent of the power-law degree distibution">Power law exponent (estimated) with d<sub>min</sub></a></b></td><td>2.1610(d<sub>min</sub> = 2) </td></tr>'

由于文本特征非常明顯, 可以這樣處理:
1.字符串處理方法:

>>> s.split('<td>')[-1].split('(d')[0]
'2.1610'

2. re:

>>> import re
>>> pattern = re.compile(r'</b></td><td>(.*)\(d<sub>')
>>> pattern.findall(s)
['2.1610']

3. BeautifulSoup:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(s, 'html.parser')
>>> soup.find_all('td')[1].contents[0][:-2]
'2.1610'

All of the above are ad-hoc solutions designed around the given example.

情已空 answered
var arr = ['0.1.1', '2.3.3', '0.3002.1', '4.2', '4.3.5', '4.3.4.5']
arr.sort((a, b) => {
    var items1 = a.split('.')
    var items2 = b.split('.')
    var len = Math.max(items1.length, items2.length)
    for (let i = 0; i < len; i++) {
      // Note: typeof x === undefined compares a string against the value
      // undefined and is always false; compare against undefined directly.
      // A missing segment sorts first, so '4.2' comes before '4.2.1'.
      if (items1[i] === undefined) return -1
      if (items2[i] === undefined) return 1
      if (items1[i] === items2[i]) continue
      return Number(items1[i]) - Number(items2[i])
    }
    return 0
})
console.log(arr)
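
With the comparison fixed, this logs ['0.1.1', '0.3002.1', '2.3.3', '4.2', '4.3.4.5', '4.3.5'].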
貓館 answered

Try console.log(JSON.stringify(this)); what you were looking at before is a snapshot of the object taken at the moment you expanded it.

青檸 answered

Mind the * in your regex: use the non-greedy (.*?) form, as below.

import requests
import re

def text():
    for a in range(1, 13):
        url = 'https://sf.taobao.com/list/50025969__1___%BA%BC%D6%DD.htm?spm=a213w.7398504.pagination.3.W9af3L&auction_start_seg=-1&page=' + str(a)
        html = requests.get(url).text
        # Pull the fields out of the embedded JSON with non-greedy captures
        ids = re.findall('"id":(.*?),"itemUrl"', html)
        names = re.findall('"title":"(.*?)"', html)
        prices = re.findall('"initialPrice": (.*?) ,"currentPrice"', html)
        find = zip(ids, names, prices)
        for txt in find:
            print(txt)

if __name__ == '__main__':
    print('\t\t\tNo.\t\t\t', '\t\t\t\t\tLocation\t\t\t', '\t\t\t\t\t\tPrice')
    text()


練命 answered

See here.
I found I didn't fully understand it myself, so in the spirit of learning I translated the linked material (the translation is rough, go easy on me; read the English original if you can).

Posted it on CSDN; the link is here.

log_df[['id','device']].groupby(['id'])['device'].apply(lambda x:len(set(x)))
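
For context, a minimal sketch of what this one-liner computes, on made-up data (column names taken from the snippet): the number of distinct device values per id. groupby('id')['device'].nunique() would be equivalent.

import pandas as pd

log_df = pd.DataFrame({
    'id':     [1, 1, 1, 2, 2, 3],
    'device': ['ios', 'ios', 'android', 'web', 'web', 'ios'],
})

# Distinct devices seen per id
print(log_df[['id', 'device']].groupby(['id'])['device'].apply(lambda x: len(set(x))))
# id
# 1    2
# 2    1
# 3    1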

陪我終 answered

Quite simple:

>>> print '"Hello,\\nworld!"'.decode('string_escape')
"Hello,
world!"

>>> data = json.loads('{\"count\":8,\"sub_images\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470700000c7084773fb2\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470700000c7084773fb2\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/470700000c7084773fb2\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/470700000c7084773fb2\"}],\"uri\":\"origin\\/470700000c7084773fb2\",\"height\":1590},{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/47050001b69355a0bf1b\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/47050001b69355a0bf1b\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/47050001b69355a0bf1b\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/47050001b69355a0bf1b\"}],\"uri\":\"origin\\/47050001b69355a0bf1b\",\"height\":1557},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470300020761150d671a\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470300020761150d671a\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/470300020761150d671a\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/470300020761150d671a\"}],\"uri\":\"origin\\/470300020761150d671a\",\"height\":1552},{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/47000002200f2a0a9020\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/47000002200f2a0a9020\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/47000002200f2a0a9020\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/47000002200f2a0a9020\"}],\"uri\":\"origin\\/47000002200f2a0a9020\",\"height\":1575},{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/470000022011d5569ccb\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/470000022011d5569ccb\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/470000022011d5569ccb\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/470000022011d5569ccb\"}],\"uri\":\"origin\\/470000022011d5569ccb\",\"height\":1588},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/4700000220127db96444\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/4700000220127db96444\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/4700000220127db96444\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/4700000220127db96444\"}],\"uri\":\"origin\\/4700000220127db96444\",\"height\":1561},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/46ff000532e33a9fa35a\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/46ff000532e33a9fa35a\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/46ff000532e33a9fa35a\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/46ff000532e33a9fa35a\"}],\"uri\":\"origin\\/46ff000532e33a9fa35a\",\"height\":1563},{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/470700000c7b871a5fae\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/470700000c7b871a5fae\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/470700000c7b871a5fae\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/470700000c7b871a5fae\"}],\"uri\":\"origin\\/470700000c7b871a5fae\",\"height\":1575}],\"max_img_width\":1178,\"labels\":[],\"sub_abstracts\":[\" \",\" \",\" \",\" \",\" \",\" \",\" \",\" 
\"],\"sub_titles\":[\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\"]}'.decode('string_escape'))
>>> 
>>> data["count"]
8
>>> 
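
Note that the snippets above are Python 2 (print statement, str.decode). On Python 3 the 'string_escape' codec no longer exists; a rough equivalent, sketched here on a shortened sample of the data above, is the 'unicode_escape' codec (safe here because the data only uses ASCII and \uXXXX escapes):

import codecs
import json

# Shortened sample with literal backslashes, as it would arrive over the wire
raw = r'{\"count\":8,\"uri\":\"origin\\/470700000c7084773fb2\"}'
data = json.loads(codecs.decode(raw, 'unicode_escape'))
print(data["count"])  # 8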
心上人 answered
$arr = $arr['data'];

// Keep only the entries whose symbol is BTC
$arr1 = array_filter($arr, function ($item) {
    return $item['symbol'] == 'BTC';
});
var_dump($arr1);
孤酒 answered

Problem solved; I got tripped up by js again (I'll remember next time).
The tags I was selecting on that page are added dynamically by js, so it makes sense that the scraper got nothing. I then analyzed the page the crawler fetched, and the login had in fact succeeded.

空痕 answered

How many elements does the inputVals array have?
I don't see how this assignment is supposed to work.

雨蝶 answered

A timeout error is a server-side problem, not a client-side one. Network failures are perfectly normal; just retry.
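
If the retries should be automatic, a minimal sketch using requests with urllib3's Retry (the URL and the retry counts here are made up):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# Up to 3 retries with exponential backoff, also retrying on 5xx responses
retry = Retry(total=3, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retry))

resp = session.get('https://example.com', timeout=10)  # hypothetical target
print(resp.status_code)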

淺時光 answered

You first need to parse $result into an array with $result = json_decode($result, true); and then do the following:

foreach($result['list'] as $mydata)
{
    echo $mydata['name'];
}
尋仙 answered

The search suggestions are generated dynamically with js.
You can observe directly which api they are requested from.
For example, when searching for hello, you can request directly:
https://finance.yahoo.com/_finance_doubledown/api/resource/searchassist;searchTerm=hello
The code can then be written like this:

import json
import requests

kw = 'hello'
url_base = 'https://finance.yahoo.com/_finance_doubledown/api/resource/searchassist;searchTerm='
url = url_base + kw
resp = requests.get(url)
print(json.dumps(json.loads(resp.text), indent=4, sort_keys=True))

You get a result like this:

{
    "hiConf": false,
    "items": [
        {
            "exch": "FRA",
            "exchDisp": "Frankfurt",
            "name": "HelloFresh SE",
            "symbol": "HFG.F",
            "type": "S",
            "typeDisp": "Equity"
        },
        ...

In my tests a direct request seems to be enough; I don't yet know whether yahoo has any rate-limiting measures.

離人歸 answered

Actually the compiler does the conversion for you; it improves fault tolerance and spares you unnecessary thinking.

赱丅呿 answered

Probably not. You can use fiddler to capture the traffic and take a look.

我以為 answered

Thanks for the invite; it looks like your problem is already solved.
One suggestion: for crawler/scraper programs I usually store data in mongodb rather than a relational database like mysql. There are several advantages:

  1. A big headache with crawlers is that you often cannot predict the format of the scraped data until you run into it. Mongo is nosql with flexible fields: every document you insert into a collection can have a different set of keys, and querying the mongo way still works fine, whereas adding a field in a sql-style db may mean modifying the entire table (see the sketch after this list).
  2. mysql's strength is transactions, which suits mature, stable business workloads. First-hand scraped data is usually temporary: you typically build second- and third-stage programs to query, filter, and clean it, and at that point you can pull what you need out of mongo into whatever other db the business requires, or dump it straight to excel.
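
A minimal sketch of point 1, assuming a local MongoDB and the pymongo driver (the database, collection, and field names are made up):

from pymongo import MongoClient

coll = MongoClient('mongodb://localhost:27017')['crawler']['pages']

# Documents in the same collection may carry different keys
coll.insert_one({'url': 'https://example.com/a', 'title': 'A'})
coll.insert_one({'url': 'https://example.com/b', 'price': 9.9, 'tags': ['x']})

# Queries still work; documents lacking the field simply do not match
for doc in coll.find({'price': {'$exists': True}}):
    print(doc['url'], doc['price'])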