帥到炸 answered:

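When processing array and object data, the approach is what matters. The usual tools are recursion, or an auxiliary (third) variable that carries state through the traversal. You can also get a lot done by exploiting the fact that object keys are unique.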
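As a small illustration of the key-uniqueness trick, here is a Python sketch (the function name and sample records are hypothetical) that de-duplicates records by an id field. Writing a repeated key into a dict overwrites the earlier entry, so only the last record per id survives, while the dict keeps the position where each key first appeared.

```python
def dedupe_by_key(records, key):
    """Keep the last record seen for each value of `key`.

    Relies on dict keys being unique: assigning to an existing key
    overwrites the old value instead of adding a second entry.
    """
    seen = {}
    for rec in records:
        seen[rec[key]] = rec
    return list(seen.values())


rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
print(dedupe_by_key(rows, "id"))  # [{'id': 1, 'v': 'c'}, {'id': 2, 'v': 'b'}]
```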

真難過 answered:

Note: the yield from syntax is only available in Python 3 (3.3 and later).

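from collections.abc import Iterable  # collections.Iterable no longer exists in Python 3.10+


def flatten(d, prefix="", sep="_"):
    """Recursively yield (joined_key, value) pairs from nested dicts/lists."""
    def _take_prefix(k, v, p):
        # Extend the key path with k and recurse, passing sep through
        new_prefix = "{}{}{}".format(p, sep, k) if p else str(k)
        yield from flatten(v, new_prefix, sep)

    if isinstance(d, dict):
        for k, v in d.items():
            if isinstance(v, str) or not isinstance(v, Iterable):
                # Leaf value: emit it under the full key path
                if prefix:
                    yield "{}{}{}".format(prefix, sep, k), v
                else:
                    yield k, v
            elif isinstance(v, dict):
                yield from _take_prefix(k, v, prefix)
            elif isinstance(v, list):
                # Every list element is flattened under the same key path;
                # elements that are not dicts produce no output
                for i in v:
                    yield from _take_prefix(k, i, prefix)
            # Other iterables (tuples, sets, ...) are skipped
    # Anything that is not a dict yields nothing


dic = {}  # replace with your dataset
for key, value in flatten(dic):
    print("{}: {}".format(key, value))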

The output is below; it should be fully flattened now:

status: changed
dataset_id: 5a4b463c855d783af4f5f695
dataset_name: AE_E
dataset_label: 1- ADVERSE EVENTS - Not Analyzed
details_variables_variable_id: 5a4b4647855d783b494f9d3f
details_variables_variable_name: CPEVENT
details_variables_variable_label: CPEVENT
details_variables_status: changed
details_variables_details_r_type_new_value: unary
details_variables_details_r_type_old_value: factor
details_variables_message: Variable with different R Type
details_variables_variable_id: 5a4b4647855d783b494f9d25
details_variables_variable_name: CPEVENT2
details_variables_variable_label: CPEVENT2
details_variables_status: changed
details_variables_details_r_type_new_value: unary
details_variables_details_r_type_old_value: binary
details_variables_message: Variable with different R Type
details_variables_variable_id: 5a4b4647855d783b494f9d26
details_variables_variable_name: CP_UNSCHEDULED
details_variables_variable_label: CP_UNSCHEDULED
details_variables_status: changed
details_variables_details_r_type_new_value: undefined
details_variables_details_r_type_old_value: unary
details_variables_message: Variable with different R Type
details_variables_variable_id: 5a4b4647855d783b494f9d02
details_variables_variable_name: VISIT_NUMBER
details_variables_variable_label: VISIT_NUMBER
details_variables_status: changed
details_variables_details_r_type_new_value: unary
details_variables_details_r_type_old_value: integer
details_variables_message: Variable with different R Type
details_variables_variable_id: 5a4b4647855d783b494f9ccf
details_variables_variable_name: VISIT_NUMBER2
details_variables_variable_label: VISIT_NUMBER2
details_variables_status: changed
details_variables_details_r_type_new_value: unary
details_variables_details_r_type_old_value: binary
details_variables_message: Variable with different R Type
details_many_visits: None
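Because flatten() gives every list element the same key path (details_variables_variable_id appears five times above), its output cannot go straight into a single dict. A hypothetical variant, flatten_indexed, that puts the list position into the key instead is sketched below. It is an alternative design rather than part of the original answer, and unlike flatten() it also keeps plain values that sit directly inside lists.

```python
def flatten_indexed(obj, prefix="", sep="_"):
    """Yield (key, value) pairs; list positions become part of the key.

    Hypothetical variant of flatten(): every key is unique, so the
    result can be passed straight to dict().
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        # Leaf (str, number, None, ...): emit it under the path built so far
        yield prefix, obj
        return
    for k, v in items:
        new_prefix = "{}{}{}".format(prefix, sep, k) if prefix else str(k)
        yield from flatten_indexed(v, new_prefix, sep)


nested = {
    "status": "changed",
    "details": {
        "variables": [{"variable_name": "CPEVENT"}, {"variable_name": "CPEVENT2"}],
        "many_visits": None,
    },
}
print(dict(flatten_indexed(nested)))
```

The index in the key (details_variables_0_..., details_variables_1_...) is what lets each variable record be told apart without a second pass.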

For your revised question, one more function does the job:

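# fuck_all is a special-case helper, written specifically for your need to split
# the N variable records under each dataset into separate rows
def fuck_all(dic, prefix="details_variables"):
    # flatten() is the general-purpose part: it can flatten any nested dataset
    lst = list(flatten(dic))
    # Fields outside the variable list are copied into every row
    top = {k: v for k, v in lst if not k.startswith(prefix)}
    lines = []
    index = 0
    for key, value in lst:
        if not key.startswith(prefix):
            continue
        if not lines:
            lines.append(top.copy())
        # A key we have already filled means the next variable record has started
        if key in lines[index]:
            index += 1
            lines.append(top.copy())
        lines[index][key] = value
    # Note: a dataset with no prefix keys at all produces no rows
    return lines


d = {}  # replace with your dataset
for i in fuck_all(d):
    print(i)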

The result looks like this, which should meet your needs:

{'status': 'changed', 'dataset_id': '5a4b463c855d783af4f5f695', 'dataset_name': 'AE_E', 'dataset_label': '1- ADVERSE EVENTS - Not Analyzed', 'details_many_visits': None, 'details_variables_variable_id': '5a4b4647855d783b494f9d3f', 'details_variables_variable_name': 'CPEVENT', 'details_variables_variable_label': 'CPEVENT', 'details_variables_status': 'changed', 'details_variables_details_r_type_new_value': 'unary', 'details_variables_details_r_type_old_value': 'factor', 'details_variables_message': 'Variable with different R Type'}
{'status': 'changed', 'dataset_id': '5a4b463c855d783af4f5f695', 'dataset_name': 'AE_E', 'dataset_label': '1- ADVERSE EVENTS - Not Analyzed', 'details_many_visits': None, 'details_variables_variable_id': '5a4b4647855d783b494f9d25', 'details_variables_variable_name': 'CPEVENT2', 'details_variables_variable_label': 'CPEVENT2', 'details_variables_status': 'changed', 'details_variables_details_r_type_new_value': 'unary', 'details_variables_details_r_type_old_value': 'binary', 'details_variables_message': 'Variable with different R Type'}
{'status': 'changed', 'dataset_id': '5a4b463c855d783af4f5f695', 'dataset_name': 'AE_E', 'dataset_label': '1- ADVERSE EVENTS - Not Analyzed', 'details_many_visits': None, 'details_variables_variable_id': '5a4b4647855d783b494f9d26', 'details_variables_variable_name': 'CP_UNSCHEDULED', 'details_variables_variable_label': 'CP_UNSCHEDULED', 'details_variables_status': 'changed', 'details_variables_details_r_type_new_value': 'undefined', 'details_variables_details_r_type_old_value': 'unary', 'details_variables_message': 'Variable with different R Type'}
{'status': 'changed', 'dataset_id': '5a4b463c855d783af4f5f695', 'dataset_name': 'AE_E', 'dataset_label': '1- ADVERSE EVENTS - Not Analyzed', 'details_many_visits': None, 'details_variables_variable_id': '5a4b4647855d783b494f9d02', 'details_variables_variable_name': 'VISIT_NUMBER', 'details_variables_variable_label': 'VISIT_NUMBER', 'details_variables_status': 'changed', 'details_variables_details_r_type_new_value': 'unary', 'details_variables_details_r_type_old_value': 'integer', 'details_variables_message': 'Variable with different R Type'}
{'status': 'changed', 'dataset_id': '5a4b463c855d783af4f5f695', 'dataset_name': 'AE_E', 'dataset_label': '1- ADVERSE EVENTS - Not Analyzed', 'details_many_visits': None, 'details_variables_variable_id': '5a4b4647855d783b494f9ccf', 'details_variables_variable_name': 'VISIT_NUMBER2', 'details_variables_variable_label': 'VISIT_NUMBER2', 'details_variables_status': 'changed', 'details_variables_details_r_type_new_value': 'unary', 'details_variables_details_r_type_old_value': 'binary', 'details_variables_message': 'Variable with different R Type'}

Might as well see it through to the end:

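import json

import pandas as pd

# Assumes the file holds a JSON list of dataset dicts, and that flatten()
# and fuck_all() from above are already defined
with open("your dataset file", "r") as fh:
    dic = json.load(fh)

# One row per variable, across all datasets (a list comprehension avoids
# reduce's repeated list concatenation and its error on an empty list)
rows = [row for item in dic for row in fuck_all(item)]
df = pd.DataFrame(rows)
df.to_csv("out.csv", index=False)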
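If pandas 1.0 or later is available, pd.json_normalize can cover the flatten, fuck_all and DataFrame steps in one call. This swaps in a different technique from the answer above. The sketch assumes each dataset has the nested layout implied by the flattened keys (a details dict holding a variables list and many_visits); that layout is inferred from the output, not confirmed, and the sample is trimmed to two variables.

```python
import pandas as pd

# Trimmed sample with the layout inferred from the flattened keys above
datasets = [{
    "status": "changed",
    "dataset_id": "5a4b463c855d783af4f5f695",
    "dataset_name": "AE_E",
    "dataset_label": "1- ADVERSE EVENTS - Not Analyzed",
    "details": {
        "variables": [
            {"variable_id": "5a4b4647855d783b494f9d3f", "variable_name": "CPEVENT",
             "variable_label": "CPEVENT", "status": "changed",
             "details": {"r_type": {"new_value": "unary", "old_value": "factor"}},
             "message": "Variable with different R Type"},
            {"variable_id": "5a4b4647855d783b494f9d25", "variable_name": "CPEVENT2",
             "variable_label": "CPEVENT2", "status": "changed",
             "details": {"r_type": {"new_value": "unary", "old_value": "binary"}},
             "message": "Variable with different R Type"},
        ],
        "many_visits": None,
    },
}]

# One row per entry of details.variables; dataset-level fields repeat on each row
df = pd.json_normalize(
    datasets,
    record_path=["details", "variables"],
    meta=["status", "dataset_id", "dataset_name", "dataset_label",
          ["details", "many_visits"]],
    record_prefix="details_variables_",
    sep="_",
)
print(df.shape)
```

The column names come out the same as the flatten-based keys (for example details_variables_details_r_type_old_value and details_many_visits), so df.to_csv("out.csv", index=False) can follow as before.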

The finished product:

[screenshot]

她愚我 answered:

Thanks to β_3000 for the hint. It turns out a lot of things become much clearer once you sketch them on paper.

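from numpy import arange, array

a = arange(12).reshape(3, 4)
i = array([[0, 1],
           [1, 2]])
j = array([[2, 1],
           [3, 3]])
print(a[i])     # shape (2, 2, 4): each entry of i selects a whole row of a
print(a[i, j])  # shape (2, 2)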

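a is a 2-D array, yet indexing it with the 2-D array i produces a 3-D array, as shown in the figure.
So is a[i,j] just a[i] indexed again by j? I'm not sure that's right, but at least the result seems to match.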
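For integer-array indexing, NumPy does not index a[i] a second time. It broadcasts i and j together and picks a[i[m, n], j[m, n]] for every position (m, n). Reading it as "a[i], then j" would even fail here: a[i] has only 2 entries along its first axis, while j contains 2 and 3, so a[i][j] raises an IndexError. A small check of the element-wise rule, reusing the arrays from the question:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1], [1, 2]])
j = np.array([[2, 1], [3, 3]])

# Build a[i, j] by hand: pair up i and j element by element
manual = np.empty(i.shape, dtype=a.dtype)
for m in range(i.shape[0]):
    for n in range(i.shape[1]):
        manual[m, n] = a[i[m, n], j[m, n]]

print(a[i, j].tolist())                 # [[2, 5], [7, 11]]
print(np.array_equal(a[i, j], manual))  # True
```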

萌二代 answered:
// Group the rows of $data by their 'address' field
$arr = array();
foreach ($data as $k => $v) {
    $arr[$v['address']][] = $v;
}
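The same grouping idea in Python, as a sketch: collections.defaultdict(list) plays the role of PHP's auto-created $arr[...][] arrays. The address field name comes from the PHP snippet; the sample rows and the function name are made up.

```python
from collections import defaultdict


def group_by_address(data):
    """Bucket each row under its 'address' value (unique dict keys do the grouping)."""
    groups = defaultdict(list)
    for row in data:
        groups[row["address"]].append(row)
    return dict(groups)


rows = [{"address": "A", "n": 1}, {"address": "B", "n": 2}, {"address": "A", "n": 3}]
print(group_by_address(rows))
```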
囍槑 answered:

For a Python IDE I'd recommend PyCharm or Sublime Text.
What you're seeing is most likely because your environment isn't configured properly; in my experience, Python developers don't usually use VS Code as their IDE.

離夢 answered:

See whether this works:

df1.columns=['aaaaaaa','','','','']
慢半拍 answered:

What is this? The problem is probably here: you need to close this resource. But where did this ImageIO come from in the first place?

笨尐豬 answered:

Simple loop

The simplest approach is just to loop over the rows and split. Here's the simplest version first:

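import pandas as pd

df = pd.DataFrame({'A': ['1', '2', '3'], 'B': ['1', '2,3', '4,5,6'], 'C': ['3', '3', '3']})
print(df, '\n')

# DataFrame.append was removed in pandas 2.0, so collect the rows in a list
# and build the result once (also much faster than appending row by row)
rows = []
for row in df.itertuples(index=False):
    for b in row.B.split(','):
        rows.append({'A': row.A, 'B': b, 'C': row.C})
result = pd.DataFrame(rows, columns=['A', 'B', 'C'])
print(result)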

Output:

   A      B  C
0  1      1  3
1  2    2,3  3
2  3  4,5,6  3 

   A  B  C
0  1  1  3
1  2  2  3
2  2  3  3
3  3  4  3
4  3  5  3
5  3  6  3

A more efficient method

Use expand=True to expand the split directly into columns, then stack them:

df = pd.DataFrame({'A':['1','2','3'],'B':['1','2,3','4,5,6'],'C':['3','3','3']})
df = (df.set_index(['A','C'])['B']
       .str.split(',', expand=True)
       .stack()
       .reset_index(level=2, drop=True)
       .reset_index(name='B'))
print(df)
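On pandas 0.25 or later, DataFrame.explode does the same split-into-rows job with less index juggling. This swaps in explode rather than the stack approach above; a sketch on the same toy frame:

```python
import pandas as pd

df = pd.DataFrame({'A': ['1', '2', '3'], 'B': ['1', '2,3', '4,5,6'], 'C': ['3', '3', '3']})

# Turn each B string into a list, then give every list element its own row
result = (df.assign(B=df['B'].str.split(','))
            .explode('B')
            .reset_index(drop=True))
print(result)
```

Unlike the stack version, explode keeps the original column order (A, B, C).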
局外人 answered:

It clearly holds for n = 1 and n = 2: 2(√2 - 1) ≈ 0.83 ≤ 1, and 2(√3 - 1) ≈ 1.46 ≤ 1 + 1/2 = 1.5.

Assume it holds for every n ≤ m (strong induction); in particular:

$$ 2(\sqrt{m+1} - 1) \le \sum_{k=1}^m \frac{(k-1)!!}{k!!} $$

$$ 2(\sqrt{m} - 1) \le \sum_{k=1}^{m-1} \frac{(k-1)!!}{k!!} $$

$$ 2(\sqrt{m-1} - 1) \le \sum_{k=1}^{m-2} \frac{(k-1)!!}{k!!} $$

For n = m + 1:

$$ \text{LHS} = 2(\sqrt{m+2} - 1) $$

$$ \text{RHS} = \sum_{k=1}^{m+1} \frac{(k-1)!!}{k!!} = \sum_{k=1}^m \frac{(k-1)!!}{k!!} + \frac{m!!}{(m+1)!!} $$

So it suffices to prove:

$$ \sum_{k=1}^m \frac{(k-1)!!}{k!!} + \frac{m!!}{(m+1)!!} - 2(\sqrt{m+2} - 1) \ge 0 $$

……

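The remaining task is to prove this inequality. However, simply replacing

$$ \sum_{k=1}^m \frac{(k-1)!!}{k!!} $$

with its lower bound from the hypothesis,

$$ 2(\sqrt{m+1} - 1) $$

does not work (that's what I tried at first): the bound becomes too loose. It would then require m!!/(m+1)!! ≥ 2(√(m+2) - √(m+1)), which already fails for m = 1 (1/2 < 0.63). I haven't found a proof yet.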

Also, since m!! = m·(m-2)!! and (m+1)!! = (m+1)·(m-1)!!,

$$ \frac{m!!}{(m+1)!!} $$

can be written as

$$ \frac{(m-2)!!}{(m-1)!!} \cdot \frac{m}{m+1} $$

which might be useful in the derivation.
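Before hunting further for a proof, a numeric check can at least confirm the inequality for small n. The sketch below (function names are hypothetical) computes t_k = (k-1)!!/k!! exactly with Python's fractions module, using the recurrence t_k = t_{k-2}·(k-1)/k, which is the identity above with k = m + 1, and reports the smallest gap between the two sides. A positive result is evidence, not a proof.

```python
from fractions import Fraction
import math


def dfact_ratios(n):
    """Exact t_k = (k-1)!!/k!! for k = 1..n.

    Uses t_k = t_{k-2} * (k-1)/k with seeds t_0 = (-1)!!/0!! = 1 and
    t_1 = 0!!/1!! = 1 (usual conventions 0!! = (-1)!! = 1).
    """
    t = [Fraction(1), Fraction(1)]
    for k in range(2, n + 1):
        t.append(t[k - 2] * Fraction(k - 1, k))
    return t[1:n + 1]


def smallest_gap(n_max):
    """Minimum over n = 1..n_max of sum_{k<=n} t_k - 2(sqrt(n+1) - 1)."""
    partial = Fraction(0)
    gaps = []
    for n, t_n in enumerate(dfact_ratios(n_max), start=1):
        partial += t_n
        gaps.append(float(partial) - 2 * (math.sqrt(n + 1) - 1))
    return min(gaps)


print(dfact_ratios(5))  # [Fraction(1, 1), Fraction(1, 2), Fraction(2, 3), Fraction(3, 8), Fraction(8, 15)]
print(smallest_gap(200) > 0)
```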

浪婳 answered:

Higher-level iteration functions like map also use a for loop internally.
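Conceptually, map is just a loop that applies a function to each element and collects the results. A minimal Python sketch of that idea; my_map is a hypothetical name, and Python's built-in map is implemented in C and returns a lazy iterator rather than a list:

```python
def my_map(func, items):
    """Eager, hand-rolled equivalent of list(map(func, items))."""
    result = []
    for item in items:  # the loop that map hides from you
        result.append(func(item))
    return result


print(my_map(lambda x: x * 2, [1, 2, 3]))        # [2, 4, 6]
print(list(map(lambda x: x * 2, [1, 2, 3])))     # [2, 4, 6]
```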

尤禮 answered:

Don't give it an initial value. Keep Redux data as flat as possible, but you do need to declare the key, e.g. state.key = null. Each time you use the data, check it with lodash's isPlainObject or isArray.

result = dims[c//2::1, c//2::1]
In Python 3, // is floor division (often called integer division) while / always returns a float,
e.g. 2/1 gives 2.0, but 2//1 gives 2.
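A quick demonstration of the difference, and of why c//2 rather than c/2 is needed in a slice. The list data and the value of c are made-up stand-ins for dims and c in the snippet above:

```python
# / is true division and always returns a float in Python 3
print(7 / 2)    # 3.5
print(2 / 1)    # 2.0
# // is floor division: it rounds toward negative infinity
print(7 // 2)   # 3
print(-7 // 2)  # -4 (not -3)

# Slice bounds must be integers, which is why c // 2 is used
data = list(range(6))
c = 5
print(data[c // 2:])  # [2, 3, 4, 5]
try:
    data[c / 2:]      # 2.5 is not a valid slice bound
except TypeError as exc:
    print("TypeError:", exc)
```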

朕略萌 answered:

Doesn't the error message already say it? The Quandl code is wrong; presumably it means this YAHOO/INDEX_DJI one?

溫衫 answered:

If this is PHP, what the curl request returns is a string. Parse the response with json_decode($ret, true) and you get an array.

只愛你 answered:

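But many of the cookies are useless, so you need to work out which ones matter. The useful cookies are generally the ones the server returns in Set-Cookie headers; the useless ones are usually randomly generated.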
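One way to see which cookies a server actually sets is to parse its Set-Cookie header values. A sketch with the standard library's http.cookies.SimpleCookie; the header string and the cookie name sessionid are made-up examples. (With requests, a requests.Session also stores Set-Cookie cookies automatically.)

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value copied from a response
set_cookie_header = "sessionid=abc123; Path=/; HttpOnly"

jar = SimpleCookie()
jar.load(set_cookie_header)
# Attributes such as Path and HttpOnly are parsed separately from name/value
server_cookies = {name: morsel.value for name, morsel in jar.items()}
print(server_cookies)  # {'sessionid': 'abc123'}
```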

老梗 answered:

The real port is filled in by JS after the page loads. Inspecting the page elements turns up an obfuscated mian.js:

eval(function (p, a, c, k, e, d) { e = function (c) { return (c < a ? "" : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36)) }; if (!''.replace(/^/, String)) { while (c--) d[e(c)] = k[c] || e(c); k = [function (e) { return d[e] }]; e = function () { return '\\w+' }; c = 1; }; while (c--) if (k[c]) p = p.replace(new RegExp('\\b' + e(c) + '\\b', 'g'), k[c]); return p; }('$(e(){$(\'\\f\\3\\g\\8\\1\\r\\p\\g\\k\')["\\4\\2\\q\\o"](e(u,h){5 7=$(h);5 j=7["\\i\\2\\1\\2"](\'\\a\\3\');5 9=l["\\3\\2\\8\\d\\4\\m\\b\\1"](7["\\i\\2\\1\\2"](\'\\a\'));5 c=j["\\d\\3\\n\\a\\1"](\'\\f\');t(5 6=0;6<c["\\n\\4\\b\\s\\1\\o"];6++){9-=l["\\3\\2\\8\\d\\4\\m\\b\\1"](c[6])}7["\\1\\4\\k\\1"](9)})})', 31, 31, '|x74|x61|x70|x65|var|d7|ClpoEy3|x72|TO5|x69|x6e|tVF6|x73|function|x2e|x6f|fnDKXroKU2|x64|jgemfCG4|x78|window|x49|x6c|x68|x62|x63|x2d|x67|for|wssP1'.split('|'), 0, {}))

Unpacking it with an online tool gives:

$(function()
    {
    $('\x2e\x70\x6f\x72\x74\x2d\x62\x6f\x78')["\x65\x61\x63\x68"](function(wssP1,fnDKXroKU2)
        {
        var ClpoEy3=$(fnDKXroKU2);
        var jgemfCG4=ClpoEy3["\x64\x61\x74\x61"]('\x69\x70');
        var TO5=window["\x70\x61\x72\x73\x65\x49\x6e\x74"](ClpoEy3["\x64\x61\x74\x61"]('\x69'));
        var tVF6=jgemfCG4["\x73\x70\x6c\x69\x74"]('\x2e');
        for(var d7=0;
        d7<tVF6["\x6c\x65\x6e\x67\x74\x68"];
        d7++)
            {
            TO5-=window["\x70\x61\x72\x73\x65\x49\x6e\x74"](tVF6[d7])
        }
        ClpoEy3["\x74\x65\x78\x74"](TO5)
    }
    )
}
)

Converting the hex escape sequences back to plain characters gives:

$(function() {
    $('.port-box')["each"](function(wssP1, fnDKXroKU2) {
        var ClpoEy3 = $(fnDKXroKU2);
        var jgemfCG4 = ClpoEy3["data"]('ip');
        var TO5 = window["parseInt"](ClpoEy3["data"]('i'));
        var tVF6 = jgemfCG4["split"]('.');
        for (var d7 = 0; d7 < tVF6["length"]; d7++) {
            TO5 -= window["parseInt"](tVF6[d7])
        }
        ClpoEy3["text"](TO5)
    })
})

From the code, the real port of each .port-box element is its data-i attribute value minus the sum of the four octets of the IP stored in its data-ip attribute.
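The decoded JS is easy to mirror on the scraper side. A sketch in Python, where real_port is a hypothetical name and the inputs are the raw data-i and data-ip attribute strings of one .port-box element (the sample values are made up):

```python
def real_port(data_i, data_ip):
    """Port = data-i minus the sum of the IP's octets, as in the decoded JS.

    Note: int() is stricter than JS parseInt, which ignores trailing junk.
    """
    port = int(data_i)
    for octet in data_ip.split('.'):
        port -= int(octet)
    return port


# Made-up attribute values, for illustration only
print(real_port("8090", "1.2.3.4"))  # 8080
```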

離魂曲 answered:

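Of course a commercial site has anti-scraping measures. This page is loaded dynamically, with the information fetched in separate chunks. Press F12 and capture the XHR requests: I found several responses corresponding to schools, "Property timeline for 5/29 Stephenson Street", and "Similar homes in Pialba". They are all JSON and hard to read raw, so paste them into a JSON formatter to take a look.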


If you only want that one response, send the request with your cookie; remember to convert the cookie string into a dict first. I tested this and it works.

import requests

cookie = '*********************'  # paste the Cookie header copied from your browser
url = 'https://www.realestate.com.au/property/unit-5-29-stephenson-st-pialba-qld-4655'
headers = {
    'referer': 'https://www.realestate.com.au/property/unit-5-29-stephenson-st-pialba-qld-4655',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36',
}


def trans_cookie(cookie):
    """Convert a raw 'k1=v1; k2=v2' cookie string into a dict for requests."""
    item_dict = {}
    for item in cookie.split(';'):
        if '=' not in item:
            continue  # skip empty fragments, e.g. from a trailing ';'
        # Split only on the first '=' so values containing '=' stay intact
        key, value = item.split('=', 1)
        item_dict[key.strip()] = value.strip()
    return item_dict


cookies = trans_cookie(cookie)
r = requests.get(url, cookies=cookies, headers=headers)
with open('gg.txt', 'w', encoding='utf-8') as f:
    f.write(r.text)


尋仙 answered:

SubTurnExport is defined in http://piccache.cnki.net/kdn/... ; just search for it in the JS you fetched.