My JSON looks like this:
[
  {
    "status": "changed",
    "dataset": {
      "id": "5a4b463c855d783af4f5f695",
      "name": "AE_E",
      "label": "1- ADVERSE EVENTS - Not Analyzed"
    },
    "details": {
      "variables": [
        {
          "variable": {
            "id": "5a4b4647855d783b494f9d3f",
            "name": "CPEVENT",
            "label": "CPEVENT"
          },
          "status": "changed",
          "details": {
            "r_type": {
              "new_value": "unary",
              "old_value": "factor"
            }
          },
          "message": "Variable with different R Type"
        },
        {
          "variable": {
            "id": "5a4b4647855d783b494f9d25",
            "name": "CPEVENT2",
            "label": "CPEVENT2"
          },
          "status": "changed",
          "details": {
            "r_type": {
              "new_value": "unary",
              "old_value": "binary"
            }
          },
          "message": "Variable with different R Type"
        },
        {
          "variable": {
            "id": "5a4b4647855d783b494f9d26",
            "name": "CP_UNSCHEDULED",
            "label": "CP_UNSCHEDULED"
          },
          "status": "changed",
          "details": {
            "r_type": {
              "new_value": "undefined",
              "old_value": "unary"
            }
          },
          "message": "Variable with different R Type"
        },
        {
          "variable": {
            "id": "5a4b4647855d783b494f9d02",
            "name": "VISIT_NUMBER",
            "label": "VISIT_NUMBER"
          },
          "status": "changed",
          "details": {
            "r_type": {
              "new_value": "unary",
              "old_value": "integer"
            }
          },
          "message": "Variable with different R Type"
        },
        {
          "variable": {
            "id": "5a4b4647855d783b494f9ccf",
            "name": "VISIT_NUMBER2",
            "label": "VISIT_NUMBER2"
          },
          "status": "changed",
          "details": {
            "r_type": {
              "new_value": "unary",
              "old_value": "binary"
            }
          },
          "message": "Variable with different R Type"
        }
      ],
      "many_visits": null
    }
  },
  {
    "status": "changed",
    "dataset": {
      "id": "5a4b465b855d783af4f5f737",
      "name": "AE_EQG2",
      "label": "2 - ADVERSE EVENTS- Not Analyzed"
    },
    "details": {
      "variables": [
        {
          "variable": {
            "id": "5a4b4666855d783b4b5175ce",
            "name": "ADVE_MEDDRA_SOC",
            "label": "SYSTEM ORGAN CLASS"
          },
          "status": "changed",
          "details": {
            "r_type": {
              "new_value": "character",
              "old_value": "factor"
            }
          },
          "message": "Variable with different R Type"
        }
      ],
      "many_visits": null
    }
  },
  {
    "status": "changed",
    "dataset": {
      "id": "5a4b467a855d783af4f5f7d7",
      "name": "AE_M",
      "label": "3- ADVERSE EVENTS MEDICATION ERROR - Not Analyzed"
    },
    "details": {
      "variables": [
        {
          "variable": {
            "id": "5a4b4682855d783b494f9dad",
            "name": "ADVE_MEDDRA_PT",
            "label": "PREFERRED TERM -PT-"
          },
          "status": "changed",
          "details": {
            "r_type": {
              "new_value": "character",
              "old_value": "factor"
            }
          },
          "message": "Variable with different R Type"
        },
        {
          "variable": {
            "id": "5a4b4682855d783b494f9d90",
            "name": "ADVE_MEDDRA_PT_CODE",
            "label": "PREFERRED TERM -PT- CODE"
          },
          "status": "changed",
          "details": {
            "r_type": {
              "new_value": "character",
              "old_value": "factor"
            }
          },
          "message": "Variable with different R Type"
        }
      ],
      "many_visits": null
    }
  },
  {
    "status": "unchanged",
    "dataset": {
      "id": "5a4b468c855d783af4f5f839",
      "name": "AGG_AE_E",
      "label": "1.1 - ADVERSE EVENTS- Aggregated by patient"
    },
    "details": null
  },
  {
    "status": "unchanged",
    "dataset": {
      "id": "5a4b469a855d783af4f5f8db",
      "name": "AGG_AE_M",
      "label": "3.2- ADVERSE EVENTS MEDICATION ERROR- Aggregated by patient"
    },
    "details": null
  }
]
My code is as follows:
import collections.abc
import json

import pandas as pd


def flatten(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        is_lst = isinstance(v, list)
        if isinstance(v, collections.abc.MutableMapping) or is_lst:
            if is_lst:
                # Only the first list element (v[0]) is flattened; the rest is dropped
                items.extend(flatten(v[0], new_key, sep=sep).items())
            else:
                items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)


with open("reuse.txt") as f:
    dic = json.load(f)
df_ = [flatten(i) for i in dic]
df = pd.DataFrame(df_)
df.to_csv('out.csv', index=False)
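Thanks to Self for revising an earlier version of my code.
However, the current code only parses half of the data; the other half is truncated. For example, the AE_E dataset should have information on many variables written to the CSV, but right now only the first variable is parsed.
This is the result I want:
This is what I currently get:
The strange result:
For AE_E only the first variable, CPEVENT, is parsed; the data for the other variables, such as CPEVENT2, is lost.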
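To see why only CPEVENT survives, here is a minimal repro. It assumes a trimmed-down record with just two variables. `flatten_first_only` is a hypothetical name for a copy of the question's flatten logic. The list branch visits only `v[0]`. Even visiting every element would not help on its own, because all elements map to the same flattened key and `dict(items)` keeps only one value per key.

```python
from collections.abc import MutableMapping


def flatten_first_only(d, parent_key="", sep="_"):
    """Same logic as the question's flatten(): a list contributes only v[0]."""
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, list):
            # Only the first element is visited; v[1:] is silently ignored.
            items.extend(flatten_first_only(v[0], new_key, sep).items())
        elif isinstance(v, MutableMapping):
            items.extend(flatten_first_only(v, new_key, sep).items())
        else:
            items.append((new_key, v))
    return dict(items)


# Hypothetical trimmed-down record with two variables under details.variables
record = {
    "dataset": {"name": "AE_E"},
    "details": {"variables": [{"name": "CPEVENT"}, {"name": "CPEVENT2"}]},
}
print(flatten_first_only(record))
# {'dataset_name': 'AE_E', 'details_variables_name': 'CPEVENT'}
```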
Note: the yield from syntax requires Python 3.3 or later.
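If the yield from syntax is unfamiliar, the sketch below compares two ways of passing along a sub-generator's items. For plain iteration, which is how the flatten generator uses it, yield from behaves the same as looping over the sub-generator and re-yielding each item. The function names `inner`, `with_loop` and `with_yield_from` are illustrative only.

```python
def inner():
    yield 1
    yield 2


def with_loop():
    # Older style: re-yield each item of the sub-generator by hand.
    for x in inner():
        yield x


def with_yield_from():
    # Python 3.3+: delegate to the sub-generator directly.
    yield from inner()


print(list(with_loop()), list(with_yield_from()))  # [1, 2] [1, 2]
```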
import collections.abc


def flatten(d, prefix="", sep="_"):
    def _take_prefix(k, v, p):
        # Build the child prefix and recurse, passing sep through
        if p:
            yield from flatten(v, "{}{}{}".format(p, sep, k), sep)
        else:
            yield from flatten(v, str(k), sep)

    # Only dicts are expanded; any other value passed in yields nothing
    if isinstance(d, dict):
        for k, v in d.items():
            # Leaves: strings and non-iterables (numbers, None, booleans)
            if isinstance(v, str) or not isinstance(v, collections.abc.Iterable):
                if prefix:
                    yield "{}{}{}".format(prefix, sep, k), v
                else:
                    yield k, v
            elif isinstance(v, dict):
                yield from _take_prefix(k, v, prefix)
            elif isinstance(v, list):
                # Every list element is flattened under the same prefix
                for i in v:
                    yield from _take_prefix(k, i, prefix)


dic = ...  # replace with one record (a dict) from your JSON list
for key, value in flatten(dic):
    print("{}: {}".format(key, value))
The output is below; the record is now fully flattened:
status: changed
dataset_id: 5a4b463c855d783af4f5f695
dataset_name: AE_E
dataset_label: 1- ADVERSE EVENTS - Not Analyzed
details_variables_variable_id: 5a4b4647855d783b494f9d3f
details_variables_variable_name: CPEVENT
details_variables_variable_label: CPEVENT
details_variables_status: changed
details_variables_details_r_type_new_value: unary
details_variables_details_r_type_old_value: factor
details_variables_message: Variable with different R Type
details_variables_variable_id: 5a4b4647855d783b494f9d25
details_variables_variable_name: CPEVENT2
details_variables_variable_label: CPEVENT2
details_variables_status: changed
details_variables_details_r_type_new_value: unary
details_variables_details_r_type_old_value: binary
details_variables_message: Variable with different R Type
details_variables_variable_id: 5a4b4647855d783b494f9d26
details_variables_variable_name: CP_UNSCHEDULED
details_variables_variable_label: CP_UNSCHEDULED
details_variables_status: changed
details_variables_details_r_type_new_value: undefined
details_variables_details_r_type_old_value: unary
details_variables_message: Variable with different R Type
details_variables_variable_id: 5a4b4647855d783b494f9d02
details_variables_variable_name: VISIT_NUMBER
details_variables_variable_label: VISIT_NUMBER
details_variables_status: changed
details_variables_details_r_type_new_value: unary
details_variables_details_r_type_old_value: integer
details_variables_message: Variable with different R Type
details_variables_variable_id: 5a4b4647855d783b494f9ccf
details_variables_variable_name: VISIT_NUMBER2
details_variables_variable_label: VISIT_NUMBER2
details_variables_status: changed
details_variables_details_r_type_new_value: unary
details_variables_details_r_type_old_value: binary
details_variables_message: Variable with different R Type
details_many_visits: None
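The output above repeats keys such as details_variables_variable_name once per variable. Passing these pairs straight to dict() or to a DataFrame row would again keep only one variable. The sketch below makes this visible on a small two-variable record, which is an assumed example. It uses a condensed copy of the generator above, with sep passed through and collections.abc.Iterable.

```python
from collections import Counter
from collections.abc import Iterable


def flatten(d, prefix="", sep="_"):
    """Yield (key, value) pairs; every list item reuses the same key prefix."""
    if not isinstance(d, dict):
        return  # non-dict values passed in yield nothing
    for k, v in d.items():
        key = "{}{}{}".format(prefix, sep, k) if prefix else str(k)
        if isinstance(v, str) or not isinstance(v, Iterable):
            yield key, v
        elif isinstance(v, dict):
            yield from flatten(v, key, sep)
        elif isinstance(v, list):
            for item in v:
                yield from flatten(item, key, sep)


# Hypothetical record with two variables
record = {
    "status": "changed",
    "details": {
        "variables": [{"variable": {"name": "CPEVENT"}},
                      {"variable": {"name": "CPEVENT2"}}],
        "many_visits": None,
    },
}
pairs = list(flatten(record))
print(Counter(k for k, _ in pairs)["details_variables_variable_name"])  # 2
print(dict(pairs)["details_variables_variable_name"])  # CPEVENT2
```

This is why the per-variable rows have to be split out separately, as the next answer does.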
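For your revised question, one more function does the trick:
# This fuck_all function is special-purpose: it is tailored to your need to split out the N variables under each dataset
def fuck_all(dic, prefix="details_variables"):
    lst = list(flatten(dic))  # flatten, by contrast, is general-purpose and can flatten any nested data
    lines = []
    # Fields shared by every row of this dataset (status, dataset_*, details_many_visits, ...)
    top = {k: v for k, v in lst if not k.startswith(prefix)}
    index = 0
    for key, value in lst:
        if not key.startswith(prefix):
            continue
        if not lines:
            lines.append(top.copy())
        # A key seen again in the current row means the next variable has started
        if key in lines[index]:
            index += 1
            lines.append(top.copy())
        lines[index][key] = value
    # Datasets without variables (e.g. "details": null) still produce one row
    return lines or [top]


d = ...  # replace with one record (a dict) from your JSON list
for i in fuck_all(d):
    print(i)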
The result looks like this, which should meet your needs:
{'status': 'changed', 'dataset_id': '5a4b463c855d783af4f5f695', 'dataset_name': 'AE_E', 'dataset_label': '1- ADVERSE EVENTS - Not Analyzed', 'details_many_visits': None, 'details_variables_variable_id': '5a4b4647855d783b494f9d3f', 'details_variables_variable_name': 'CPEVENT', 'details_variables_variable_label': 'CPEVENT', 'details_variables_status': 'changed', 'details_variables_details_r_type_new_value': 'unary', 'details_variables_details_r_type_old_value': 'factor', 'details_variables_message': 'Variable with different R Type'}
{'status': 'changed', 'dataset_id': '5a4b463c855d783af4f5f695', 'dataset_name': 'AE_E', 'dataset_label': '1- ADVERSE EVENTS - Not Analyzed', 'details_many_visits': None, 'details_variables_variable_id': '5a4b4647855d783b494f9d25', 'details_variables_variable_name': 'CPEVENT2', 'details_variables_variable_label': 'CPEVENT2', 'details_variables_status': 'changed', 'details_variables_details_r_type_new_value': 'unary', 'details_variables_details_r_type_old_value': 'binary', 'details_variables_message': 'Variable with different R Type'}
{'status': 'changed', 'dataset_id': '5a4b463c855d783af4f5f695', 'dataset_name': 'AE_E', 'dataset_label': '1- ADVERSE EVENTS - Not Analyzed', 'details_many_visits': None, 'details_variables_variable_id': '5a4b4647855d783b494f9d26', 'details_variables_variable_name': 'CP_UNSCHEDULED', 'details_variables_variable_label': 'CP_UNSCHEDULED', 'details_variables_status': 'changed', 'details_variables_details_r_type_new_value': 'undefined', 'details_variables_details_r_type_old_value': 'unary', 'details_variables_message': 'Variable with different R Type'}
{'status': 'changed', 'dataset_id': '5a4b463c855d783af4f5f695', 'dataset_name': 'AE_E', 'dataset_label': '1- ADVERSE EVENTS - Not Analyzed', 'details_many_visits': None, 'details_variables_variable_id': '5a4b4647855d783b494f9d02', 'details_variables_variable_name': 'VISIT_NUMBER', 'details_variables_variable_label': 'VISIT_NUMBER', 'details_variables_status': 'changed', 'details_variables_details_r_type_new_value': 'unary', 'details_variables_details_r_type_old_value': 'integer', 'details_variables_message': 'Variable with different R Type'}
{'status': 'changed', 'dataset_id': '5a4b463c855d783af4f5f695', 'dataset_name': 'AE_E', 'dataset_label': '1- ADVERSE EVENTS - Not Analyzed', 'details_many_visits': None, 'details_variables_variable_id': '5a4b4647855d783b494f9ccf', 'details_variables_variable_name': 'VISIT_NUMBER2', 'details_variables_variable_label': 'VISIT_NUMBER2', 'details_variables_status': 'changed', 'details_variables_details_r_type_new_value': 'unary', 'details_variables_details_r_type_old_value': 'binary', 'details_variables_message': 'Variable with different R Type'}
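fuck_all detects a new variable when a key repeats. Another option is to walk the known structure directly: build one row per entry of details.variables, and keep a single row for datasets whose details is null. This is a sketch that assumes every record has the shape shown in the question. `flat_dict` and `explode_dataset` are hypothetical helper names. Lists nested inside a variable are not expanded; they are kept as plain values.

```python
def flat_dict(d, prefix="", sep="_"):
    """Flatten nested dicts (not lists) into one level of prefixed keys."""
    out = {}
    for k, v in d.items():
        key = prefix + sep + k if prefix else k
        if isinstance(v, dict):
            out.update(flat_dict(v, key, sep))
        else:
            out[key] = v
    return out


def explode_dataset(record, sep="_"):
    """Return one row per entry of details.variables (assumed record shape)."""
    details = record.get("details")
    shared = dict(record)
    if isinstance(details, dict):
        # Keep details' other fields (e.g. many_visits) but not the list itself
        shared["details"] = {k: v for k, v in details.items() if k != "variables"}
        variables = details.get("variables") or []
    else:
        variables = []  # e.g. "details": null for unchanged datasets
    base = flat_dict(shared, sep=sep)
    if not variables:
        return [base]
    rows = []
    for var in variables:
        row = dict(base)
        row.update(flat_dict(var, "details" + sep + "variables", sep))
        rows.append(row)
    return rows


# Hypothetical two-dataset sample in the question's shape
data = [
    {"status": "changed", "dataset": {"name": "AE_E"},
     "details": {"variables": [
         {"variable": {"name": "CPEVENT"}, "status": "changed"},
         {"variable": {"name": "CPEVENT2"}, "status": "changed"}],
         "many_visits": None}},
    {"status": "unchanged", "dataset": {"name": "AGG_AE_E"}, "details": None},
]
rows = [row for rec in data for row in explode_dataset(rec)]
for row in rows:
    print(row)
```

The resulting column names match the ones fuck_all produces, for example details_variables_details_r_type_new_value. The rows can therefore go into pd.DataFrame in the same way.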
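And to go all the way, here is how to write everything to CSV:
from functools import reduce
import json

import pandas as pd

# Assumes flatten() and fuck_all() from above are defined in the same file
with open("your dataset file", "r") as fh:
    dic = json.load(fh)
# fuck_all returns a list of rows per dataset; reduce concatenates them into one list
df = pd.DataFrame(reduce(lambda x, y: x + y, (fuck_all(i) for i in dic)))
df.to_csv("out.csv", index=False)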
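The reduce(lambda x, y: x + y, ...) step builds a new list at every addition. itertools.chain.from_iterable concatenates the per-dataset row lists without that copying. If pandas is not otherwise needed, the standard csv module can write the rows directly. This is a sketch: `write_rows_csv` is a hypothetical helper. The column set is taken as the union of all row keys, so datasets without variables still fit, much as pd.DataFrame fills missing columns.

```python
import csv
import io
from itertools import chain


def write_rows_csv(row_groups, fh):
    """Write an iterable of row lists (e.g. one list per dataset) as CSV."""
    rows = list(chain.from_iterable(row_groups))
    # Union of all keys in first-seen order; missing fields are written as ""
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty strings


# Hypothetical rows, grouped per dataset as fuck_all would return them
groups = [
    [{"dataset_name": "AE_E", "details_variables_variable_name": "CPEVENT"},
     {"dataset_name": "AE_E", "details_variables_variable_name": "CPEVENT2"}],
    [{"dataset_name": "AGG_AE_E", "details": None}],
]
buf = io.StringIO()
write_rows_csv(groups, buf)
print(buf.getvalue())
```

With real data this might look like `with open("out.csv", "w", newline="") as fh: write_rows_csv((fuck_all(i) for i in dic), fh)`. The csv module documentation recommends newline="" when opening the file.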
The finished result: