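Reading files

```python
import pandas as pd
df67 = pd.read_excel(r'./data/PE-数据总sjk.xlsx', skiprows=1, sheet_name=str(67))
```

`skiprows=1` skips the first row of the sheet, and `sheet_name=str(67)` selects the sheet named "67". A plain integer such as `67` would instead be read as a zero-based sheet position.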
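A minimal, self-contained sketch of the same call, assuming openpyxl is installed. It builds a hypothetical two-sheet workbook in memory (the sheet names `first` and `67` and all cell values are made up) in which sheet `67` starts with a title row, then reads it back with the same `skiprows` and `sheet_name` arguments.

```python
import io

import pandas as pd

# Hypothetical workbook held in memory: sheet "67" has a title row above the real header
buffer = io.BytesIO()
raw = pd.DataFrame([['Report title', None], ['ID', 'Value'], ['A1', 0.5], ['A2', 2.0]])
with pd.ExcelWriter(buffer, engine='openpyxl') as writer:
    pd.DataFrame({'x': [1]}).to_excel(writer, sheet_name='first', index=False)
    raw.to_excel(writer, sheet_name='67', index=False, header=False)
buffer.seek(0)

# skiprows=1 drops the title row, so the 'ID'/'Value' row becomes the header;
# sheet_name='67' picks the sheet by name, not by position
df67 = pd.read_excel(buffer, skiprows=1, sheet_name=str(67))
print(df67)
```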
Data processing
```python
# Keep only these columns (selected by label), in this order
df = df.loc[:, ['EXP_BLSBH', 'ZY', 'XX']]
```
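A quick sketch on a hypothetical frame (only the column names come from the text): `.loc` with a list of labels keeps just those columns, returns them in the listed order, and raises `KeyError` if any listed label does not exist.

```python
import pandas as pd

# Hypothetical sample data; 'OTHER' is dropped by the selection
df = pd.DataFrame({
    'XX': [1, 2],
    'EXP_BLSBH': ['B1', 'B2'],
    'ZY': ['a', 'b'],
    'OTHER': [0, 0],
})

selected = df.loc[:, ['EXP_BLSBH', 'ZY', 'XX']]
print(selected.columns.tolist())  # ['EXP_BLSBH', 'ZY', 'XX']

# A label that is not in the frame makes the whole selection fail
missing_label_raised = False
try:
    df.loc[:, ['EXP_BLSBH', 'MISSING']]
except KeyError:
    missing_label_raised = True
```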
```python
# Select by position: rows 1-2 and columns 2-3 (the end of each slice is excluded)
df = df.iloc[1:3, 2:4]
```
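To make the slice bounds concrete, here is a sketch on a hypothetical 4×5 frame where each cell holds `10 * row + column`. `iloc` slices are half-open like Python slices, whereas label slices with `loc` include the end label.

```python
import pandas as pd

# Hypothetical frame: cell value = 10 * row position + column position
df = pd.DataFrame([[10 * r + c for c in range(5)] for r in range(4)],
                  columns=list('ABCDE'))

by_position = df.iloc[1:3, 2:4]   # rows 1-2, columns C-D (ends excluded)
by_label = df.loc[1:3, 'C':'D']   # rows 1-3, columns C-D (ends included)

print(by_position)
print(by_label)
```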
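```python
# Strip '<', '[' and spaces from the raw strings; regex=False treats each pattern literally
data['xxx'] = (data['xxx']
               .str.replace('<', '', regex=False)
               .str.replace('[', '', regex=False)
               .str.replace(' ', '', regex=False))

def custom_transform(value):
    """Map numbers < 1 to '阴性' (negative) and numbers >= 1 to '阳性' (positive)."""
    try:
        numeric_value = float(value)
    except (TypeError, ValueError):
        return value  # not numeric: keep the original value
    if numeric_value < 1:
        return '阴性'
    if numeric_value >= 1:
        return '阳性'
    return str(numeric_value)  # only reached for NaN, where both comparisons are False

data['xxx'] = data['xxx'].apply(custom_transform)
```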
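On a hypothetical column of raw strings, this cleanup should turn `'<0.5'` into `'阴性'` and `'[1.2'` or `' 3 '` into `'阳性'`, while non-numeric text passes through unchanged. The sketch below reaches the same labels without a per-row `apply`, swapping in `pd.to_numeric(errors='coerce')` plus `Series.where`; this is an alternative technique, not the original author's code.

```python
import numpy as np
import pandas as pd

# Hypothetical raw values for the 'xxx' column
data = pd.DataFrame({'xxx': ['<0.5', '[1.2', ' 3 ', 'abc']})

cleaned = (data['xxx']
           .str.replace('<', '', regex=False)
           .str.replace('[', '', regex=False)
           .str.replace(' ', '', regex=False))

# Non-numeric strings become NaN here instead of raising
numeric = pd.to_numeric(cleaned, errors='coerce')

# Label every row by the threshold, then restore the cleaned text where parsing failed
labels = pd.Series(np.where(numeric < 1, '阴性', '阳性'), index=cleaned.index)
data['xxx'] = labels.where(numeric.notna(), cleaned)

print(data['xxx'].tolist())  # ['阴性', '阳性', '阳性', 'abc']
```

Vectorized operations like these usually scale better than `apply` on large sheets. One behavioral difference: a missing input stays missing here, whereas `custom_transform` turns it into the string `'nan'`.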
```python
# Treat missing TT values as '阴性' (negative)
data['TT'] = data['TT'].fillna('阴性')
```
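A small sketch with hypothetical values: `fillna` replaces both `NaN` and `None`, but an empty string is a real value and is left alone, so blank-looking text cells may need separate handling.

```python
import numpy as np
import pandas as pd

# Hypothetical TT values: NaN and None count as missing, '' does not
data = pd.DataFrame({'TT': ['阳性', np.nan, None, '']})
data['TT'] = data['TT'].fillna('阴性')
print(data['TT'].tolist())  # ['阳性', '阴性', '阴性', '']
```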
pandas.DataFrame.dropna — pandas 2.1.3 documentation (pydata.org)
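```python
DataFrame.dropna(*, axis=0, how=_NoDefault.no_default, thresh=_NoDefault.no_default,
                 subset=None, inplace=False, ignore_index=False)
```

`thresh`: with `axis=0` (filtering rows), keep only rows that have at least `thresh` non-NA values; for example, `thresh=5` drops every row with fewer than five non-null cells. It cannot be combined with `how`.

`subset`: limit the NA check to specific columns. Combined with `how`, `how='all'` drops a row only when all of those columns are NaN, and `how='any'` drops it when any of them is NaN.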
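A sketch on a hypothetical 4-row frame showing how `thresh`, `subset` and `how` change which rows survive. Row labels are kept, so the surviving index shows exactly what was dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical data: non-NA counts per row are 3, 1, 0, 2
df = pd.DataFrame({
    'a': [1, np.nan, np.nan, 4],
    'b': [1, 2, np.nan, np.nan],
    'c': [1, np.nan, np.nan, 6],
})

# Keep rows with at least 2 non-NA values
by_thresh = df.dropna(thresh=2)
# Drop a row if either 'a' or 'b' is NaN
any_in_ab = df.dropna(subset=['a', 'b'], how='any')
# Drop a row only if both 'b' and 'c' are NaN
all_in_bc = df.dropna(subset=['b', 'c'], how='all')

print(by_thresh.index.tolist())  # [0, 3]
print(any_in_ab.index.tolist())  # [0]
print(all_in_bc.index.tolist())  # [0, 1, 3]
```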
```python
>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
    brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0
```
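This output matches the `drop_duplicates` example in the pandas documentation. The runnable sketch below rebuilds that `df` and compares the three `keep` options for the same `subset`.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
})

cols = ['brand', 'style']
last = df.drop_duplicates(subset=cols, keep='last')          # keep the final row of each group
first = df.drop_duplicates(subset=cols, keep='first')        # default: keep the first row
unique_only = df.drop_duplicates(subset=cols, keep=False)    # drop every row that has a duplicate

print(last.index.tolist())         # [1, 2, 4]
print(first.index.tolist())        # [0, 2, 3]
print(unique_only.index.tolist())  # [2]
```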
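Saving files

```python
import pandas as pd
# mode='a' appends to an existing workbook (the file must exist; needs openpyxl),
# and if_sheet_exists='replace' overwrites sheet 's92' if it is already there.
# writer.save() was removed in pandas 2.0; the with block saves and closes the file.
with pd.ExcelWriter('./data/s92-s101.xlsx', mode='a', engine='openpyxl',
                    if_sheet_exists='replace') as writer:
    dftmp.to_excel(writer, sheet_name='s92', index=False)
```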
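To check that append mode only touches the named sheet, the sketch below (assuming openpyxl is installed) creates a hypothetical workbook with sheets `s92` and `s93` in a temporary directory, rewrites `s92` with `if_sheet_exists='replace'`, and reads every sheet back with `sheet_name=None`. All data and the `s93` sheet are made up for illustration.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 's92-s101.xlsx')

    # Hypothetical existing workbook with two sheets
    with pd.ExcelWriter(path, engine='openpyxl') as writer:
        pd.DataFrame({'old': [1]}).to_excel(writer, sheet_name='s92', index=False)
        pd.DataFrame({'keep': [2]}).to_excel(writer, sheet_name='s93', index=False)

    # Append mode: replace 's92' only, leave 's93' untouched
    dftmp = pd.DataFrame({'new': [10, 20]})
    with pd.ExcelWriter(path, mode='a', engine='openpyxl',
                        if_sheet_exists='replace') as writer:
        dftmp.to_excel(writer, sheet_name='s92', index=False)

    # sheet_name=None returns a dict mapping sheet name -> DataFrame
    sheets = pd.read_excel(path, sheet_name=None)

print({name: frame.columns.tolist() for name, frame in sheets.items()})
```

Other `if_sheet_exists` values are `'error'` (the default), `'new'`, and, since pandas 1.4, `'overlay'`, which writes into the existing sheet without clearing it first.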