本文作為一篇原始雛形已經(jīng)過(guò)時(shí),新版本的文章請(qǐng)移步到: 深度剖析Excel表拆分的三項(xiàng)技術(shù)(已實(shí)現(xiàn)純Openpyxl保留全部樣式拆分,自適應(yīng)單文件和多文件拆分等):https://blog.csdn.net/as604049322/article/details/118655016
本文目錄:
透過(guò)本文你能夠?qū)W到:
通過(guò)VBA復(fù)制粘貼全部樣式進(jìn)行單文件表拆分 純Pandas拆分表,無(wú)樣式保留 openpyxl模板法拆分表保留表頭樣式
作者:小小明,高階數(shù)據(jù)處理玩家,幫助各行數(shù)據(jù)從業(yè)者解決各類(lèi)數(shù)據(jù)處理難題。
需求描述
有一個(gè)Excel表格:
我們希望將其按照指定的字段拆分為多個(gè)表格。如果直接用pandas,代碼很簡(jiǎn)單卻只能保留數(shù)據(jù);如果使用openpyxl,也無(wú)法直接設(shè)置原有的樣式,需要逐個(gè)設(shè)置會(huì)非常麻煩。下面我將使用Excel自帶的篩選功能,篩選出指定的值,然后復(fù)制粘貼到一張新的工作表中。唯一值不多的時(shí)候我們?nèi)斯げ僮饕部梢?#xff0c;但數(shù)據(jù)量大唯一值多的時(shí)候,人工操作就耗時(shí)很久了。
如何使用Python實(shí)現(xiàn)這個(gè)自動(dòng)化操作呢?那就是通過(guò)pywin32調(diào)用VBA。
下面我們開(kāi)始操作吧:
實(shí)現(xiàn)過(guò)程
首先,用pywin32打開(kāi)目標(biāo)文件:
import win32com. client as win32 # 導(dǎo)入模塊
import os
excel_app = win32. gencache. EnsureDispatch( 'Excel.Application' )
filename = "數(shù)據(jù)源.xlsx"
filename = os. path. abspath( filename)
wb = excel_app. Workbooks. Open( filename)
sheet = wb. ActiveSheet
max_rows = sheet. UsedRange. Rows. Count
max_cols = sheet. UsedRange. Columns. Count
max_rows, max_cols
(3216, 9)
可以看到源數(shù)據(jù)有3216行,9列。
獲取數(shù)據(jù)范圍,并設(shè)置自動(dòng)列寬調(diào)整:
rng = sheet. Range( sheet. Cells( 1 , 1 ) , sheet. Cells( max_rows, max_cols) )
# 設(shè)置自動(dòng)列寬
rng. EntireColumn. AutoFit( )
設(shè)置后的效果:
構(gòu)建一個(gè)拆分函數(shù):
def split_excel ( num) :
"""num示被拆分的列號(hào)"""
names = set ( sheet. Range( sheet. Cells( 2 , num) , sheet. Cells( max_rows, num) ) . Value)
# 禁用自動(dòng)更新加快執(zhí)行速度
excel_app. ScreenUpdating = False
for name, in names:
sheet. Activate( )
rng. AutoFilter( Field= num, Criteria1= name)
rng. Select( )
excel_app. Selection. Copy( )
new_sheet = excel_app. Sheets. Add( After= wb. Worksheets( wb. Worksheets. Count) )
new_sheet. Name = name
new_sheet. Range( "A1" ) . Activate( )
new_sheet. Paste( )
new_sheet. Range( new_sheet. Cells( 1 , 1 ) , new_sheet. Cells( 1 , max_cols) ) . EntireColumn. AutoFit( )
# 恢復(fù)自動(dòng)更新
excel_app. ScreenUpdating = True
該函數(shù)涉及的方法很多,需要反復(fù)查詢(xún)VBA文檔并測(cè)試才能寫(xiě)出,不過(guò)前人栽樹(shù)后人乘涼,我已經(jīng)為大家寫(xiě)出來(lái)啦,可以直接使用。當(dāng)然也歡迎VBA大佬對(duì)本方法進(jìn)行升級(jí)改造。
一些重點(diǎn)的API:
Range 對(duì)象:https://docs.microsoft.com/zh-cn/office/vba/api/excel.range(object)
Range.AutoFilter:https://docs.microsoft.com/zh-cn/office/vba/api/excel.range.autofilter
Sheets.Add :https://docs.microsoft.com/zh-cn/office/vba/api/excel.sheets.add
Worksheet.Name 屬性:https://docs.microsoft.com/zh-cn/office/vba/api/excel.worksheet.name
Application.ScreenUpdating:https://docs.microsoft.com/zh-cn/office/vba/api/excel.application.screenupdating
其他需要注意的點(diǎn):
在Excel本身的VBA環(huán)境,獲取唯一值,我們往往需要使用高級(jí)篩選或字典對(duì)象。VBA的字典對(duì)象使用起來(lái)較為麻煩,文檔地址:https://docs.microsoft.com/zh-cn/office/vba/language/reference/user-interface-help/dictionary-object
但我們?cè)貾ython環(huán)境中使用VBA,則無(wú)需使用VBA的數(shù)組或字典對(duì)象,使用python本身的對(duì)象操作即可。
下面我們對(duì)區(qū)域列(第2列)進(jìn)行拆分:
split_excel( 2 )
可以看到拆分的結(jié)果,完全保留了原有的樣式。
最后我們保存文件即可:
wb. SaveAs( os. path. abspath( "result.xlsx" ) )
直接修改原有文件直接調(diào)用wb.Save()即可,上述命令表示另存為。
可以關(guān)閉工作簿:
wb. Close( )
還可以關(guān)閉Excel軟件:
excel_app. Quit( )
純VBA實(shí)現(xiàn)
為了沒(méi)有安裝python的童鞋使用方便,將以上過(guò)程封裝成純vba代碼,可以直接在Excel軟件中使用:
Sub 帶格式分列()
Application.ScreenUpdating = False
Set Sh = ActiveSheet
max_rows = Sh.UsedRange.Rows.Count
max_cols = Sh.UsedRange.Columns.Count
Set Rng = Sh.Range(Sh.Cells(1, 1), Sh.Cells(max_rows, max_cols))
Rng.EntireColumn.AutoFit
'Col為要手動(dòng)輸入要拆分的列序數(shù)
Col = CInt(InputBox("輸入用于分組的列序號(hào)!"))
Range(Cells(2, Col), Cells(max_rows, Col)).AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Cells(1, max_cols + 2), Unique:=True
LastRow = Cells(1, max_cols + 2).End(xlDown).Row
Range(Cells(1, max_cols + 2), Cells(LastRow, max_cols + 2)).RemoveDuplicates Columns:=1, Header:=xlNo
LastRow = Cells(1, max_cols + 2).End(xlDown).Row
For i = 1 To LastRow
Name = CStr(Sh.Cells(i, max_cols + 2))
Sh.Activate
Rng.AutoFilter Field:=Col, Criteria1:=Name
Rng.Copy
Set new_sheet = Sheets.Add(After:=Sheets(Sheets.Count))
new_sheet.Name = Name
new_sheet.Range("A1").Activate
new_sheet.Paste
new_sheet.Range(new_sheet.Cells(1, 1), new_sheet.Cells(1, max_cols)).EntireColumn.AutoFit
Next
Sh.Activate
Columns(max_cols + 2).Delete Shift:=xlToLeft
Selection.AutoFilter
Application.ScreenUpdating = True
End Sub
升級(jí):能指定起始行的帶格式拆分
后面碰過(guò)了起始行不在開(kāi)頭的需求:
對(duì)于這類(lèi)需求會(huì)增加復(fù)制非篩選區(qū)域的操作,我已經(jīng)完整封裝了全部過(guò)程到一個(gè)方法。
完整代碼如下:
import win32com. client as win32 # 導(dǎo)入模塊
import os
excel_app = win32. gencache. EnsureDispatch( 'Excel.Application' )
def split_excel ( filename, save_name, num, title_row= 1 ) :
"""作者小小明的csdn:https://blog.csdn.net/as604049322"""
wb = excel_app. Workbooks. Open( os. path. abspath( filename) )
try :
sheet = wb. ActiveSheet
max_rows = sheet. UsedRange. Rows. Count
max_cols = sheet. UsedRange. Columns. Count
if title_row > 1 :
start = sheet. Range( sheet. Cells(
1 , 1 ) , sheet. Cells( title_row- 1 , max_cols) )
rng = sheet. Range( sheet. Cells( title_row, 1 ) ,
sheet. Cells( max_rows, max_cols) )
# 設(shè)置自動(dòng)列寬
rng. EntireColumn. AutoFit( )
names = set ( sheet. Range( sheet. Cells( title_row+ 1 , num) ,
sheet. Cells( max_rows, num) ) . Value)
for name, in names:
if not name:
continue
new_sheet = excel_app. Sheets. Add(
After= wb. Worksheets( wb. Worksheets. Count) )
new_sheet. Name = name
if title_row > 1 :
sheet. Activate( )
start. Copy( )
new_sheet. Activate( )
new_sheet. Range( "A1" ) . Activate( )
new_sheet. Paste( )
sheet. Activate( )
rng. AutoFilter( Field= num, Criteria1= name)
rng. Copy( )
new_sheet. Activate( )
new_sheet. Range( f"A { title_row} " ) . Activate( )
new_sheet. Paste( )
new_sheet. Range( new_sheet. Cells( 1 , 1 ) , new_sheet. Cells(
1 , max_cols) ) . EntireColumn. AutoFit( )
wb. SaveAs( os. path. abspath( save_name) )
finally :
wb. Close( )
split_excel( "工單.xlsx" , '拆分結(jié)果.xlsx' , 4 , 2 )
#這步會(huì)關(guān)閉你正在使用的Excel軟件,視具體情況決定是否要注釋掉
excel_app. Quit( )
VBA代碼翻譯成Python調(diào)用示例
前面我演示了Python帶格式拆分Excel表,可能大家對(duì)使用python來(lái)調(diào)用vba還比較生疏,下面我將演示將一段vba代碼翻譯為python調(diào)用。
下面這段拆分Excel表的vba代碼來(lái)自才哥的文章《Python對(duì)比VBA實(shí)現(xiàn)excel表格合并與拆分 》,作者是“兩百斤的老濤 ”,一起看看吧:
Sub 表格拆分()
'屏幕刷新=false
Application.ScreenUpdating = False
Dim LastRow, LastCol As Long
Dim Sh, Sht As Worksheet
'Sh指代當(dāng)前活動(dòng)頁(yè)
Set Sh = ActiveSheet
'當(dāng)前活動(dòng)頁(yè)的最后一行
LastRow = Sh.Cells(Rows.Count, 1).End(xlUp).Row
'當(dāng)前活動(dòng)頁(yè)的最后一列
LastCol = Sh.Cells(1, Columns.Count).End(xlToLeft).Column
'定義D為字典
Dim D As Object
Set D = CreateObject("Scripting.Dictionary")
Dim Col As Integer
'Col為要手動(dòng)輸入要拆分的列序數(shù)
Col = InputBox("輸入用于分組的列序號(hào)!")
'從第2行找到最后一行
For i = 2 To LastRow
'查找這個(gè)要拆分行,看它在不在字典里
TempStr = CStr(Sh.Cells(i, Col))
'如果在字典里
If D.exists(TempStr) Then
'將數(shù)據(jù)放到對(duì)應(yīng)的頁(yè)里
Set Sht = Worksheets(TempStr)
'字典key值對(duì)應(yīng)的項(xiàng)目值記錄該頁(yè)當(dāng)前內(nèi)容添加的行數(shù),每次+1
D(TempStr) = D(TempStr) + 1
'下面一行可以注釋掉了跟下面的重復(fù)了……
'Sht.Cells(D(TempStr), 1) = Sh.Cells(i, 1)
For j = 1 To LastCol
Sht.Cells(D(TempStr), j) = Sh.Cells(i, j)
Next
Else
'如果不在字典里,就添加一個(gè)新key
D.Add TempStr, 1
'i = i - 1是讓該行一會(huì)兒重新檢索一遍就能進(jìn)到if里了
i = i - 1
'在最后一頁(yè)新加一頁(yè),頁(yè)名就是TempStr
Sheets.Add After:=Sheets(Sheets.Count)
Sheets(Sheets.Count).Name = TempStr
'下面一行也是可以注釋掉的
'Sheets(Sheets.Count).Cells(1, 1) = Sh.Cells(1, 1)
'把第一行標(biāo)題行弄過(guò)去
For j = 1 To LastCol
Sheets(Sheets.Count).Cells(1, j) = Sh.Cells(1, j)
Next
End If
Next
'激活初始頁(yè),視覺(jué)上保持不變
Sh.Activate
'RT,GDCDSZ
MsgBox ("完成!")
End Sub
下面我們將其轉(zhuǎn)換為python代碼來(lái)調(diào)用:
建立在已經(jīng)打開(kāi)Excel文件的前提下:
import win32com. client as win32 # 導(dǎo)入模塊
import os
excel_app = win32. gencache. EnsureDispatch( 'Excel.Application' )
filename = "數(shù)據(jù)源.xlsx"
filename = os. path. abspath( filename)
wb = excel_app. Workbooks. Open( filename)
Set Sh = ActiveSheet
等價(jià)于:
Sh = wb. ActiveSheet
對(duì)于下面這兩行代碼:
'當(dāng)前活動(dòng)頁(yè)的最后一行
LastRow = Sh.Cells(Rows.Count, 1).End(xlUp).Row
'當(dāng)前活動(dòng)頁(yè)的最后一列
LastCol = Sh.Cells(1, Columns.Count).End(xlToLeft).Column
首先對(duì)于Rows和Columns可以通過(guò)頂級(jí)的’Excel.Application’對(duì)象來(lái)引用,而xlUp和xlToLeft兩個(gè)常量值,我目前采用的方案是通過(guò)文檔進(jìn)行查閱,首先定位到vba文檔的Range.End 屬性,然后再點(diǎn)擊 Direction 參數(shù)的數(shù)據(jù)類(lèi)型:https://docs.microsoft.com/zh-cn/office/vba/api/excel.xldirection
于是我們翻譯為:
LastRow = Sh. Cells( excel_app. Rows. Count, 1 ) . End( - 4162 ) . Row
LastCol = Sh. Cells( 1 , excel_app. Columns. Count) . End( - 4159 ) . Column
專(zhuān)業(yè)的vba程序員都習(xí)慣用上面的方法獲取數(shù)據(jù)的行數(shù)和列數(shù),但一般情況下用我前面的UsedRange的方法就夠了。
由于數(shù)據(jù)都直接讀取到python環(huán)境中,我們直接使用python的字典,繼續(xù)翻譯剩下的循環(huán)部分:
D = { }
Col = 2
excel_app. ScreenUpdating = False
for i in range ( 2 , LastRow+ 1 ) :
TempStr = Sh. Cells( i, Col) . Value
if TempStr in D:
Sht = wb. Sheets( TempStr)
D[ TempStr] += 1
for j in range ( 1 , LastCol+ 1 ) :
Sht. Cells( D[ TempStr] , j) . Value = Sh. Cells( i, j) . Value
else :
D[ TempStr] = 1
excel_app. Sheets. Add( After= wb. Sheets( wb. Sheets. Count) )
wb. Sheets( wb. Sheets. Count) . Name = TempStr
for j in range ( 1 , LastCol+ 1 ) :
wb. Sheets( wb. Sheets. Count) . Cells( 1 , j) . Value = Sh. Cells( 1 , j) . Value
Sh. Activate( )
excel_app. ScreenUpdating = True
我再按照個(gè)人的習(xí)慣重新編寫(xiě)一下:
rows_dict = { }
Col = 2
excel_app. ScreenUpdating = False
for i in range ( 2 , LastRow+ 1 ) :
k = Sh. Cells( i, Col) . Value
if k not in rows_dict:
Sht = excel_app. Sheets. Add( After= wb. Sheets( wb. Sheets. Count) )
Sht. Name = k
Sht. Range( Sht. Cells( 1 , 1 ) , Sht. Cells( 1 , LastCol) ) . Value = Sh. Range(
Sh. Cells( 1 , 1 ) , Sh. Cells( 1 , LastCol) ) . Value
rows_dict[ k] = 1
else :
Sht = wb. Sheets( k)
rows_dict[ k] += 1
Sht. Range( Sht. Cells( rows_dict[ k] , 1 ) , Sht. Cells(
rows_dict[ k] , LastCol) ) . Value = Sh. Range( Sh. Cells( i, 1 ) , Sh. Cells( i, LastCol) ) . Value
Sh. Activate( )
excel_app. ScreenUpdating = True
最終完整代碼:
import win32com. client as win32 # 導(dǎo)入模塊
import os
excel_app = win32. gencache. EnsureDispatch( 'Excel.Application' )
filename = "數(shù)據(jù)源.xlsx"
filename = os. path. abspath( filename)
wb = excel_app. Workbooks. Open( filename)
Sh = wb. ActiveSheet
LastRow = Sh. Cells( excel_app. Rows. Count, 1 ) . End( - 4162 ) . Row
LastCol = Sh. Cells( 1 , excel_app. Columns. Count) . End( - 4159 ) . Column
rows_dict = { }
Col = 2
excel_app. ScreenUpdating = False
for i in range ( 2 , LastRow+ 1 ) :
k = Sh. Cells( i, Col) . Value
if k not in rows_dict:
Sht = excel_app. Sheets. Add( After= wb. Sheets( wb. Sheets. Count) )
Sht. Name = k
Sht. Range( Sht. Cells( 1 , 1 ) , Sht. Cells( 1 , LastCol) ) . Value = Sh. Range(
Sh. Cells( 1 , 1 ) , Sh. Cells( 1 , LastCol) ) . Value
rows_dict[ k] = 1
else :
Sht = wb. Sheets( k)
rows_dict[ k] += 1
Sht. Range( Sht. Cells( rows_dict[ k] , 1 ) , Sht. Cells(
rows_dict[ k] , LastCol) ) . Value = Sh. Range( Sh. Cells( i, 1 ) , Sh. Cells( i, LastCol) ) . Value
Sh. Activate( )
excel_app. ScreenUpdating = True
wb. SaveAs( os. path. abspath( "result.xlsx" ) )
wb. Close( )
excel_app. Quit( )
經(jīng)測(cè)試,原始vba代碼在Excel環(huán)境中 運(yùn)行耗時(shí)1秒以?xún)?nèi),但運(yùn)行以上python代碼,耗時(shí)接近30秒。
這是因?yàn)?#xff0c;python通過(guò)vba讀取Excel數(shù)據(jù)時(shí),存在很頻繁的交互,同時(shí)也說(shuō)明并不是任何vba代碼都適合用python來(lái)調(diào)用。對(duì)于大部分?jǐn)?shù)據(jù)讀寫(xiě)操作,用python自帶的庫(kù)會(huì)便捷很多,速度也會(huì)比vba快。對(duì)于樣式復(fù)雜粘貼使用vba則極度方便。
使用Pandas實(shí)現(xiàn)Excel拆分
上述vba代碼實(shí)際上僅僅只是實(shí)現(xiàn)不帶樣式的拆分,對(duì)于這樣的需求,其實(shí)用Pandas會(huì)非常簡(jiǎn)單:
from openpyxl import load_workbook
import pandas as pd
df = pd. read_excel( "數(shù)據(jù)源.xlsx" )
with pd. ExcelWriter( 'result.xlsx' , engine= 'openpyxl' ) as writer:
writer. book = load_workbook( "數(shù)據(jù)源.xlsx" )
for area, df_split in df. groupby( "區(qū)域" ) :
df_split. to_excel( writer, area, index= False )
缺點(diǎn)是日期沒(méi)有保留原有的文本格式:
不過(guò)我們可以指定日期的格式:
from openpyxl import load_workbook
import pandas as pd
df = pd. read_excel( "數(shù)據(jù)源.xlsx" )
with pd. ExcelWriter( 'result.xlsx' , engine= 'openpyxl' , datetime_format= 'YYYY/MM/DD' ) as writer:
writer. book = load_workbook( "數(shù)據(jù)源.xlsx" )
for area, df_split in df. groupby( "區(qū)域" ) :
df_split. to_excel( writer, area, index= False )
使用了openpyxl還可以逐個(gè)單元格copy樣式信息,相對(duì)來(lái)說(shuō)會(huì)麻煩一些,也并不是所有樣式都能復(fù)制。
不過(guò)但如果我們只需要保留表頭樣式拆分Excel表,可以通過(guò)openpyxl制作模板并加載模板,下面看看具體實(shí)現(xiàn):
使用openpyxl保留表頭樣式拆分Excel表
我們的實(shí)現(xiàn)目標(biāo)依然是:
其實(shí)這種需求,除了表頭樣式以外并不需要關(guān)心下面的數(shù)據(jù)的樣式。這時(shí)使用openpyxl才是最簡(jiǎn)單的,下面我們看看操作流程。
首先,我們讀取數(shù)據(jù)并分組:
from openpyxl import load_workbook
num = 4
title_row = 2
filename = "工單.xlsx"
book = load_workbook( filename)
sheet = book. active
# 讀取并分組相應(yīng)的數(shù)據(jù)
data = { }
for row in sheet. iter_rows( min_row= title_row+ 1 ) :
row = [ cell. value for cell in row]
data_split = data. setdefault( row[ num- 1 ] , [ ] )
data_split. append( row)
然后遍歷每組創(chuàng)建模板后寫(xiě)入對(duì)應(yīng)數(shù)據(jù):
for name, data_split in data. items( ) :
new_sheet = book. copy_worksheet( sheet)
new_sheet. title = name
# 刪除標(biāo)題行以外的數(shù)據(jù)作為模板
new_sheet. delete_rows( title_row+ 1 , sheet. max_row)
for row in data_split:
new_sheet. append( row)
book. save( "拆分結(jié)果.xlsx" )
是不是非常簡(jiǎn)單?下面我們可以封裝起來(lái):
from openpyxl import load_workbook
def split_excel ( filename, save_name, num, title_row= 1 ) :
"""小小明的CSDN:https://blog.csdn.net/as604049322"""
book = load_workbook( filename)
sheet = book. active
# 讀取并分組相應(yīng)的數(shù)據(jù)
data = { }
for row in sheet. iter_rows( min_row= title_row+ 1 ) :
row = [ cell. value for cell in row]
data_split = data. setdefault( row[ num- 1 ] , [ ] )
data_split. append( row)
for name, data_split in data. items( ) :
new_sheet = book. copy_worksheet( sheet)
new_sheet. title = name
# 刪除標(biāo)題行以外的數(shù)據(jù)作為模板
new_sheet. delete_rows( title_row+ 1 , sheet. max_row)
for row in data_split:
new_sheet. append( row)
book. save( save_name)
split_excel( "工單.xlsx" , '拆分結(jié)果.xlsx' , 4 , 2 )
但是使用openpyxl拆分也有較大缺陷,例如數(shù)據(jù)中存在日期格式時(shí):
split_excel( "數(shù)據(jù)源.xlsx" , '拆分結(jié)果2.xlsx' , 2 , 1 )
日期格式自定義起來(lái)會(huì)比較麻煩,難以通用化,列寬需要手工自適應(yīng)(這個(gè)我在《Pandas指定樣式保存excel數(shù)據(jù)的N種姿勢(shì) 》一文中已經(jīng)實(shí)現(xiàn)了pandas自適應(yīng)調(diào)整)。
總結(jié)
本文演示了通過(guò)復(fù)制粘貼篩選結(jié)果實(shí)現(xiàn)保留格式拆分表格的方法,并分別通過(guò)python調(diào)用和純vba實(shí)現(xiàn)。作為一種一種拋磚引玉的做法并不能應(yīng)對(duì)所有的需求,對(duì)于表頭涉及多行合并單元格的需求還需各位童鞋發(fā)揮自己的腦洞,針對(duì)性解決相應(yīng)的問(wèn)題。
通過(guò)上述代碼的樣式詳細(xì)大家都能看到,對(duì)于樣式拷貝,使用vba很簡(jiǎn)單;對(duì)于數(shù)據(jù)處理,使用Pandas很簡(jiǎn)單;僅僅只拷貝表頭樣式,使用openpyxl最簡(jiǎn)單,但對(duì)于日期和列寬需要特殊處理。