How do I split a large file into many smaller files using bash?
bash
I have a file, say `all`, with 2000 lines, and I'd like it split into 4 small files containing lines 1-500, 501-1000, 1001-1500, and 1501-2000.
Perhaps I can do:
cat all | head -500 >small1
cat all | tail -1500 | head -500 >small2
cat all | tail -1000 | head -500 >small3
cat all | tail -500 >small4
But this approach involves computing line counts, which can go wrong when the line count isn't a round number, or when we want to split the file into many small files (e.g., a file `all` with 3241 lines that we want to split into 7 files of 463 lines each).
Is there a better way to do this?
Source: CodeGo.net/2693017/
-------------------------------------------------------------------------------------------------------------------------
1. If you want to split the file, use split:
split -l 500 all all
This splits the file into several files of 500 lines each. If you instead want to split it into 4 files of roughly equal size, use something like:
split -l $(( $( wc -l < all ) / 4 + 1 )) all all
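The `+ 1` above is a blunt way of rounding up. For the 3241-line, 7-file case mentioned in the question, the same idea with exact ceiling division looks like this (a sketch; the filenames `all` and `part` are just illustrative):

```shell
seq 3241 > all   # sample 3241-line file standing in for the real data

parts=7
# Ceiling division: round lines-per-file up so no leftover part is created.
lines=$(( ( $(wc -l < all) + parts - 1 ) / parts ))
split -l "$lines" --numeric-suffixes=1 --suffix-length=1 all part

wc -l part*      # 7 files of 463 lines each
```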
2.
Look into the split command; it should do what you want (and more):
$ split --help
Usage: split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is 'x'. With no INPUT, or when INPUT
is -, read standard input.
Mandatory arguments to long options are mandatory for short options too.
-a, --suffix-length=N generate suffixes of length N (default 2)
--additional-suffix=SUFFIX append an additional SUFFIX to file names.
-b, --bytes=SIZE put SIZE bytes per output file
-C, --line-bytes=SIZE put at most SIZE bytes of lines per output file
-d, --numeric-suffixes[=FROM] use numeric suffixes instead of alphabetic.
FROM changes the start value (default 0).
-e, --elide-empty-files do not generate empty output files with '-n'
--filter=COMMAND write to shell COMMAND; file name is $FILE
-l, --lines=NUMBER put NUMBER lines per output file
-n, --number=CHUNKS generate CHUNKS output files. See below
-u, --unbuffered immediately copy input to output with '-n r/...'
--verbose print a diagnostic just before each
output file is opened
--help display this help and exit
--version output version information and exit
SIZE is an integer and optional unit (example: 10M is 10*1024*1024). Units
are K, M, G, T, P, E, Z, Y (powers of 1024) or KB, MB, ... (powers of 1000).
CHUNKS may be:
N split into N files based on size of input
K/N output Kth of N to stdout
l/N split into N files without splitting lines
l/K/N output Kth of N to stdout without splitting lines
r/N like 'l' but use round robin distribution
r/K/N likewise but only output Kth of N to stdout
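One option worth highlighting from the help above is the K/N form, which writes a single chunk to stdout instead of creating files. A minimal sketch (the choice of chunk 3 of 4 is arbitrary):

```shell
seq 2000 > all        # sample 2000-line input, as in the question

# Emit only the 3rd of 4 line-preserving chunks on stdout;
# no output files are created.
split -n l/3/4 all
```

Note that with `l/K/N` the chunk boundaries are byte-based but rounded to whole lines, so the chunks need not have identical line counts.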
3.
Like others have said, use split. The command substitution in the accepted answer isn't necessary. For the record, here is almost exactly what was asked for. Note the -n option, which specifies the number of chunks; with it, the small* files do not contain exactly 500 lines each, which is inherent to how split -n works.
$ seq 2000 > all
$ split -n l/4 --numeric-suffixes=1 --suffix-length=1 all small
$ wc -l small*
583 small1
528 small2
445 small3
444 small4
2000 total
Alternatively, you can use GNU parallel:
$ < all parallel -N500 --pipe --cat cp {} small{#}
$ wc -l small*
500 small1
500 small2
500 small3
500 small4
2000 total
As you can see, this incantation is more complex, but GNU Parallel's real strength is parallelizing pipelines. IMHO it's a tool worth looking into.
本文標(biāo)題 :如何使用bash一個大文件分割成許多小文件?
本文地址 :CodeGo.net/2693017/
|