
Learning Spark Python API Functions: pyspark API (1) – 過往記憶

dazheng 2015-11-05


```python
from pyspark import SparkContext
sc = SparkContext("local", "pyspark-api")  # skip this line in the pyspark shell, which already defines sc
# print Spark version
print("pyspark version:" + str(sc.version))
```

Output:

```
pyspark version:1.2.2
```

map


```python
# map
# sc = SparkContext; parallelize creates an RDD from the passed object
x = sc.parallelize([1, 2, 3])
y = x.map(lambda x: (x, x**2))

# collect copies RDD elements to a list on the driver
print(x.collect())
print(y.collect())
```

Output:

```
[1, 2, 3]
[(1, 1), (2, 4), (3, 9)]
```
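To see what `map` does at the partition level without a cluster, here is a plain-Python model, not the PySpark API. It assumes an RDD can be pictured as a list of partitions, each a Python list, and the helper `model_map` is hypothetical. The point it illustrates: the function is applied once per element, and each result stays in the partition its input came from.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def model_map(parts: List[List[T]], f: Callable[[T], U]) -> List[List[U]]:
    """Hypothetical model of RDD.map: exactly one output element per input
    element, with the partition layout left unchanged (partitions are plain
    lists here, not real RDD partitions)."""
    return [[f(x) for x in part] for part in parts]


# An assumed two-partition layout of [1, 2, 3].
parts = [[1], [2, 3]]
print(model_map(parts, lambda x: (x, x ** 2)))  # [[(1, 1)], [(2, 4), (3, 9)]]
```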

flatMap


```python
# flatMap
x = sc.parallelize([1, 2, 3])
y = x.flatMap(lambda x: (x, 100*x, x**2))
print(x.collect())
print(y.collect())
```

Output:

```
[1, 2, 3]
[1, 100, 1, 2, 200, 4, 3, 300, 9]
```
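The difference from `map` is the one level of flattening. The sketch below is a plain-Python model under the same assumption (partitions as lists), with hypothetical helpers `model_flat_map` and `model_collect`: each call to `f` returns an iterable whose items are spliced into the same partition, and collecting concatenates the partitions in order.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def model_flat_map(parts: List[List[T]], f: Callable[[T], Iterable[U]]) -> List[List[U]]:
    """Hypothetical model of RDD.flatMap: f returns an iterable per element and
    its items are inserted into the same partition (one level of flattening)."""
    return [[y for x in part for y in f(x)] for part in parts]


def model_collect(parts: List[List[T]]) -> List[T]:
    """Concatenate the partitions in order, like collect() on the driver."""
    return [x for part in parts for x in part]


nested = model_flat_map([[1], [2, 3]], lambda x: (x, 100 * x, x ** 2))
print(nested)                 # [[1, 100, 1], [2, 200, 4, 3, 300, 9]]
print(model_collect(nested))  # [1, 100, 1, 2, 200, 4, 3, 300, 9]
```

Only one level is removed: if `f` returns a list of tuples, the tuples survive as elements.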

mapPartitions


```python
# mapPartitions
x = sc.parallelize([1, 2, 3], 2)

def f(iterator):
    yield sum(iterator)

y = x.mapPartitions(f)
# glom() gathers the elements of each partition into a list
print(x.glom().collect())
print(y.glom().collect())
```

Output:

```
[[1], [2, 3]]
[[1], [5]]
```
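A key property of `mapPartitions` is that the function runs once per partition rather than once per element, which is why it suits per-partition setup work. The plain-Python model below (hypothetical helper `model_map_partitions`, partitions assumed to be lists) counts the calls to make that visible.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def model_map_partitions(parts: List[List[T]],
                         f: Callable[[Iterator[T]], Iterable[U]]) -> List[List[U]]:
    """Hypothetical model of RDD.mapPartitions: f is called once per partition
    with an iterator over it, and whatever f yields becomes the new partition."""
    return [list(f(iter(part))) for part in parts]


calls = []


def f(iterator: Iterator[int]) -> Iterator[int]:
    calls.append(1)  # runs once per partition (where e.g. a connection could be opened)
    yield sum(iterator)


print(model_map_partitions([[1], [2, 3]], f))  # [[1], [5]]
print(len(calls))                              # 2: one call per partition, not per element
```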

mapPartitionsWithIndex


```python
# mapPartitionsWithIndex
x = sc.parallelize([1, 2, 3], 2)

def f(partitionIndex, iterator):
    yield (partitionIndex, sum(iterator))

y = x.mapPartitionsWithIndex(f)

# glom() gathers the elements of each partition into a list
print(x.glom().collect())
print(y.glom().collect())
```

Output:

```
[[1], [2, 3]]
[[(0, 1)], [(1, 5)]]
```
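The only change from `mapPartitions` is the extra 0-based partition index passed to the function. In the plain-Python model below (hypothetical helpers `model_map_partitions_with_index` and `tag`, partitions assumed to be lists), the second function labels every element with its partition, which is a handy way to inspect how data was split.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def model_map_partitions_with_index(
        parts: List[List[T]],
        f: Callable[[int, Iterator[T]], Iterable[U]]) -> List[List[U]]:
    """Hypothetical model of RDD.mapPartitionsWithIndex: like mapPartitions,
    but f also receives the index of the partition it is processing."""
    return [list(f(i, iter(part))) for i, part in enumerate(parts)]


def f(partition_index: int, iterator: Iterator[int]) -> Iterator[Tuple[int, int]]:
    # Same logic as f in the example above: one (index, sum) pair per partition.
    yield (partition_index, sum(iterator))


def tag(partition_index: int, iterator: Iterator[int]) -> Iterator[Tuple[int, int]]:
    # Pair every element with the index of the partition it lives in.
    return ((partition_index, x) for x in iterator)


print(model_map_partitions_with_index([[1], [2, 3]], f))    # [[(0, 1)], [(1, 5)]]
print(model_map_partitions_with_index([[1], [2, 3]], tag))  # [[(0, 1)], [(1, 2), (1, 3)]]
```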

getNumPartitions


```python
# getNumPartitions
x = sc.parallelize([1, 2, 3], 2)
y = x.getNumPartitions()
print(x.glom().collect())
print(y)
```

Output:

```
[[1], [2, 3]]
2
```
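Why `[1, 2, 3]` split into `[[1], [2, 3]]` can be explained with a contiguous even-slicing rule: slice `i` of `k` covers indices `[i*n//k, (i+1)*n//k)`. The sketch below assumes that rule (hypothetical helper `model_slice`); it reproduces the layout printed above, but real PySpark may also batch elements before slicing, so treat exact boundaries as an assumption. Note that `getNumPartitions()` counts partitions, including empty ones.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def model_slice(data: Sequence[T], num_slices: int) -> List[List[T]]:
    """Hypothetical model of cutting a local collection into num_slices
    contiguous chunks; slice i covers data[i*n//k : (i+1)*n//k]."""
    n = len(data)
    return [list(data[(i * n) // num_slices:((i + 1) * n) // num_slices])
            for i in range(num_slices)]


parts = model_slice([1, 2, 3], 2)
print(parts)                           # [[1], [2, 3]]
print(len(parts))                      # 2, the partition count
print(model_slice(list(range(7)), 3))  # [[0, 1], [2, 3], [4, 5, 6]]
print(model_slice([1, 2], 4))          # [[], [1], [], [2]]: empty partitions still count
```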

filter


```python
# filter
x = sc.parallelize([1, 2, 3])
y = x.filter(lambda x: x % 2 == 1)  # filters out even elements
print(x.collect())
print(y.collect())
```

Output:

```
[1, 2, 3]
[1, 3]
```
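`filter` works inside each partition and never moves data between them, so the number of partitions stays the same and some can end up empty. The plain-Python model below (hypothetical helper `model_filter`, partitions assumed to be lists) shows both cases.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def model_filter(parts: List[List[T]], pred: Callable[[T], bool]) -> List[List[T]]:
    """Hypothetical model of RDD.filter: drop elements where pred is False,
    keeping every partition (possibly empty) in place."""
    return [[x for x in part if pred(x)] for part in parts]


def is_odd(x: int) -> bool:
    # Same predicate as the filter example above.
    return x % 2 == 1


print(model_filter([[1], [2, 3]], is_odd))  # [[1], [3]]
print(model_filter([[2], [1, 3]], is_odd))  # [[], [1, 3]]: still 2 partitions, one empty
```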

distinct


```python
# distinct
x = sc.parallelize(['A', 'A', 'B'])
y = x.distinct()
print(x.collect())
print(y.collect())
```

Output:

```
['A', 'A', 'B']
['A', 'B']
```
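Unlike `map` or `filter`, `distinct` needs a shuffle: duplicates can sit in different partitions, so equal elements must first be routed to the same place. PySpark builds it on a `reduceByKey`-style step; the sketch below is a simplified plain-Python model of that idea (hypothetical helper `model_distinct`, hash routing assumed), not Spark's code. Because placement depends on hashing, the order of `distinct`'s output should not be relied on.

```python
from typing import Dict, Hashable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def model_distinct(parts: List[List[T]], num_out: int) -> List[List[T]]:
    """Hypothetical model of RDD.distinct as a shuffle: each element goes to
    output partition hash(x) % num_out, so equal elements always meet, and
    each output partition keeps a single copy of each element."""
    buckets: List[Dict[T, None]] = [{} for _ in range(num_out)]
    for part in parts:
        for x in part:
            buckets[hash(x) % num_out][x] = None  # dict keys deduplicate
    return [list(b) for b in buckets]


out = model_distinct([['A'], ['A', 'B']], 2)
# Where 'A' and 'B' land depends on hash(), which for str can change between Python runs.
print(sorted(x for part in out for x in part))  # ['A', 'B']
```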

sample


```python
# sample
x = sc.parallelize(range(7))
# call 'sample' 5 times
ylist = [x.sample(withReplacement=False, fraction=0.5) for i in range(5)]
print('x = ' + str(x.collect()))
for cnt, y in enumerate(ylist):
    print('sample:' + str(cnt) + ' y = ' + str(y.collect()))
```

Output (random, so it differs between runs):

```
x = [0, 1, 2, 3, 4, 5, 6]
sample:0 y = [0, 2, 5, 6]
sample:1 y = [2, 6]
sample:2 y = [0, 4, 5, 6]
sample:3 y = [0, 2, 6]
sample:4 y = [0, 3, 4]
```
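The sample sizes above vary (2 to 4 elements) because, without replacement, `fraction` is a per-element probability rather than an exact size. The sketch below models that as independent Bernoulli trials with Python's `random` module (hypothetical helper `model_sample`, explicit `seed` added for reproducibility); it illustrates the idea and is not Spark's sampler.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def model_sample(data: Sequence[T], fraction: float, seed: int) -> List[T]:
    """Hypothetical model of sample(withReplacement=False, fraction): keep each
    element independently with probability `fraction`, preserving order. The
    result size is random, about fraction * len(data) on average."""
    rng = random.Random(seed)
    return [x for x in data if rng.random() < fraction]


data = list(range(7))
for seed in range(5):
    print(model_sample(data, 0.5, seed))  # sizes differ from seed to seed
```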

takeSample


```python
# takeSample
x = sc.parallelize(range(7))
# call 'takeSample' 5 times
ylist = [x.takeSample(withReplacement=False, num=3) for i in range(5)]
print('x = ' + str(x.collect()))
for cnt, y in enumerate(ylist):
    print('sample:' + str(cnt) + ' y = ' + str(y))  # takeSample already returns a list, so no collect() on y
```

Output (random, so it differs between runs):

```
x = [0, 1, 2, 3, 4, 5, 6]
sample:0 y = [0, 2, 6]
sample:1 y = [6, 4, 2]
sample:2 y = [2, 0, 4]
sample:3 y = [5, 4, 1]
sample:4 y = [3, 1, 4]
```
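In contrast with `sample`, `takeSample` always returns exactly `num` elements (or fewer if the RDD is smaller) as a plain list on the driver, in random order. The sketch below mimics that contract with `random.sample`; the helper `model_take_sample` and its `seed` argument are hypothetical, and Spark's internal algorithm is different.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def model_take_sample(data: Sequence[T], num: int, seed: int) -> List[T]:
    """Hypothetical model of takeSample(withReplacement=False, num): exactly
    min(num, len(data)) distinct elements, in random order, as a Python list."""
    rng = random.Random(seed)
    return rng.sample(list(data), min(num, len(data)))


for seed in range(5):
    y = model_take_sample(range(7), 3, seed)
    print(len(y), y)  # the length is always 3; the elements and order vary
```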

union


```python
# union
x = sc.parallelize(['A', 'A', 'B'])
y = sc.parallelize(['D', 'C', 'A'])
z = x.union(y)
print(x.collect())
print(y.collect())
print(z.collect())
```

Output:

```
['A', 'A', 'B']
['D', 'C', 'A']
['A', 'A', 'B', 'D', 'C', 'A']
```
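As the output shows, `union` keeps duplicates, so it behaves like list concatenation rather than set union. The plain-Python model below assumes the simple case where the result's partitions are the first RDD's partitions followed by the second's (hypothetical helpers `model_union` and `model_collect`, partitions assumed to be lists); a set-style result needs an extra deduplication step, which `distinct()` provides in Spark.

```python
from typing import List, TypeVar

T = TypeVar("T")


def model_union(a: List[List[T]], b: List[List[T]]) -> List[List[T]]:
    """Hypothetical model of RDD.union: no shuffle and no deduplication;
    a's partitions come first, then b's."""
    return a + b


def model_collect(parts: List[List[T]]) -> List[T]:
    """Concatenate partitions in order, as collect() would."""
    return [x for part in parts for x in part]


z = model_union([['A'], ['A', 'B']], [['D'], ['C', 'A']])
print(len(z))                         # 4 partitions: 2 + 2
print(model_collect(z))               # ['A', 'A', 'B', 'D', 'C', 'A']
print(sorted(set(model_collect(z))))  # ['A', 'B', 'C', 'D'] after deduplication
```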

intersection


```python
# intersection
x = sc.parallelize(['A', 'A', 'B'])
y = sc.parallelize(['A', 'C', 'D'])
z = x.intersection(y)
print(x.collect())
print(y.collect())
print(z.collect())
```

Output:

```
['A', 'A', 'B']
['A', 'C', 'D']
['A']
```
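Note that `'A'` appears twice in `x` but only once in the result: `intersection` deduplicates. PySpark implements it by turning each value into a key, cogrouping the two RDDs, and keeping keys that have entries on both sides. The sketch below models that grouping step in plain Python on a single machine (hypothetical helper `model_intersection`); output order in real Spark is not guaranteed.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)


def model_intersection(a: Iterable[T], b: Iterable[T]) -> List[T]:
    """Hypothetical single-machine model of intersection via cogroup: group
    both inputs by value, then keep values seen on both sides (once each)."""
    groups: DefaultDict[T, Tuple[list, list]] = defaultdict(lambda: ([], []))
    for v in a:
        groups[v][0].append(None)  # like a.map(lambda v: (v, None))
    for v in b:
        groups[v][1].append(None)  # like b.map(lambda v: (v, None))
    return [k for k, (left, right) in groups.items() if left and right]


print(model_intersection(['A', 'A', 'B'], ['A', 'C', 'D']))  # ['A']
print(model_intersection([1, 1, 2, 3], [1, 1, 3, 3, 4]))     # [1, 3]
```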

sortByKey


```python
# sortByKey
x = sc.parallelize([('B', 1), ('A', 2), ('C', 3)])
y = x.sortByKey()
print(x.collect())
print(y.collect())
```

Output:

```
[('B', 1), ('A', 2), ('C', 3)]
[('A', 2), ('B', 1), ('C', 3)]
```
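A globally sorted RDD across several partitions is usually achieved with range partitioning: choose key boundaries, send each pair to the partition whose range contains its key, and sort each partition locally, so concatenating partitions in order is fully sorted. The sketch below models that idea in plain Python (hypothetical helper `model_sort_by_key`); as a simplifying assumption it takes boundaries from all keys, whereas PySpark picks them from a sample.

```python
import bisect
from typing import List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def model_sort_by_key(pairs: List[Tuple[K, V]], num_partitions: int) -> List[List[Tuple[K, V]]]:
    """Hypothetical model of sortByKey via range partitioning. Partition i holds
    keys no greater than those in partition i+1, and each partition is sorted."""
    if not pairs:
        return [[] for _ in range(num_partitions)]
    keys = sorted(k for k, _ in pairs)
    # Assumption: boundaries come from all keys; real Spark samples the keys instead.
    bounds = [keys[(i * len(keys)) // num_partitions] for i in range(1, num_partitions)]
    parts: List[List[Tuple[K, V]]] = [[] for _ in range(num_partitions)]
    for k, v in pairs:
        parts[bisect.bisect_left(bounds, k)].append((k, v))  # route by key range
    return [sorted(p, key=lambda kv: kv[0]) for p in parts]  # local sort per partition


parts = model_sort_by_key([('B', 1), ('A', 2), ('C', 3)], 2)
print(parts)                             # [[('A', 2), ('B', 1)], [('C', 3)]]
print([kv for p in parts for kv in p])   # [('A', 2), ('B', 1), ('C', 3)]
```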
