# apache-spark-tutorial

**Repository Path**: friendlyzhang/apache-spark-tutorial

## Basic Information

- **Project Name**: apache-spark-tutorial
- **Description**: Apache Spark Tutorial.《跟老卫学Apache Spark》
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 3
- **Created**: 2022-05-21
- **Last Updated**: 2022-05-21

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Apache Spark Tutorial.《跟老卫学Apache Spark开发》

![](images/spark-logo-trademark.png)

*Apache Spark Tutorial* is a book about how to develop Apache Spark applications.

《跟老卫学Apache Spark开发》 ("Learn Apache Spark Development with Lao Wei") is an open-source tutorial on Apache Spark application development. It mainly covers how to develop Apache Spark applications from scratch, including the new features in the latest Apache Spark 3.x releases. It is richly illustrated and uses many examples to bring you into the world of Apache Spark.

This book was written in the author's spare time. Given limited expertise and time, omissions are inevitable, and corrections are welcome.

## Summary

* [Downloading and installing Spark](https://developer.huawei.com/consumer/cn/forum/topic/0202568822299090741?fid=23)
* [A first look at Spark applications](https://developer.huawei.com/consumer/cn/forum/topic/0201568823403320732?fid=23)
* [Using the Spark accumulator LongAccumulator](https://developer.huawei.com/consumer/cn/forum/topic/0202622461925310080?fid=23)
* [Using the Spark accumulator DoubleAccumulator](https://developer.huawei.com/consumer/cn/forum/topic/0202622590853530085?fid=23)
* [Using the Spark accumulator CollectionAccumulator](https://developer.huawei.com/consumer/cn/forum/topic/0202622591182960086?fid=23)
* [Ways to launch a Spark application](https://developer.huawei.com/consumer/cn/forum/topic/0202623507783170122?fid=23)
* [Spark broadcast variables](https://developer.huawei.com/consumer/cn/forum/topic/0202624224916630149?fid=23)
* [Getting started with Spark RDDs](https://developer.huawei.com/consumer/cn/forum/topic/0201624386890690172?fid=23)
* [Basic Spark RDD operations](https://developer.huawei.com/consumer/cn/forum/topic/0201627152644060234?fid=23)
* [Spark RDD shuffle operations](https://developer.huawei.com/consumer/cn/forum/topic/0202627152820110215?fid=23)
* [Understanding Spark RDD internals in depth](https://developer.huawei.com/consumer/cn/forum/topic/0202628556358740265?fid=23)
* [Spark scheduling: resource allocation](https://developer.huawei.com/consumer/cn/forum/topic/0202629577348060308?fid=23)
* [Spark scheduling: job scheduling](https://developer.huawei.com/consumer/cn/forum/topic/0201629622395410333?fid=23)
* [Spark SQL overview](https://developer.huawei.com/consumer/cn/forum/topic/0202630480491580330?fid=23)
* [Spark SQL: Dataset and DataFrame](https://developer.huawei.com/consumer/cn/forum/topic/0202630480727520331?fid=23)
* [Spark SQL: getting started with DataFrame operations](https://developer.huawei.com/consumer/cn/forum/topic/0201633012983700432?fid=23)
* [Spark SQL: getting started with Dataset operations](https://developer.huawei.com/consumer/cn/forum/topic/0201633040938970437?fid=23)
* [Spark SQL: creating a temporary view from a DataFrame](https://developer.huawei.com/consumer/cn/forum/topic/0202633194774890394?fid=23)
* [Spark SQL: converting an RDD to a Dataset](https://developer.huawei.com/consumer/cn/forum/topic/0201633208926640450?fid=23)
* [Introduction to the Apache Parquet columnar storage format](https://waylau.com/about-apache-parquet/)
* [Spark SQL: reading and writing the Apache Parquet data source](https://developer.huawei.com/consumer/cn/forum/topic/0202634018676920418?fid=23)
* [Introduction to the Apache Hive data warehouse](https://developer.huawei.com/consumer/cn/forum/topic/0201634752549850505?fid=23)
* [Spark SQL: using Apache Hive](https://developer.huawei.com/consumer/cn/forum/topic/0202635471716910045?fid=23)
* [Spark SQL: working with databases via JDBC](https://developer.huawei.com/consumer/cn/forum/topic/0202635607847820058?fid=23)
* [Spark SQL: reading binary files](https://developer.huawei.com/consumer/cn/forum/topic/0202635626764400066?fid=23)
* [Exporting data from Spark to CSV files](https://developer.huawei.com/consumer/cn/forum/topic/0202620883150950010?fid=23)
* [Spark Streaming overview](https://developer.huawei.com/consumer/cn/forum/topic/0202636427881730132?fid=23)
* [Spark Streaming: counting word frequencies from a socket data stream](https://developer.huawei.com/consumer/cn/forum/topic/0201639135765210068?fid=23)
* [Spark Streaming window operations](https://developer.huawei.com/consumer/cn/forum/topic/0202639686793340267?fid=23)
* [Spark Structured Streaming overview](https://developer.huawei.com/consumer/cn/forum/topic/0202639990757790283?fid=23)
* [Spark Structured Streaming: counting word frequencies from a socket data stream](https://developer.huawei.com/consumer/cn/forum/topic/0201640617749310121?fid=23)
* [Spark Structured Streaming window operations](https://developer.huawei.com/consumer/cn/forum/topic/0201647684921030332?fid=23)
* [Customizing the Log4j configuration in Spark](https://developer.huawei.com/consumer/cn/forum/topic/0201647777007740340?fid=23)
* [Overview of the Spark MLlib machine learning library](https://developer.huawei.com/consumer/cn/forum/topic/0201648414415760370?fid=23)
* [Spark MLlib: ML Pipeline in detail](https://developer.huawei.com/consumer/cn/forum/topic/0202652669139340720?fid=23)
* [Spark MLlib: Estimator, Transformer, and Param usage example](https://developer.huawei.com/consumer/cn/forum/topic/0201648630447880382?fid=23)
* [Spark MLlib: ML Pipeline usage example](https://developer.huawei.com/consumer/cn/forum/topic/0202648630694530630?fid=23)
* [Overview of graph processing with Spark GraphX](https://developer.huawei.com/consumer/cn/forum/topic/0202652669536950721?fid=23)
* [Spark GraphX graph computation example](https://developer.huawei.com/consumer/cn/forum/topic/0201652741940200499?fid=23)
* To be continued...

## Samples

* [Using the Spark accumulator LongAccumulator](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/util/LongAccumulatorSample.java)
* [Using the Spark accumulator DoubleAccumulator](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/util/DoubleAccumulatorSample.java)
* [Using the Spark accumulator CollectionAccumulator](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/util/CollectionAccumulatorSample.java)
* [SparkLauncher example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/launcher/SparkLauncherSample.java)
* [InProcessLauncherSample example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/launcher/InProcessLauncher.java)
* [Broadcast example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/broadcast/BroadcastSample.java)
* [Basic RDD operations example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/rdd/JavaRddBasicSample.java)
* [Basic RDD Transformation and Action operations example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/rdd/JavaRddBasicOperationSample.java)
* [Basic DataFrame operations example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataFrameBasicExample.java)
* [Basic Dataset operations example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DatasetBasicExample.java)
* [Creating a temporary view from a DataFrame](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataFrameTempViewExample.java)
* [Converting an RDD to a Dataset](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DatasetSchemaExample.java)
* [Reading and writing the Apache Parquet data source](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceParquetExample.java)
* [Using Apache Hive](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceHiveExample.java)
* [Working with databases via JDBC](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceJDBCExample.java)
* [Reading binary files](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceBinaryFile.java)
* [Exporting data from Spark to CSV files](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/WriteCVSExample.java)
* [Spark Streaming: counting word frequencies from a socket data stream](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/streaming/SparkStreamingSocketSample.java)
* [Spark Streaming window operations](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/streaming/SparkStreamingWimdowSample.java)
* [Structured Streaming: counting word frequencies from a socket data stream](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/streaming/StructuredStreamingSocketSample.java)
* [Structured Streaming window operations](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/streaming/StructuredStreamingWindowSample.java)
* [Estimator, Transformer, and Param usage example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/ml/EstimatorTransformerParamExample.java)
* [ML Pipeline usage example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/ml/PipelineExample.java)
* [GraphX graph computation example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/rdd/JavaRddGraphXSample.java)
* To be continued...

## Get started

Choose one of the entry points below:

*
*

## Code

All sample source code from the book is in the `samples` directory. The code follows 《[Java 编码规范]()》 (Java Coding Conventions).

## Issue

Errata, comments, and suggestions are all welcome.

## Contact

* Blog: [waylau.com](http://waylau.com)
* Gmail: [waylau521(at)gmail.com](mailto:waylau521@gmail.com)
* Weibo: [waylau521](http://weibo.com/waylau521)
* Twitter: [waylau521](https://twitter.com/waylau521)
* Github : [waylau](https://github.com/waylau)

## Support Me

Buy Lao Wei a drink.

![Open-source donation](https://waylau.com/images/showmethemoney-sm.jpg)