In this article, we’ll implement a Spark data source for reading and writing Google Spreadsheets, so that you’ll know how to extend Spark with your own data sources.
What does a custom data source look like?
Read data from a Google Spreadsheet into a DataFrame:

```scala
val data = spark.read.format("google-spreadsheet")
  .option("credentialsPath", credentialFile)
  .option("spreadsheetId", spreadsheetId)
  .option("sheetName", sheetName1)
  .load()
```
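For symmetry, writing back to a sheet could look like the sketch below. This reuses the standard `DataFrameWriter` interface; the option names simply mirror the read side and are assumptions about the data source we’ll build, not a finished API.

```scala
// A sketch of the write side, assuming the same options as the read path.
data.write.format("google-spreadsheet")
  .option("credentialsPath", credentialFile)
  .option("spreadsheetId", spreadsheetId)
  .option("sheetName", sheetName1)
  .mode("append")
  .save()
```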
As you know, the JVM can’t load two classes with the same fully qualified name from one classpath, so conflicts arise when different versions of the same package are pulled in. In this case, only one version of Guava will actually end up on the classpath.
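A common workaround is to shade the conflicting dependency, i.e. relocate its classes under your own namespace so they can’t clash with the version Spark ships. As a sketch, with sbt-assembly this might look as follows; the rule and renamed package are illustrative, not taken from any particular build:

```scala
// build.sbt — a hypothetical sbt-assembly shading rule that relocates
// Guava's classes so our bundled copy can't clash with Spark's.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1").inAll
)
```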
Apache Kafka is an event streaming platform used to collect, process, store, and integrate data at scale. It has numerous use cases including distributed logging, stream processing, data integration, and pub/sub messaging.
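For context, Spark already ships a Kafka source that is consumed through the same `format`/`option` interface our spreadsheet source will expose. A minimal example, where the broker address and topic name are placeholders:

```scala
// Reading a Kafka topic with Spark's built-in source;
// "localhost:9092" and "events" are placeholder values.
val kafkaDf = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()
```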
Parquet is a columnar file format that supports nested data. Many data systems support it because of its significant performance advantages.
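Parquet is likewise a first-class format in Spark, usable directly from the reader and writer APIs. A minimal round trip, with a placeholder output path:

```scala
// Writing and re-reading Parquet with Spark's built-in support;
// the path is a placeholder.
data.write.parquet("/tmp/sheet-snapshot.parquet")
val parquetDf = spark.read.parquet("/tmp/sheet-snapshot.parquet")
```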