spark-playground

📚 Learning and exploring Apache Spark.

Standalone subprojects

This repository illustrates different concepts, patterns and examples via standalone subprojects. Each subproject is completely independent of the others and does not depend on the root project. This constraint forces each subproject to be complete and maximizes the reader's chances of successfully running, understanding, and re-using the code.

The subprojects include:

`hello-world/`

Get up and running with Spark in an interactive way by using the Spark SQL CLI and Spark shell.

See the README in hello-world/.

`commandline/`

Use Spark in a way optimized for ad hoc commandline data-wrangling: less logging verbosity and a smaller file footprint.

See the README in commandline/.

Wish List

General clean-ups, TODOs and things I wish to implement for this project:

- DONE hello world-style example
  - Let's start with the basics: Spark shell?
  - I already forgot why I had decided to use sbt instead of Gradle.
- DONE Pare down interactive/ to just spark-sql and spark-shell, and move the external table concept into its own project. interactive/ will become hello-world/ and the new project will be something like light-weight,
  - DONE Logging config.
- Iceberg example (Docker? or the Iceberg Java test impl?)
- Distributed example? Docker?
- Write some high-level notes about decoupling from Hadoop, etc.
- Hive example. This is an important component in the general Spark culture. The official Hive Docker example should be useful here. I was able to build Hive from source, but sadly it requires Java 8. That's a sign that we need to move on from it a bit and cordon it off into a Docker container.
- [commandline/] Explore https://openjdk.org/jeps/483 for improved startup time in the commandline/ project.
- [commandline/] Consider ejecting from the builtin spark-sql and spark-shell runners and making my own. The printing of startup messages like "Spark Web UI ..." makes it impossible to capture the output of the command cleanly. I'm curious how much of the core can be re-used and how much wrapper machinery gets in the way.
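
For the last item, one stopgap short of writing a custom runner is to post-filter the captured output. Here is a minimal Python sketch, assuming the startup messages arrive on stdout and begin with a known prefix. Both are assumptions that may not hold for every Spark version, and `NOISE_PREFIXES`, `strip_startup_noise` and `run_and_capture` are hypothetical names, not part of this repository.

```python
import subprocess

# Hypothetical list of startup-line prefixes to drop. Only "Spark Web UI" comes
# from the wish-list note; extend it with whatever your Spark version prints.
NOISE_PREFIXES = ("Spark Web UI",)


def strip_startup_noise(text, prefixes=NOISE_PREFIXES):
    """Return text without the lines that start with a known noise prefix."""
    kept = [
        line
        for line in text.splitlines()
        if not line.lstrip().startswith(prefixes)
    ]
    return "\n".join(kept)


def run_and_capture(command):
    """Run a command, capture stdout only, and strip known startup lines.

    stderr is left attached to the terminal, so log output is not captured.
    """
    result = subprocess.run(command, stdout=subprocess.PIPE, text=True, check=True)
    return strip_startup_noise(result.stdout)
```

For example, `run_and_capture(["spark-sql", "-e", "SELECT 1"])` would return the query output minus any line starting with `Spark Web UI`, provided that message really is written to stdout. Prefix filtering is brittle by nature, which is part of why a custom runner is still the more interesting option.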
