Interactive Spark Shell

Running commands interactively in the Apache Spark shell

You can run Spark commands interactively in the Spark shell. The Spark shell is available in Scala, Python, and R.

  1. Launch a long-running interactive bash session using dcos task exec.
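     A launch command for this step might look like the following. The task name spark is an assumption here; substitute the ID of a running task from the output of dcos task.

     ```shell
     # List running tasks to find one to attach to.
     dcos task

     # Open an interactive bash session inside that task's container.
     # "spark" is a hypothetical task name; use a task ID from the list above.
     dcos task exec --interactive --tty spark /bin/bash
     ```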

  2. From your interactive bash session, pull and run a Spark Docker image.

     docker pull mesosphere/spark:2.8.0-2.4.0-hadoop-2.9
    
     docker run -it --net=host mesosphere/spark:2.8.0-2.4.0-hadoop-2.9 /bin/bash
    
  3. Run the Spark shell from within the running Docker container.

    For the Scala Spark shell:

     ./bin/spark-shell --master mesos://<internal-leader-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:2.8.0-2.4.0-hadoop-2.9 --conf spark.mesos.executor.home=/opt/spark/dist
    

    For the Python Spark shell:

     ./bin/pyspark --master mesos://<internal-leader-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:2.8.0-2.4.0-hadoop-2.9 --conf spark.mesos.executor.home=/opt/spark/dist
    

    For the R Spark shell:

     ./bin/sparkR --master mesos://<internal-leader-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:2.8.0-2.4.0-hadoop-2.9 --conf spark.mesos.executor.home=/opt/spark/dist
    

    NOTE: Find your internal leader IP by going to <dcos-url>/mesos. The internal leader IP is listed in the upper-left corner of the page.

  4. Run Spark commands interactively.

    In the Scala shell:

     val textFile = sc.textFile("/opt/spark/dist/README.md")
     textFile.count()
    

    In the Python shell:

     textFile = sc.textFile("/opt/spark/dist/README.md")
     textFile.count()
    

    In the R shell (this example uses R's built-in faithful data set rather than the README file):

     df <- as.DataFrame(faithful)
     head(df)
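
    In the Scala and Python examples above, sc.textFile loads the file as a collection of lines, so textFile.count() returns the file's line count. As a point of reference, here is a minimal plain-Python sketch of the same counting logic, run against a hypothetical local file created for the example (no Spark required):

    ```python
    import os
    import tempfile

    # Create a small stand-in file; in the Spark shell this would be
    # /opt/spark/dist/README.md inside the container.
    with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False) as f:
        f.write("# Sample README\n\nThree lines of text.\n")
        path = f.name

    # Count lines, mirroring what sc.textFile(path).count() computes.
    with open(path) as f:
        line_count = sum(1 for _ in f)

    print(line_count)  # prints 3 for the sample file above

    os.unlink(path)  # clean up the temporary file
    ```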