The Mesos cluster dispatcher is responsible for queuing, tracking, and supervising drivers. Problems can arise if the dispatcher does not receive the resource offers you expect from Mesos, or if driver submission is failing. To debug this class of issue, visit the Mesos UI at
http://<dcos-url>/mesos/ and navigate to the sandbox for the dispatcher.
Spark has an internal mechanism for detecting the IP address of the host. DC/OS uses this method by default, but sometimes it fails, returning errors like these:
ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.net.UnknownHostException: ip-172-31-4-148: ip-172-31-4-148: Name or service not known
    at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
    at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:891)
    at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress$lzycompute(Utils.scala:884)
    at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress(Utils.scala:884)
    at org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:941)
    at org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:941)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.localHostName(Utils.scala:941)
    at org.apache.spark.deploy.mesos.MesosClusterDispatcherArguments.<init>(MesosClusterDispatcherArguments.scala:27)
    at org.apache.spark.deploy.mesos.MesosClusterDispatcher$.main(MesosClusterDispatcher.scala:103)
    at org.apache.spark.deploy.mesos.MesosClusterDispatcher.main(MesosClusterDispatcher.scala)
Caused by: java.net.UnknownHostException: ip-172-31-4-148: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
    ... 10 more
18/01/25 17:42:57 INFO ShutdownHookManager: Shutdown hook called
In the case of IP address failures, enable the service.use_bootstrap_for_IP_detect option in the dispatcher configuration, either by editing the task in the UI or by setting the parameter in the options.json file, then restart the service. With this setting enabled, the DC/OS-specific bootstrap utility detects the IP address after the restart, which may allow the initialization of the Spark service to complete.
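As a minimal sketch of the options.json route, the snippet below writes a file that enables the option. It assumes the dotted option name maps to nested JSON keys (service, then use_bootstrap_for_IP_detect), which is the usual layout for DC/OS package options; verify this against the configuration schema of your installed Spark package before applying it.

```shell
# Sketch: write an options.json that enables bootstrap-based IP detection.
# Assumption: "service.use_bootstrap_for_IP_detect" corresponds to the
# nested keys service -> use_bootstrap_for_IP_detect.
cat > options.json <<'EOF'
{
  "service": {
    "use_bootstrap_for_IP_detect": true
  }
}
EOF

# Print the file so you can review it before applying it to the service.
cat options.json
```

After applying the file to the dispatcher configuration, restart the service so that the new setting takes effect.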
DC/OS Apache Spark jobs are submitted through the dispatcher, which displays Spark properties and job state. Start here to verify that the job is configured as you expect.
The dispatcher further provides a link to the job’s entry in the history server, which displays the Spark Job UI. The UI shows scheduling and performance information for the job. Go here to debug issues with scheduling and performance.
Jobs themselves log output to their sandbox, which you can access through the Mesos UI. The Spark logs are sent to standard error (stderr), while any output you write in your job is sent to standard output (stdout).
To disable using the Mesosphere bootstrap utility for host IP detection in jobs, add spark.mesos.driverEnv.SKIP_BOOTSTRAP_IP_DETECT=true to your job configuration.
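One way to pass this property is as a --conf entry in the job's submit arguments. The sketch below only assembles and prints the command so that you can review it first. The class name and jar URL are hypothetical placeholders, not values from this guide.

```shell
# Hypothetical placeholders: replace with your job's main class and jar.
JOB_CLASS="com.example.MyJob"
JOB_JAR="https://example.com/my-job.jar"

# Submit arguments with bootstrap IP detection disabled for this job.
SUBMIT_ARGS="--conf spark.mesos.driverEnv.SKIP_BOOTSTRAP_IP_DETECT=true --class ${JOB_CLASS} ${JOB_JAR}"

# Print the full command for review instead of running it directly.
echo "dcos spark run --submit-args=\"${SUBMIT_ARGS}\""
```

Run the printed command once the placeholders point at a real job.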
The Spark CLI is integrated with the dispatcher so that they always use the same version of Spark, and so that certain defaults are honored. To debug issues with their communication, run your jobs with the
To debug authentication in a Spark job, enable Java security debug output:
dcos spark run --submit-args="--conf sun.security.krb5.debug=true..."