HDFS 2.3.0-2.6.0-cdh5.11.0

Welcome to the documentation for DC/OS Apache HDFS. DC/OS Apache HDFS is a managed service that makes it easy to deploy and manage an HA (High Availability) Apache HDFS cluster on Mesosphere DC/OS. Apache HDFS (Hadoop Distributed File System) is an open source distributed file system based on Google’s GFS (Google File System) paper. It is a replicated and distributed file system interface for use with “big data” and “fast data” applications.

Benefits

DC/OS HDFS offers the following benefits:

  • Easy installation
  • Multiple HDFS clusters
  • Elastic scaling of data nodes
  • Integrated monitoring

Features

DC/OS HDFS provides the following features:

  • Single-command installation for rapid provisioning
  • Persistent storage volumes for enhanced data durability
  • Runtime configuration and software updates for high availability
  • Health checks and metrics for monitoring
  • Distributed storage scale out
  • HA name service with Quorum Journaling and ZooKeeper failure detection

Getting Started

To start a basic test cluster with three journal nodes, two name nodes, and three data nodes, run the following command on the DC/OS CLI.…

Configuration

The default DC/OS Apache HDFS installation provides reasonable defaults for trying out the service, but may not be sufficient for production use. You may require a different configuration depending on the context of the deployment.…

Operations

The DC/OS Apache HDFS service provides a robust API (accessible by HTTP or DC/OS CLI) for managing, repairing, and monitoring the service. Here, only the CLI version is presented for conciseness, but see the API Reference for HTTP instructions.…

Updates

Enterprise DC/OS 1.10 introduced a command-line option that simplifies updating a service’s configuration and version. It also lets users inspect the status of an update, pause and resume updates, and restart or complete steps if necessary.…

Security

The DC/OS Apache HDFS service supports HDFS’s native transport encryption, authentication, and authorization mechanisms. The service provides automation and orchestration to simplify the usage of these important features.…

Uninstall

If you are using DC/OS 1.10 or later and the installed service has a version later than 2.0.0-x, then uninstalling the service is simple.…

Troubleshooting

After a configuration change, the service may enter an unhealthy state. This commonly occurs when the user makes an invalid configuration change: certain configuration values may not be changed, or may not be decreased. To verify whether this is the case, check the service’s deploy plan for any errors.…
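
As a rough illustration of that check, the sketch below scans a deploy plan for steps that report an error. It assumes the plan has already been fetched and parsed into a dictionary, and that it uses a layout of phases containing steps, with a `status` field on each step and a top-level `errors` list. Those field names and the `ERROR` status value are assumptions made for this example; they are not taken from this page.

```python
from typing import Any, Dict, List


def find_plan_problems(plan: Dict[str, Any]) -> List[str]:
    """Return human-readable descriptions of problems found in a plan.

    Assumed (hypothetical) layout: {"errors": [...], "phases": [{"name": ...,
    "steps": [{"name": ..., "status": ..., "message": ...}]}]}.
    """
    # Plan-level errors come first, then any step whose status is ERROR.
    problems = [f"plan error: {err}" for err in plan.get("errors", [])]
    for phase in plan.get("phases", []):
        for step in phase.get("steps", []):
            if step.get("status") == "ERROR":
                detail = step.get("message") or "no message"
                problems.append(f"{phase.get('name')}/{step.get('name')}: {detail}")
    return problems


# Hand-written sample data for illustration only, not real service output.
sample_plan = {
    "errors": [],
    "phases": [
        {"name": "journal", "steps": [{"name": "journal-0", "status": "COMPLETE"}]},
        {"name": "data", "steps": [{"name": "data-0", "status": "ERROR",
                                    "message": "invalid configuration"}]},
    ],
}

for problem in find_plan_problems(sample_plan):
    print(problem)
```

An empty result from a helper like this would suggest that the unhealthy state has a cause other than a rejected configuration change.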

Advanced

This section describes some advanced features of the DC/OS Apache HDFS service.…

API Reference

The DC/OS Apache HDFS Service implements a REST API that may be accessed from outside the cluster. The parameter referenced below indicates the base URL of the DC/OS cluster on which the DC/OS Apache HDFS Service is deployed.…
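
To make the role of the base URL concrete, the following sketch joins a cluster base URL with a service resource path. The `/service/<name>/v1/<resource>` route shape, the `hdfs` service name, the `plans/deploy` resource, and the example host are all assumptions for illustration; the API Reference remains the authority on the actual endpoints and on any required authentication.

```python
def service_api_url(dcos_url: str, resource: str, service_name: str = "hdfs") -> str:
    """Join the cluster base URL with a service API resource path.

    The "/service/<name>/v1/<resource>" shape is an assumption for this sketch.
    """
    # Normalize slashes so the pieces join cleanly regardless of input form.
    base = dcos_url.rstrip("/")
    return f"{base}/service/{service_name}/v1/{resource.lstrip('/')}"


# Hypothetical cluster address used purely as an example.
print(service_api_url("https://dcos.example.com/", "plans/deploy"))
```

Keeping the base URL as a separate argument means the same helper can be pointed at different clusters without changing the resource paths.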

Limitations

Out-of-band configuration modifications are not supported. The service’s core responsibility is to deploy and maintain the service with a specified configuration. To do this, the service assumes that it has ownership of task configuration. If an end user modifies individual tasks through out-of-band configuration operations, the service will override those modifications at a later time. For example:…

Support Policy

The DC/OS and certified package version support policies are described in detail here.…

Release Notes

Discover the new features, updates, and known limitations in this release of the HDFS Service.…