Minicluster

class snakebite.minicluster.MiniCluster(testfiles_path, start_cluster=True, nnport=None)

Class that spawns a hadoop mini cluster and wraps hadoop functionality.

This class requires the HADOOP_HOME environment variable to be set to run the hadoop command. It will search HADOOP_HOME for hadoop-mapreduce-client-jobclient<version>-tests.jar, but the location of this jar can also be supplied by the HADOOP_JOBCLIENT_JAR environment variable.

Since the current minicluster interface doesn’t provide a way to specify the namenode port number, but instead chooses a random one, this class parses the minicluster output to find the port number.
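The port-number parsing could be sketched as follows (a minimal illustration, not the actual implementation; the regular expression and the sample log line are assumptions):

```python
import re

def parse_namenode_port(line):
    """Extract the namenode port from a minicluster log line, or return None."""
    # The minicluster logs the HDFS URI it started on; capture the digits
    # that follow the host portion of an hdfs:// URI.
    match = re.search(r"hdfs://\S+?:(\d+)", line)
    return int(match.group(1)) if match else None

# Hypothetical log line of the kind the minicluster prints at startup:
sample = "Started MiniDFSCluster -- namenode at hdfs://localhost:53414"
port = parse_namenode_port(sample)  # 53414
```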

All supplied methods (like put(), ls(), etc.) use the hadoop command to perform operations, and not the snakebite client, since this class is used for testing snakebite itself.

All methods return a list of maps that are snakebite compatible.
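For illustration, a single entry in such a list might look like the following (the exact set of keys shown here is an assumption for illustration only; consult the snakebite client for the authoritative schema):

```python
# One hypothetical entry from an ls()-style result list,
# in snakebite-compatible dict form:
entry = {
    "path": "/user/test/somefile",  # absolute HDFS path
    "file_type": "f",               # "f" for file, "d" for directory
    "length": 1024,                 # size in bytes
}
```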

Example without snakebite.client

>>> from snakebite.minicluster import MiniCluster
>>> cluster = MiniCluster("/path/to/test/files")
>>> ls_output = cluster.ls(["/"])

Example with snakebite.client

>>> from snakebite.minicluster import MiniCluster
>>> from snakebite.client import Client
>>> cluster = MiniCluster("/path/to/test/files")
>>> client = Client('localhost', cluster.port)
>>> ls_output = client.ls(["/"])

Just like the snakebite client, the cluster methods take a list of strings as paths. Wherever a method takes extra_args, normal hadoop command-line arguments can be given (like -r, -f, etc.).
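Internally, such methods shell out to the hadoop fs command; the way extra_args could be spliced into the command line can be sketched like this (a hypothetical helper, not the class’s real method):

```python
def build_hadoop_command(operation, paths, extra_args=None):
    """Assemble an 'hadoop fs' argument list suitable for subprocess."""
    cmd = ["hadoop", "fs", "-%s" % operation]
    cmd.extend(extra_args or [])  # e.g. ["-R"] for a recursive ls
    cmd.extend(paths)             # paths always come last
    return cmd

build_hadoop_command("ls", ["/"], ["-R"])
# -> ['hadoop', 'fs', '-ls', '-R', '/']
```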

More info can be found at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CLIMiniCluster.html

Note

A minicluster will be started at instantiation

Note

Not all hadoop commands have been implemented, only the ones that were necessary for testing the snakebite client, but please feel free to add them

Parameters:
  • testfiles_path (string) – Local path where test files can be found. Mainly used for put()
  • start_cluster (boolean) – Start a MiniCluster on initialization. If False, this class will act as an interface to the hadoop fs command
count(src)

Perform count on a path

df(src)

Perform df on a path

du(src, extra_args=[])

Perform du on a path

exists(path)

Return True if <path> exists, False if it doesn’t

is_directory(path)

Return True if <path> is a directory, False otherwise

is_files(path)

Return True if <path> is a file, False otherwise

is_greater_then_zero_bytes(path)

Return True if file <path> is greater than zero bytes in size, False otherwise

is_zero_bytes_file(path)

Return True if file <path> is zero bytes in size, else return False

ls(src, extra_args=[])

List files in a directory

mkdir(src, extra_args=[])

Create a directory

put(src, dst)

Upload a file to HDFS

This will take a file from the testfiles_path supplied in the constructor.

terminate()

Terminate the cluster

Since the minicluster is started as a subprocess, this method has to be called explicitly when your program ends.