Minicluster

class snakebite.minicluster.MiniCluster(testfiles_path, start_cluster=True, nnport=None)
Class that spawns a hadoop mini cluster and wraps hadoop functionality.
This class requires the HADOOP_HOME environment variable to be set in order to run the hadoop command. It will search HADOOP_HOME for hadoop-mapreduce-client-jobclient<version>-tests.jar, but the location of this jar can also be supplied by the HADOOP_JOBCLIENT_JAR environment variable.

Since the current minicluster interface doesn't provide for specifying the namenode port number, and chooses a random one, this class parses the output from the minicluster to find the port number.
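The jar lookup described earlier can be pictured with a small sketch. This is not snakebite's own implementation: `find_jobclient_jar` is a hypothetical helper, and the glob pattern is an assumption derived from the jar name given in the text. It prefers HADOOP_JOBCLIENT_JAR when set and otherwise walks HADOOP_HOME.

```python
import fnmatch
import os

# Assumed pattern, derived from the jar name mentioned in the text.
JAR_PATTERN = "hadoop-mapreduce-client-jobclient*-tests.jar"


def find_jobclient_jar(environ=os.environ):
    """Hypothetical helper: return the path of the jobclient tests jar, or None."""
    # An explicit override takes precedence over searching.
    override = environ.get("HADOOP_JOBCLIENT_JAR")
    if override:
        return override

    hadoop_home = environ.get("HADOOP_HOME")
    if not hadoop_home:
        raise RuntimeError("HADOOP_HOME must be set")

    # Walk HADOOP_HOME and return the first file matching the pattern.
    for dirpath, _dirnames, filenames in os.walk(hadoop_home):
        for name in sorted(filenames):
            if fnmatch.fnmatch(name, JAR_PATTERN):
                return os.path.join(dirpath, name)
    return None
```

Passing the environment as an argument keeps the helper easy to exercise with a plain dictionary instead of the real process environment.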
All supplied methods (like put(), ls(), etc.) use the hadoop command to perform operations, and not the snakebite client, since this is used for testing snakebite itself. All methods return a list of maps that are snakebite compatible.
Example without snakebite.client

>>> from snakebite.minicluster import MiniCluster
>>> cluster = MiniCluster("/path/to/test/files")
>>> ls_output = cluster.ls(["/"])
Example with snakebite.client

>>> from snakebite.minicluster import MiniCluster
>>> from snakebite.client import Client
>>> cluster = MiniCluster("/path/to/test/files")
>>> client = Client('localhost', cluster.port)
>>> ls_output = client.ls(["/"])
Just like the snakebite client, the cluster methods take a list of strings as paths. Wherever a method takes extra_args, normal hadoop command arguments can be given (like -r, -f, etc.).

More info can be found at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CLIMiniCluster.html
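To illustrate extra_args, here is a minimal sketch. `list_recursive` is a hypothetical helper, not part of snakebite, and it assumes the `-R` flag is forwarded unchanged to `hadoop fs -ls`. `cluster` stands for any started MiniCluster.

```python
def list_recursive(cluster, paths):
    """Hypothetical helper: recursive listing via extra_args.

    `paths` is a list of strings, as with the snakebite client.
    Assumes the extra argument is forwarded as-is to `hadoop fs -ls`.
    """
    return cluster.ls(paths, extra_args=["-R"])
```

With a running cluster, `list_recursive(cluster, ["/"])` would return whatever `ls()` returns for a recursive listing of the root.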
Note
By default (start_cluster=True), a minicluster is started at instantiation.
Note
Not all hadoop commands have been implemented, only the ones that were necessary for testing the snakebite client. Please feel free to add more.
Parameters:
- testfiles_path (string) – Local path where test files can be found. Mainly used for put().
- start_cluster (boolean) – Start a MiniCluster on initialization. If False, this class will act as an interface to the hadoop fs command.
- count(src)
  Perform count on a path.
- df(src)
  Perform df on a path.
- du(src, extra_args=[])
  Perform du on a path.
- exists(path)
  Return True if <path> exists, False if it doesn't.
- is_directory(path)
  Return True if <path> is a directory, False if it is not.
- is_files(path)
  Return True if <path> is a file, False if it is not.
- is_greater_then_zero_bytes(path)
  Return True if file <path> is greater than zero bytes in size, False otherwise.
- is_zero_bytes_file(path)
  Return True if file <path> is zero bytes in size, False otherwise.
- ls(src, extra_args=[])
  List files in a directory.
- mkdir(src, extra_args=[])
  Create a directory.
- put(src, dst)
  Upload a file to HDFS.
  This will take a file from the testfiles_path supplied in the constructor.
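A test typically uploads a file and then checks that it arrived. The sketch below shows one way to combine put() with the predicate methods. `upload_and_check` is a hypothetical helper, and it assumes that exists() and is_greater_then_zero_bytes() accept a single path string.

```python
def upload_and_check(cluster, filename, dst):
    """Hypothetical helper: upload a test file and report whether it is non-empty."""
    # `filename` is resolved against the testfiles_path given to the constructor.
    cluster.put(filename, dst)
    if not cluster.exists(dst):
        raise AssertionError("%s was not created" % dst)
    # True when the uploaded file has content, False for a zero-byte file.
    return cluster.is_greater_then_zero_bytes(dst)
```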
- terminate()
  Terminate the cluster.
  Since the minicluster is started as a subprocess, this method has to be called explicitly before your program exits.
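Because terminate() has to be called explicitly, it can help to wrap the cluster so that cleanup happens even when a test fails. The following is a minimal sketch: `running_cluster` is a hypothetical wrapper, not part of snakebite, and `make_cluster` is any zero-argument callable that returns a cluster object.

```python
from contextlib import contextmanager


@contextmanager
def running_cluster(make_cluster):
    """Hypothetical wrapper: yield a cluster and always terminate it."""
    cluster = make_cluster()
    try:
        yield cluster
    finally:
        # Runs on normal exit and when the body raises.
        cluster.terminate()
```

It could be used as `with running_cluster(lambda: MiniCluster("/path/to/test/files")) as cluster:` followed by the test body. A plain try/finally around the test code achieves the same effect.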