So I am new to Hadoop and trying to figure out how best to interact with it programmatically. However, it's hard to do that without first understanding how to do simple operations on it.
Hadoop is basically a cluster of inexpensive commodity servers that work together to provide relatively cheap data storage and processing power. In essence it is a distributed file system with access applications layered on top of it. Today I am looking at the file system half, called HDFS, and how to work inside of it.
Now HDFS is a file system that runs across several servers, so accessing it, in its simplest form, requires an account with permissions to the file system on the Linux box. For this example, I created a user called hdfsuser, and here are a few basic operations with that user.
1. Create directory named data
sudo -u hdfs hadoop fs -mkdir /data
This command is rather straightforward. You use sudo to run the command as another user (-u), in this case hdfs (the general HDFS superuser account); hadoop is the base Hadoop command, fs selects the file system shell, -mkdir is the familiar Linux directory-creation command, and /data is the folder to make.
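If you don't have a cluster handy, the same pattern can be sketched with plain Linux commands; the /tmp path below is just a stand-in for the HDFS root, not anything from a real install.

```shell
# Local sketch of step 1: create a data directory and confirm it exists.
# /tmp/hdfs_root stands in for the root of HDFS.
mkdir -p /tmp/hdfs_root/data
ls -ld /tmp/hdfs_root/data
```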
2. Allow everyone permission to the folder
sudo -u hdfs hadoop fs -chmod 777 /data
This command is the same as the last except it uses the Linux command chmod to modify the permissions on the folder data to 777, which grants read, write, and execute access to everyone.
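The octal 777 works the same way in HDFS as on a regular Linux filesystem, so you can see what it means with plain chmod on a throwaway local directory (the path is my own example):

```shell
# Local demo of mode 777: owner, group, and others all get rwx.
mkdir -p /tmp/perm_demo
chmod 777 /tmp/perm_demo
stat -c '%a %A' /tmp/perm_demo   # octal mode and its rwx expansion
```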
3. Creating new folder for new account
sudo -u hdfs hadoop fs -mkdir /user/hdfsuser
This command is the same as number 1, creating a directory, except the directory is now /user/hdfsuser. For each user you create, you should create a matching directory in the /user/ folder.
4. Giving user access to that folder
sudo -u hdfs hadoop fs -chown hdfsuser /user/hdfsuser
Again, this command is similar to one of the others, this time number 2. It gives the user hdfsuser ownership of the folder /user/hdfsuser.
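HDFS ownership mirrors Linux file ownership. Changing an owner locally needs root, but you can at least read ownership back the same way; the file below is just an example of mine:

```shell
# Local demo: files are owned by whoever creates them, and stat %U
# reads the owning user back, much like the owner column in hadoop fs -ls.
mkdir -p /tmp/own_demo
touch /tmp/own_demo/report.txt
stat -c '%U' /tmp/own_demo/report.txt   # prints the owning user
```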
5. Create place for user to put different sorts of files
hadoop fs -mkdir /user/hdfsuser/data
hadoop fs -copyFromLocal datafile /user/hdfsuser/data
hadoop fs -cat /user/hdfsuser/data/*
So this set of commands, run as the hdfsuser account, first creates a new directory under /user/hdfsuser for a certain type of data, then loads a local file called datafile into that directory, and finally reads it back out with cat to check it.
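The whole copy-in/read-back pattern of step 5 can be sketched without a cluster using plain cp and cat in place of -copyFromLocal and -cat (all the /tmp paths and the file contents are invented for the demo):

```shell
# Local sketch of step 5: make a data folder, copy a local file in,
# then read everything in the folder back out.
mkdir -p /tmp/hdfs_demo/user/hdfsuser/data
echo "hello hadoop" > /tmp/hdfs_demo/datafile
cp /tmp/hdfs_demo/datafile /tmp/hdfs_demo/user/hdfsuser/data/
cat /tmp/hdfs_demo/user/hdfsuser/data/*
```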
So there you have it: the basic folder-management and user-setup commands to create directories and load local files. I will dig deeper into Hadoop as I progress, but that's enough for today.