Yesterday I talked about getting a cluster ready for the free Cloudera CDH4 install; now I am going to talk about actually installing Hadoop on the cluster. You can see the setup article here. Again, these are basically my install notes; if I do it again, I will take screenshots to share.
1. Set up Yum to download (optional)
Add the following line (only needed if yum has to go through a proxy):
export http_proxy (this creates the proxy setting as an environment variable within Linux)
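A minimal sketch of the proxy step, assuming a proxy at the made-up address proxy.example.com:3128 (substitute your own host and port). The https_proxy line is my own addition for tools that fetch over HTTPS.

```shell
# Hypothetical proxy address: replace with your own proxy host and port.
PROXY_URL="http://proxy.example.com:3128"

# Export the proxy as environment variables so yum and other
# command-line tools started from this shell go through it.
export http_proxy="$PROXY_URL"
export https_proxy="$PROXY_URL"

# Confirm the variable is set in the current environment.
echo "http_proxy=$http_proxy"
```

If you would rather configure yum itself instead of the shell, /etc/yum.conf also accepts a proxy= line in its [main] section.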
2. Download Cloudera CDH4
3. Install CDH4
chmod u+x cloudera-manager-installer.bin
Run the installer, then wait for the install to finish.
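The install step above can be sketched roughly like this. The helper name run_installer is made up for illustration, and I am assuming the installer file sits in the current directory and that you have sudo rights.

```shell
# Make the installer executable and launch it.
# Returns 1 without doing anything if the file is missing.
run_installer() {
    installer="$1"
    if [ ! -f "$installer" ]; then
        echo "Installer not found: $installer" >&2
        return 1
    fi
    chmod u+x "$installer"
    # Root privileges are needed because the installer adds packages and services.
    sudo "$installer"
}

# File name taken from the step above; assumed to be in the current directory.
run_installer ./cloudera-manager-installer.bin || echo "Download the installer first."
```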
4. Log into the web portal at the link provided by the installer
5. Choose the Cloudera Standard Install (Free)
Click Continue on the installer detail screen.
Specify the hosts for your CDH cluster installation, listing each node on a new line by its fully qualified name.
Click Search. If there are any errors, go fix your hosts files.
Click Continue if there are no errors.
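Before pasting the host list into the installer, a quick pre-check can catch names that are not fully qualified. This is only a sketch: the function names is_fqdn and check_hosts and the example.com host names are made up, and the check looks at the format of each name only, not at whether it resolves.

```shell
# Return 0 if the name looks fully qualified: at least one dot, no leading,
# trailing or doubled dots, and only letters, digits, dots and hyphens.
is_fqdn() {
    case "$1" in
        ""|.*|*.|*..*) return 1 ;;
        *[!A-Za-z0-9.-]*) return 1 ;;
        *.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read a file with one host per line and report names that are not fully qualified.
check_hosts() {
    rc=0
    while IFS= read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue
        if ! is_fqdn "$host"; then
            echo "not fully qualified: $host"
            rc=1
        fi
    done < "$1"
    return $rc
}

# Hypothetical host list, one node per line as the installer expects.
HOSTS_FILE=$(mktemp)
printf '%s\n' namenode.example.com datanode1.example.com datanode2 > "$HOSTS_FILE"
check_hosts "$HOSTS_FILE" || echo "Fix the names above before clicking Search."
rm -f "$HOSTS_FILE"
```

To see whether a name actually resolves on a node, `getent hosts <name>` is a quick check against the hosts file and DNS.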
6. Choose packages
Use packages if you are just downloading for the initial install.
Use parcels if you want to keep your own store with a named/saved version, so you can add single nodes later without upgrading. With this option you need to pick a version, or download a copy and host it yourself to upgrade from.
Choose the versions you want installed, or skip the ones you don't want.
7. Cluster Installation
Choose root or another user and enter the password for that account.
8. Watch Install
Watch the spinning circles and pray nothing goes poorly.
9. Look at errors you have to fix
10. Inspect Role assignments
Set your name node.
Set your secondary name node.
Set at least 3 ZooKeeper servers.
Set all nodes that aren't name nodes to TaskTracker and DataNode.
Gateway goes on the 2 name nodes.
JobTracker goes on the secondary name node.
Hive Metastore goes on the name node.
Put HiveServer, Hue, Cloudera Manager, Service Monitor and all alerts on the name node.
Put all the other services, which you probably won't use, on the secondary name node.
Click Continue when you think it is configured correctly.
11. Database Setup
Choose your database types; the default is PostgreSQL.
Save the usernames and passwords for later.
12. Review Server Configurations
Check the data directories; these are the drives HDFS will use on the machines.
Check the volumes on the data nodes, the name node, and the secondary name node.
Ideally you don't have to change much here; just look for errors.
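To check on a node that the data directories exist and have room, something like the sketch below can help. The helper name check_data_dirs is made up, and the directory names /dfs/nn, /dfs/snn and /dfs/dn are assumptions on my part; use whatever paths the review screen actually lists.

```shell
# Print free space for each directory, or a warning if it does not exist.
check_data_dirs() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            df -h "$dir"
        else
            echo "missing: $dir"
        fi
    done
}

# Assumed directory names; replace with the ones shown on the review screen.
check_data_dirs /dfs/nn /dfs/snn /dfs/dn
```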
13. Starting cluster services
The installer proceeds through setting up the cluster. It takes a while and is the final step.
14. Now you should be at the Cloudera Manager Dashboard