Thursday, August 10, 2017

Hadoop - How to setup a Hadoop Cluster

Below is the step-by-step guide I used to set up a Hadoop cluster

Scenario


3 VMs involved:

1) NameNode, ResourceManager - Host name: NameNode.net
2) DataNode 1 - Host name: DataNode1.net
3) DataNode 2 - Host name: DataNode2.net


Pre-requisite 


1) You could create a new Hadoop user or use an existing one, but make sure the user has access to the Hadoop installation on ALL nodes

2) Install Java. Refer here for a suitable version. In this guide, Java is installed at /usr/java/latest

3) Download a stable version of Hadoop from Apache Mirrors

This guide is based on Hadoop 2.7.1 and assumes that we have created a user called hadoop


Setup Passwordless SSH from the NameNode to all Nodes.


1) Run the command

ssh-keygen

This command will ask you a few questions, and accepting the defaults is fine. Eventually, it will create a private key (id_rsa) and a public key (id_rsa.pub) in the user's .ssh directory (/home/hadoop/.ssh)

2) Copy the public key to all Nodes with the following

ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub NameNode.net
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub DataNode1.net
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub DataNode2.net

3) Test the passwordless SSH connection from the NameNode with

ssh (hostname)


Install Hadoop on all Nodes


1) Untar the downloaded Hadoop distribution to a location where the hadoop user has access

For this guide, I created /usr/local/hadoop and untarred the distribution into this folder. The full path of the Hadoop installation is /usr/local/hadoop/hadoop-2.7.1


Setup Environment Variables


1) It is best that the Hadoop variables are exported to the environment when a user logs in. To do so, run this command on the NameNode

sudo vi /etc/profile.d/hadoop.sh

2) Add the following in /etc/profile.d/hadoop.sh

export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.1
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH

3) Source this file or re-login to set up the environment.

4) (OPTIONAL) Set up the above for all Nodes.


Setup NameNode & ResourceManager


1) Make a directory to hold NameNode data

mkdir /usr/local/hadoop/hdfs_namenode

2) Setup $HADOOP_HOME/etc/hadoop/hdfs-site.xml




Note: the dfs.namenode.name.dir value must be a URI
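The configuration snippet did not survive in this post. A minimal hdfs-site.xml for the NameNode, consistent with the directory created above, might look like this (the property names are standard Hadoop 2.x; treat the values as an illustrative sketch):

```xml
<configuration>
  <!-- Where the NameNode stores the filesystem image and edit logs; must be a URI -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/hdfs_namenode</value>
  </property>
  <!-- Two DataNodes in this cluster, so replicate each block twice -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```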

3) Setup $HADOOP_HOME/etc/hadoop/core-site.xml
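The core-site.xml snippet is likewise missing. A minimal version pointing HDFS at this NameNode could look like the following (port 9000 is a common choice, not something the post confirms):

```xml
<configuration>
  <!-- Default filesystem URI; all nodes must agree on this value -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://NameNode.net:9000</value>
  </property>
</configuration>
```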





4) (OPTIONAL) Setup $HADOOP_HOME/etc/hadoop/mapred-site.xml (We are using NameNode as ResourceManager)
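The mapred-site.xml content is also missing; the one property normally needed to run MapReduce on YARN is:

```xml
<configuration>
  <!-- Run MapReduce jobs on YARN instead of the legacy local runner -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```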



5) (OPTIONAL) Setup $HADOOP_HOME/etc/hadoop/yarn-site.xml (We are using NameNode as ResourceManager)
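The yarn-site.xml snippet did not survive either. A minimal sketch for this scenario, where the ResourceManager runs on the NameNode host:

```xml
<configuration>
  <!-- The ResourceManager runs on the NameNode host in this setup -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>NameNode.net</value>
  </property>
  <!-- Auxiliary service required for the MapReduce shuffle phase -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```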


6) Setup $HADOOP_HOME/etc/hadoop/slaves

First, remove localhost from the file, then add the following
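The list itself did not survive; based on the scenario above, the slaves file should contain one DataNode host name per line:

```
DataNode1.net
DataNode2.net
```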



Setup DataNodes


1) Make a directory to hold DataNode data

mkdir /usr/local/hadoop/hdfs_datanode

2) Setup $HADOOP_HOME/etc/hadoop/hdfs-site.xml



Note: dfs.datanode.data.dir value must be a URI
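The snippet is missing here as well. A minimal hdfs-site.xml for a DataNode, consistent with the directory created above (an illustrative sketch, not the post's original file):

```xml
<configuration>
  <!-- Where this DataNode stores HDFS blocks; must be a URI -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hdfs_datanode</value>
  </property>
</configuration>
```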

3) Setup $HADOOP_HOME/etc/hadoop/core-site.xml
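The DataNodes need the same fs.defaultFS as the NameNode so they can register with it. A minimal core-site.xml sketch (again assuming port 9000):

```xml
<configuration>
  <!-- Must match the value configured on the NameNode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://NameNode.net:9000</value>
  </property>
</configuration>
```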




Format NameNode


The above settings should be enough to set up the Hadoop cluster. Next, for the first time only, you will need to format the NameNode. Use the following command to format the NameNode

hdfs namenode -format

Example output is



Note: the same command can be used to reformat an existing NameNode, but remember to clean up the DataNodes' HDFS folders as well.


Start NameNode


You can start HDFS (the NameNode and all DataNodes) with the provided script

start-dfs.sh

Example output is










Stop NameNode


You can stop HDFS with the provided script

stop-dfs.sh

Example output is



Start ResourceManager


You can start the ResourceManager (YARN in this case) with the provided script

start-yarn.sh

Example output is




Stop ResourceManager


You can stop the ResourceManager (YARN in this case) with the provided script

stop-yarn.sh

Example output is



Show status of Hadoop


You can use the following command to show the status of the running Hadoop daemons

jps

Example output is











Complete Testing


You can also perform the following complete test to ensure Hadoop is running fine.
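The test steps did not survive in this post. A typical smoke test is to put a file into HDFS and run one of the bundled MapReduce examples (the jar path below matches the Hadoop 2.7.1 layout, but verify it against your installation):

```
# Create a home directory for the hadoop user and upload a file
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put /etc/hosts /user/hadoop/hosts
hdfs dfs -cat /user/hadoop/hosts

# Run the bundled pi estimator to exercise YARN and MapReduce
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 4
```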






















You could access the Hadoop Resource Manager information at http://NameNode_hostname:8088



You could also access the Hadoop cluster summary at http://NameNode_hostname:50070. You should be able to see the number of DataNodes set up for the cluster.


Reference


1. http://www.server-world.info/en/note?os=CentOS_7&p=hadoop
2. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html


Tuesday, July 11, 2017

Android - Peel Remote App "Good Night" Screen


Ever since "accidentally" upgrading to Android 7.0, I have seen the following "Good Night" screen at night.



This is irritating, and apparently this "Good Night" screen comes from the Peel Remote app. If you disable the Peel Remote app, the "Good Night" screen will be gone. I don't use this app anyway...



Friday, May 26, 2017

Chrome - Chrome appear out of screen / offscreen

I hate it when a Chrome window moves off screen and you cannot drag it back. To fix that,

1. Focus the Chrome window (it will be off screen somewhere)
2. Press ALT + Spacebar, then X

The above will maximize Chrome back onto your primary monitor.

Monday, April 17, 2017

Hive - URISyntaxException: Relative path in absolute URI

Problem


I encountered the following exception when I tried to start up Hive

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D


Solution


This is caused by configuration in hive-site.xml. Open hive-site.xml and look for ${system:java.io.tmpdir}/${system:user.name}. If found, replace it with a concrete path, e.g. /tmp/somedir. After that, run Hive again.

Thursday, March 23, 2017

Hive - Hive metastore database is not initialized

Problem


I encountered the following error when I tried to start up Hive

Exception in thread "main" java.lang.RuntimeException: Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql)

Solution


a. Delete the existing $DERBY_HOME/data/metastore_db with rm command

b. Stop the Derby server

c. Run schematool -initSchema -dbType derby

Tuesday, February 7, 2017

Hive - Permission denied when trying to start Hive with other users

Problem


I had installed Hive, and the Hive CLI works for the user who installed it. However, when I tried to start the Hive CLI as another user, it failed with

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:515)
    ... 8 more
Caused by: java.io.IOException: Permission denied
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createTempFile(File.java:2001)
    at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)

Solution


Usually, this is due to a permission issue with the Hive scratch directory (hive.exec.local.scratchdir) defined in hive-site.xml. You could use chmod to change the directory permission directly, or configure hive.scratch.dir.permission in hive-site.xml to another value (the default is 700)

Sunday, January 15, 2017

Hadoop - Installing Hive with Derby

Summary


This is my personal experience and a summary of the steps to install Hive with Derby

Pre-requisite


  • Download HIVE from Apache (I am using Hive 2.0.0)
  • Download Derby from Apache (I am using Derby 10.12.1.1)
  • Make sure Java 1.7 is installed
  • Hadoop is configured and working (I am using Hadoop 2.7.1)

Installing Hive


1. Move the Hive tarball to a directory. For this example, I created the folder /usr/local/hive for Hive

cp apache-hive-2.0.0-bin.tar.gz /usr/local/hive

2. Unpack Hive

cd /usr/local/hive
tar -xzvf apache-hive-2.0.0-bin.tar.gz

3. Set Hive environment variable

You will need to set the following in the environment

export HIVE_HOME=/usr/local/hive/apache-hive-2.0.0-bin
export PATH=$HIVE_HOME/bin:$PATH

So, you can do the following to export the Hive variables to the environment when a user logs in

a. Create a  /etc/profile.d/hive.sh

sudo vi /etc/profile.d/hive.sh

b. Add the following in /etc/profile.d/hive.sh

export HIVE_HOME=/usr/local/hive/apache-hive-2.0.0-bin
export PATH=$HIVE_HOME/bin:$PATH

c. Source this file or re-login to setup the environment.

4. Next, we will install Apache Derby

Install Hive Metastore - Apache Derby


In this example, I will use Apache Derby as Hive metastore

1. Move the Derby tarball to a directory. For this example, I created the folder /usr/local/derby for Derby

cp db-derby-10.12.1.1-bin.tar.gz /usr/local/derby

2. Unpack Derby

cd /usr/local/derby
tar -zxvf db-derby-10.12.1.1-bin.tar.gz

3. Set Derby environment variable

You will need to set the following

export DERBY_HOME=/usr/local/derby/db-derby-10.12.1.1-bin
export PATH=$DERBY_HOME/bin:$PATH

So, you can do the following to export the Derby variables to the environment when a user logs in

a. Create a  /etc/profile.d/derby.sh

sudo vi /etc/profile.d/derby.sh

b. Add the following in /etc/profile.d/derby.sh

export DERBY_HOME=/usr/local/derby/db-derby-10.12.1.1-bin
export PATH=$DERBY_HOME/bin:$PATH
export DERBY_OPTS="-Dderby.system.home=$DERBY_HOME/data"


c. Source this file or re-login to setup the environment.

4. Create a Metastore directory

Create a data directory to hold the Metastore

mkdir $DERBY_HOME/data

5. Derby configuration is completed. Next section will tell you how to start and stop Derby


Start and Stop Derby


By default, Derby creates databases in the directory it was started from; that means if you start Derby in /tmp, it will use /tmp as the Derby system home and create the metastore there. For this example, we have already set DERBY_OPTS with -Dderby.system.home=$DERBY_HOME/data. This means we can start the Derby server from any directory and it will still use $DERBY_HOME/data as the system home.

Now you can start up Derby with

nohup startNetworkServer -h 0.0.0.0 &

To stop Derby, do

stopNetworkServer

Once you are able to startup Derby, we need to configure Hive to talk to Derby.

Configure Hive with Derby

1. Go to Hive configuration folder and create a hive-site.xml

$ cd $HIVE_HOME/conf
$ cp hive-default.xml.template hive-site.xml

2. Add the following in hive-site.xml. During my installation, these variables already existed in hive-default.xml.template, so search for them and update their values.
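The property values were lost from this post. Based on the jpox.properties settings below, the hive-site.xml entries would look roughly like this (hostname is a placeholder for your Derby server):

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://hostname:1527/metastore_db;create=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>APP</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mine</value>
</property>
```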










3. Create $HIVE_HOME/conf/jpox.properties

vi  $HIVE_HOME/conf/jpox.properties

4. Add the following to jpox.properties

javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
org.jpox.storeManagerType=rdbms
org.jpox.autoCreateSchema=true
org.jpox.autoStartMechanismMode=checked
org.jpox.transactionIsolation=read_committed
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://hostname:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=APP
javax.jdo.option.ConnectionPassword=mine

5. Copy the following files to the Hive library folder

cp $DERBY_HOME/lib/derbyclient.jar $HIVE_HOME/lib
cp $DERBY_HOME/lib/derbytools.jar $HIVE_HOME/lib

6. Hive configuration is complete. Now, we need to set up HDFS for Hive to use

Configure Hadoop HDFS for HIVE


Hive needs the following HDFS folders in order to run. To create them, do the following

$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse


Test and Run Hive

All configuration should now be complete. You can test Hive with the following
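The test commands did not survive in this post. A simple session that exercises the metastore might look like this (the table name is arbitrary):

```
$ hive
hive> CREATE TABLE test_table (id INT, name STRING);
hive> SHOW TABLES;
hive> DROP TABLE test_table;
hive> quit;
```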




Reference

1. https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
2. https://cwiki.apache.org/confluence/display/Hive/HiveDerbyServerMode
3. https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive

