
[Hadoop] Running in Pseudo-Distributed Mode

Revised on 2013.1.16, based on version 1.0.4

Running in pseudo-distributed mode involves the following files:

hadoop-env.sh
hdfs-site.xml
core-site.xml
mapred-site.xml
masters
slaves

1. Edit each configuration file
hadoop-env.sh
Set JAVA_HOME:

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home
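The path above matches the old Apple JDK layout. On newer Mac OS X installs, the standard /usr/libexec/java_home utility prints the active JDK's home, so a sketch like this works as well:

export JAVA_HOME=$(/usr/libexec/java_home)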

masters
localhost

slaves
localhost

core-site.xml
Reference: http://hadoop.apache.org/common/docs/current/core-default.html

<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>
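fs.default.name sets the default filesystem URI. This is why the bare relative paths used later (input.txt, output, ...) resolve against hdfs://localhost:9000/user/<username>/ rather than the local disk. For example:

./bin/hadoop fs -ls .        # lists hdfs://localhost:9000/user/<username>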




hdfs-site.xml
Reference: http://hadoop.apache.org/common/docs/current/hdfs-default.html

<configuration>
        <property>
                <name>dfs.name.dir</name>
                <value>/home/need4spd/hadoop/dfs/name</value>
        </property>
        <property>
                <name>dfs.data.dir</name>
                <value>/home/need4spd/hadoop/dfs/data</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
</configuration>


dfs.data.dir and dfs.name.dir are local filesystem paths: the data directory the HDFS daemon uses when it stores file blocks, and the directory used by the namenode for its metadata. If you don't set them, they get created under /tmp, which is typically wiped on reboot.
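A minimal sketch for pre-creating the directories (the format step and the daemons will normally create them themselves as long as permissions allow, so this is optional):

mkdir -p /home/need4spd/hadoop/dfs/name /home/need4spd/hadoop/dfs/data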

mapred-site.xml
Reference: http://hadoop.apache.org/common/docs/current/mapred-default.html

<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>localhost:9001</value>
        </property>
        <property>
                <name>mapred.system.dir</name>
                <value>/home/need4spd/hadoop/dfs/mapreduce/system</value>
        </property>
        <property>
                <name>mapred.local.dir</name>
                <value>/home/need4spd/hadoop/dfs/mapreduce/local</value>
        </property>
</configuration>
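Note that mapred.system.dir is a path in the default filesystem (HDFS here), where the JobTracker keeps shared job files, while mapred.local.dir is scratch space on local disk for intermediate map output.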



2. Format HDFS (format the namenode)
 
In HADOOP_HOME/bin:

hadoop namenode -format


11/07/15 16:28:46 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = 
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.203.0
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May  4 07:57:50 PDT 2011
************************************************************/
11/07/15 16:28:46 INFO util.GSet: VM type       = 64-bit
11/07/15 16:28:46 INFO util.GSet: 2% max memory = 17.77875 MB
11/07/15 16:28:46 INFO util.GSet: capacity      = 2^21 = 2097152 entries
11/07/15 16:28:46 INFO util.GSet: recommended=2097152, actual=2097152
11/07/15 16:28:46 INFO namenode.FSNamesystem: fsOwner=need4spd
11/07/15 16:28:46 INFO namenode.FSNamesystem: supergroup=supergroup
11/07/15 16:28:46 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/07/15 16:28:46 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
11/07/15 16:28:46 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
11/07/15 16:28:46 INFO namenode.NameNode: Caching file names occuring more than 10 times
11/07/15 16:28:46 INFO common.Storage: Image file of size 111 saved in 0 seconds.
11/07/15 16:28:47 INFO common.Storage: Storage directory /home/need4spd/hadoop/dfs/name has been successfully formatted.
11/07/15 16:28:47 INFO namenode.NameNode: SHUTDOWN_MSG:


This formats the namenode directory configured in hdfs-site.xml (i.e., it gets initialized in the HDFS on-disk format).
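You can sanity-check the result by listing the name directory from hdfs-site.xml; after a successful format it should contain a current/ subdirectory (in Hadoop 1.x, typically VERSION, fsimage, edits and fstime):

ls /home/need4spd/hadoop/dfs/name/current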

3. Start the daemons
In HADOOP_HOME:
./bin/start-all.sh

[need4spd@need4spdui-MacBook-Pro hadoop ]$ ./bin/start-all.sh 
starting namenode, logging to /Users/need4spd/Programming/Java/hadoop-1.0.4/libexec/../logs/hadoop-need4spd-namenode-need4spdui-MacBook-Pro.local.out
Unable to find a $JAVA_HOME at "/usr", continuing with system-provided Java...
Unable to find a $JAVA_HOME at "/usr", continuing with system-provided Java...
2013-01-16 23:17:25.282 java[2059:1607] Unable to load realm info from SCDynamicStore
Password:
localhost: starting datanode, logging to /Users/need4spd/Programming/Java/hadoop-1.0.4/libexec/../logs/hadoop-need4spd-datanode-need4spdui-MacBook-Pro.local.out
localhost: Unable to find a $JAVA_HOME at "/usr", continuing with system-provided Java...
localhost: Unable to find a $JAVA_HOME at "/usr", continuing with system-provided Java...
localhost: 2013-01-16 23:17:28.806 java[2145:1507] Unable to load realm info from SCDynamicStore
Password:
localhost: starting secondarynamenode, logging to /Users/need4spd/Programming/Java/hadoop-1.0.4/libexec/../logs/hadoop-need4spd-secondarynamenode-need4spdui-MacBook-Pro.local.out
localhost: Unable to find a $JAVA_HOME at "/usr", continuing with system-provided Java...
localhost: Unable to find a $JAVA_HOME at "/usr", continuing with system-provided Java...
localhost: 2013-01-16 23:17:31.924 java[2233:1507] Unable to load realm info from SCDynamicStore
starting jobtracker, logging to /Users/need4spd/Programming/Java/hadoop-1.0.4/libexec/../logs/hadoop-need4spd-jobtracker-need4spdui-MacBook-Pro.local.out
Unable to find a $JAVA_HOME at "/usr", continuing with system-provided Java...
Unable to find a $JAVA_HOME at "/usr", continuing with system-provided Java...
Password:
localhost: starting tasktracker, logging to /Users/need4spd/Programming/Java/hadoop-1.0.4/libexec/../logs/hadoop-need4spd-tasktracker-need4spdui-MacBook-Pro.local.out
localhost: Unable to find a $JAVA_HOME at "/usr", continuing with system-provided Java...
localhost: Unable to find a $JAVA_HOME at "/usr", continuing with system-provided Java...
localhost: 2013-01-16 23:17:36.699 java[2377:1507] Unable to load realm info from SCDynamicStore
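Two notes on the output above. The Password: prompts mean passwordless SSH to localhost is not set up; the start scripts connect over ssh even in pseudo-distributed mode. The standard key setup (a sketch, assuming you have no existing key to preserve):

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

The "Unable to load realm info from SCDynamicStore" lines are a known Mac OS X nuisance warning; a commonly cited workaround is adding this to hadoop-env.sh:

export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="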

Verify:
jps

[need4spd@need4spdui-MacBook-Pro hadoop ]$ jps
153 
2233 SecondaryNameNode
2145 DataNode
2294 JobTracker
2949 Jps
2059 NameNode
2377 TaskTracker

Web UI admin page (use the namenode's IP):
http://localhost:50070/dfshealth.jsp
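The JobTracker has its own web UI as well, on port 50030 in Hadoop 1.x:

http://localhost:50030/jobtracker.jsp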

4. Run an example

Upload hadoop-env.sh to HDFS:
./bin/hadoop fs -put conf/hadoop-env.sh conf/hadoop-env.sh

Run the wordcount example:
./bin/hadoop jar hadoop-examples-1.0.4.jar wordcount conf/hadoop-env.sh output

View the result:

./bin/hadoop fs -cat output/part-r-00000
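You can also list the output directory; along with part-r-00000, Hadoop 1.x normally writes an empty _SUCCESS marker and a _logs directory:

./bin/hadoop fs -ls output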


5. Starting the daemons individually (skip if you already started everything with start-all.sh in step 3)

In HADOOP_HOME/bin:

start-dfs.sh


starting namenode, logging to /home/need4spd/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-need4spd-namenode-hostname.out
localhost: starting datanode, logging to /home/need4spd/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-need4spd-datanode-hostname.out
localhost: starting secondarynamenode, logging to /home/need4spd/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-need4spd-secondarynamenode-hostname.out



start-mapred.sh


starting jobtracker, logging to /home/need4spd/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-need4spd-jobtracker-hostname.out
localhost: starting tasktracker, logging to /home/need4spd/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-need4spd-tasktracker-hostname.out
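The matching stop scripts sit in the same bin directory; to shut everything down:

./bin/stop-all.sh        # or stop-mapred.sh followed by stop-dfs.sh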


6. Copy a file from local to HDFS
(using a relative path - copies into the HDFS home directory)
hadoop fs -copyFromLocal /home/need4spd/hadoop/input.txt input.txt 
hadoop fs -ls .

Found 1 items
-rw-r--r--   1 need4spd supergroup          2 2011-07-15 16:32 /user/need4spd/input.txt


7. Delete
hadoop fs -rm input.txt
Deleted hdfs://localhost/user/need4spd/input.txt
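For directories, use -rmr, which deletes recursively. This matters when re-running the wordcount example above: Hadoop refuses to start a job whose output directory already exists, so clear it first:

hadoop fs -rmr output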

8. Copy again (with an absolute path)
hadoop fs -copyFromLocal /home/need4spd/hadoop/input.txt /user/need4spd/input.txt
hadoop fs -ls .

Found 1 items
-rw-r--r--   1 need4spd supergroup          2 2011-07-15 16:32 /user/need4spd/input.txt


Using an hdfs:// URI works the same way:
hadoop fs -copyFromLocal /home/need4spd/hadoop/input.txt hdfs://localhost/user/need4spd/input.txt

9. Pull a file back out of HDFS
hadoop fs -copyToLocal input.txt /home/need4spd/out/input.txt

The first path is the HDFS path; the second is the local path.
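hadoop fs -get is an equivalent shorthand for -copyToLocal (as -put is for -copyFromLocal):

hadoop fs -get input.txt /home/need4spd/out/input.txt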

You can run the MapReduce program built earlier the same way: the input file must first be copied into HDFS, so the input path you pass to the job has to be specified relative to HDFS, not the local filesystem (see the sketch below).
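A sketch with placeholder names (my-job.jar, MyJob and log.txt are hypothetical, not from this post):

hadoop fs -put /home/need4spd/data/log.txt log.txt
hadoop jar my-job.jar MyJob log.txt job-output        # both paths are HDFS paths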

One more note: among the configuration files above, masters and slaves just need to contain localhost (the default).