Setting Up a Hadoop Multinode Cluster Using CDH4 in RHEL/CentOS 6.5
Hadoop is an open-source software framework developed by Apache for processing big data. It uses HDFS (Hadoop Distributed File System) to store data across all the datanodes in the cluster in a distributed manner, and the MapReduce model to process that data.
Namenode (NN) is the master daemon that controls HDFS, and Jobtracker (JT) is the master daemon for the MapReduce engine.
In this tutorial I am using two CentOS 6.3 VMs, 'master' and 'node' (master and node are my hostnames). The 'master' IP is 172.21.17.175 and the 'node' IP is 172.21.17.188. The following instructions also work on other RHEL/CentOS 6.x releases.
# hostname
master

# ifconfig | grep 'inet addr' | head -1
          inet addr:172.21.17.175  Bcast:172.21.19.255  Mask:255.255.252.0

# hostname
node

# ifconfig | grep 'inet addr' | head -1
          inet addr:172.21.17.188  Bcast:172.21.19.255  Mask:255.255.252.0
First, make sure that all the cluster hosts are listed in the '/etc/hosts' file (on each node), unless you have DNS configured.
# cat /etc/hosts
172.21.17.175 master
172.21.17.188 node

# cat /etc/hosts
172.21.17.197 qabox
172.21.17.176 ansible-ground
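Since the cluster relies on '/etc/hosts' for name resolution, it is worth sanity-checking every node's hosts file before going any further. Below is a minimal sketch; the helper function and its name are my own, not part of CDH, and the hostnames are the ones used in this tutorial.

```shell
# check_hosts FILE HOST...  -- verify that each HOST appears as a whole word
# on a non-comment line of FILE; prints what is missing, returns non-zero
# if anything is absent.
check_hosts() {
    file="$1"; shift
    missing=0
    for h in "$@"; do
        if ! grep -v '^#' "$file" | grep -qw "$h"; then
            echo "MISSING: $h"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "OK: all hosts present"
    fi
    return "$missing"
}

# On each node of this tutorial's cluster you would run:
#   check_hosts /etc/hosts master node
```

Running it on every node before starting the installation catches typos in '/etc/hosts' early, which otherwise surface later as daemons failing to contact each other.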
Installing the Hadoop Multinode Cluster in CentOS
We use the official CDH repository to install CDH4 on all the hosts (Master and Node) in the cluster.
Go to the official CDH download page and grab the CDH4 (i.e. 4.6) version, or use the following wget command to download the repository and install it.
## on 32-bit System ##
# wget http://archive.cloudera.com/cdh4/one-click-install/redhat/6/i386/cloudera-cdh-4-0.i386.rpm
# yum --nogpgcheck localinstall cloudera-cdh-4-0.i386.rpm

## on 64-bit System ##
# wget http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
# yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
Before installing the Hadoop Multinode Cluster, add the Cloudera Public GPG Key to your repository by running one of the following commands, according to your system architecture.
## on 32-bit System ##
# rpm --import http://archive.cloudera.com/cdh4/redhat/6/i386/cdh/RPM-GPG-KEY-cloudera

## on 64-bit System ##
# rpm --import http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
Next, run the following commands to install and set up the JobTracker and the NameNode on the Master server.
# yum clean all
# yum install hadoop-0.20-mapreduce-jobtracker

# yum clean all
# yum install hadoop-hdfs-namenode
Again, run the following commands on the Master server to set up the secondary name node.

# yum clean all
# yum install hadoop-hdfs-secondarynamenode
Next, set up the tasktracker and datanode on all the cluster hosts (the Node) except the JobTracker, NameNode, and Secondary (or Standby) NameNode hosts (on node in this case).
# yum clean all
# yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
You can install the Hadoop client on a separate machine (in this case I have installed it on the datanode; you can install it on any machine).
# yum install hadoop-client
Now that we are done with the steps above, let us move on to deploying HDFS (to be done on all nodes).
Copy the default configuration to the /etc/hadoop directory (on each node in the cluster).
# cp -r /etc/hadoop/conf.dist /etc/hadoop/conf.my_cluster
Use the alternatives command to set your custom directory, as follows (on each node in the cluster).
# alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
reading /var/lib/alternatives/hadoop-conf

# alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
Now open the 'core-site.xml' file and update "fs.defaultFS" on each node in the cluster.
# cat /etc/hadoop/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master/</value>
  </property>
</configuration>
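Rather than editing the file by hand on every node, the same fs.defaultFS setting can be written out with a heredoc and then pushed to the other nodes with scp. Below is a sketch; the script is my own illustration, writing to a temporary directory so it can be tried safely (on a real node you would set CONF_DIR to /etc/hadoop/conf.my_cluster).

```shell
# Write core-site.xml pointing fs.defaultFS at the master.
# Illustration only: real path on a node is /etc/hadoop/conf.my_cluster
CONF_DIR=$(mktemp -d)

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master/</value>
  </property>
</configuration>
EOF

# Quick check that the value landed as expected
if grep -q '<value>hdfs://master/</value>' "$CONF_DIR/core-site.xml"; then
    echo "core-site.xml OK"
fi
```

Generating the file once and copying it out keeps the configuration identical on every node, which matters because a mismatched fs.defaultFS silently splits the cluster.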
Next, update dfs.permissions.superusergroup in hdfs-site.xml on each node in the cluster.
# cat /etc/hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
  </property>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
  </property>
</configuration>
Note: Please make sure that the above configuration is present on all the nodes (do it on one node and run scp to copy it to the rest of the nodes).
Update dfs.name.dir or dfs.namenode.name.dir in 'hdfs-site.xml' on the NameNode (on Master and Node). Please change the value as highlighted.
# cat /etc/hadoop/conf/hdfs-site.xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/1/dfs/nn,/nfsmount/dfs/nn</value>
</property>
# cat /etc/hadoop/conf/hdfs-site.xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>
Execute the commands below to create the directory structure and set user permissions on the Namenode (Master) and Datanode (Node) machines.
# mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
# chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn

# mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
# chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
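The same directory layout can be scripted and verified in one pass. The sketch below is my own illustration: it builds the tree under a temporary prefix so it can be tried without root (on a real node you would use the paths above directly, as root, and then run the chown to hdfs:hdfs, which is omitted here because it requires the hdfs user from the CDH packages).

```shell
# Create the NameNode and DataNode directory trees under a prefix.
PREFIX=$(mktemp -d)   # illustration only; on a real node the paths start at /

# NameNode metadata directories must be private to their owner
mkdir -p "$PREFIX/data/1/dfs/nn" "$PREFIX/nfsmount/dfs/nn"
chmod 700 "$PREFIX/data/1/dfs/nn" "$PREFIX/nfsmount/dfs/nn"

# DataNode block directories, one per disk
for i in 1 2 3 4; do
    mkdir -p "$PREFIX/data/$i/dfs/dn"
done

# Verify the NameNode dirs ended up with mode 700
stat -c '%a %n' "$PREFIX/data/1/dfs/nn" "$PREFIX/nfsmount/dfs/nn"
```

Checking the mode with stat after creation is worthwhile: the NameNode refuses to start if its name directories are readable by others.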
Format the Namenode (on the Master) by issuing the following command.
# sudo -u hdfs hdfs namenode -format
Add the following property to the hdfs-site.xml file, replacing the value as shown, on the Master.
<property>
  <name>dfs.namenode.http-address</name>
  <value>172.21.17.175:50070</value>
  <description>
    The address and port on which the NameNode UI will listen.
  </description>
</property>
Note: In our case the value should be the ip address of the master VM.
Now let us deploy MRv1 (Map-reduce version 1). Open the 'mapred-site.xml' file and set the values as shown below.
# cp hdfs-site.xml mapred-site.xml
# vi mapred-site.xml
# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:8021</value>
  </property>
</configuration>
Next, copy the 'mapred-site.xml' file to the node machine using the following scp command.
# scp /etc/hadoop/conf/mapred-site.xml node:/etc/hadoop/conf/
mapred-site.xml                          100%  200     0.2KB/s   00:00
Now configure local storage directories for use by the MRv1 daemons. Again open the 'mapred-site.xml' file and make the changes shown below for each TaskTracker.
<property>
  <name>mapred.local.dir</name>
  <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local</value>
</property>
After specifying these directories in the 'mapred-site.xml' file, you must create the directories and assign the correct file permissions to them on each node in your cluster.
# mkdir -p /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
# chown -R mapred:hadoop /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
Now run the following command to start HDFS on every node in the cluster.
# for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
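The loop above works by globbing the hadoop-hdfs-* init scripts, so each host automatically starts only the daemons installed on it. The same pattern can be wrapped in a small helper so you can see which services the glob will pick up before starting anything; this helper is my own sketch, with the init directory taken as a parameter so it can be exercised against any path.

```shell
# List the hadoop-hdfs-* services present in an init directory.
# On a real node you would then run:
#   for s in $(list_hdfs_services /etc/init.d); do service "$s" start; done
list_hdfs_services() {
    dir="$1"
    for f in "$dir"/hadoop-hdfs-*; do
        [ -e "$f" ] || continue   # glob matched nothing; skip literal pattern
        basename "$f"
    done
}
```

On the master this would typically list hadoop-hdfs-namenode and hadoop-hdfs-secondarynamenode, while on the node it would list hadoop-hdfs-datanode; the MapReduce daemons are deliberately excluded by the hadoop-hdfs-* pattern because they are started separately later.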
It is required to create /tmp with the proper permissions, as shown below.
# sudo -u hdfs hadoop fs -mkdir /tmp
# sudo -u hdfs hadoop fs -chmod -R 1777 /tmp

# sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
# sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
# sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
Now verify the HDFS file structure.
# sudo -u hdfs hadoop fs -ls -R /
drwxrwxrwt   - hdfs   hadoop          0 2014-05-29 09:58 /tmp
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var/lib
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache
drwxr-xr-x   - mapred hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred
drwxr-xr-x   - mapred hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwxrwxrwt   - mapred hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
After starting HDFS and creating '/tmp', but before starting the JobTracker, please create the HDFS directory specified by the 'mapred.system.dir' parameter (by default ${hadoop.tmp.dir}/mapred/system) and change its owner to mapred.
# sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system
# sudo -u hdfs hadoop fs -chown mapred:hadoop /tmp/mapred/system
To start MapReduce, start the TT and JT services.
# service hadoop-0.20-mapreduce-tasktracker start
Starting Tasktracker:                                      [  OK  ]
starting tasktracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-tasktracker-node.out

# service hadoop-0.20-mapreduce-jobtracker start
Starting Jobtracker:                                       [  OK  ]
starting jobtracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-master.out
Next, create a home directory for each Hadoop user. It is recommended to do this on the NameNode; for example:
# sudo -u hdfs hadoop fs -mkdir /user/<user>
# sudo -u hdfs hadoop fs -chown <user> /user/<user>
Note: where <user> is the Linux username of each user.
Alternatively, you can create your own home directory as follows.
# sudo -u hdfs hadoop fs -mkdir /user/$USER
# sudo -u hdfs hadoop fs -chown $USER /user/$USER
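With more than a couple of users, the two commands above are worth looping over a user list. The sketch below is my own illustration and only prints the commands it would run, so it can be reviewed (or piped to sh) before anything touches HDFS; the user names are made-up examples.

```shell
# Emit (but do not run) the hadoop fs commands that create one HDFS home
# directory per user. On the NameNode, pipe the output to sh to execute.
make_home_cmds() {
    for u in "$@"; do
        echo "sudo -u hdfs hadoop fs -mkdir /user/$u"
        echo "sudo -u hdfs hadoop fs -chown $u /user/$u"
    done
}

# Example with a hypothetical user list:
make_home_cmds alice bob
```

Emitting the commands first is a cheap dry run: you can confirm every path and owner is right before executing, which matters because hadoop fs -chown has no undo.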
Open your browser and type a url like http://ip_address_of_namenode:50070 to access the Namenode.
Open another tab in your browser and type a url like http://ip_address_of_jobtracker:50030 to access the JobTracker.
This procedure has been successfully tested on RHEL/CentOS 5.X/6.X. Please comment below if you face any issues with the installation; I will help you find a solution.