


Lab 1: MapReduce

For this lab:
1. You will investigate how a MapReduce program works, with a real example, on both our HU cloud platform and the MapR Sandbox.
2. You will compare the execution time of the MapReduce program between a single node (MapR Sandbox) and multiple nodes (HU cloud platform).

* Most content comes from https://learn.mapr.com/ by permission of MapR Technologies.

Prerequisites

For our HU cloud platform:

For the Hadoop cluster overview:
http://hdfs-namenode-hadoop.apps.myhu.cloud/dfshealth.html#tab-overview

For accessing Hadoop cluster nodes:
https://master1.myhu.cloud:8443/console/project/hadoop/browse/pods
User name: hadoop, password: hadoop

To access the terminal of the name node, click "hdfs-namenode-0" and then click "Terminal".

1. Create a folder named with your student ID and work under that folder.
2. To upload a file, use "wget" or other commands you like.
3. If you have any problem or issue with the HU cloud, report it in your submission and work only with the MapR Sandbox.

For using the MapR Sandbox:

Download one of the MapR Sandboxes listed below.
- VMware Course Sandbox: http://package.mapr.com/releases/v5.1.0/sandbox/MapR-Sandbox-For-Hadoop-5.1.0-vmware.ova
- VirtualBox Course Sandbox: http://package.mapr.com/releases/v5.1.0/sandbox/MapR-Sandbox-For-Hadoop-5.1.0.ova

For the installation, refer to https://mapr.com/docs/52/SandboxHadoop/c_sandbox_overview.html

Logging in to the command line
- Before you get started, have the IP address handy for your Sandbox VM.
- Next, use an SSH client such as PuTTY (Windows) or Terminal (Mac) to log in.
- Use user ID user01 and password mapr. If you have a permission issue, use mapr/mapr or root/mapr.
- For VMware use: $ ssh user01@ipaddress
- For VirtualBox use: $ ssh user01@127.0.0.1 -p 2222

Task 1: Introduction to MapReduce

Lab Overview

In this lesson's lab exercises, you will run a few MapReduce jobs from the command line and examine job information in the MapR Control System (MCS).

Note: Some commands shown throughout this lab guide are too long to fit on a single line. The backslash character (\) indicates that the command continues on the next line. Do not include the backslash character, or a carriage return, when typing the commands.

Run wordcount

Estimated time to complete: 20 minutes

Run wordcount against a text file

1. Log into the cluster as the user user01.
2. Create a directory in your home directory as follows:
$ mkdir /mapr/<cluster name>/user/user01/Lab1.3
Note: In this and subsequent commands that include the <cluster name> designator, replace <cluster name> with the actual name of your cluster, for example /mapr/maprdemo. The /mapr/<cluster name> prefix indicates where the cluster file system is mounted with Direct Access NFS, which makes it possible to use standard Linux commands to access the cluster file system.
3. Create a file in that directory as follows:
$ echo "Hello world! Hello" > /mapr/<cluster name>/user/user01/Lab1.3/in.txt
4. Run the MRv1 version of the wordcount application against the input file:
$ hadoop2 jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar wordcount \
  /user/user01/Lab1.3/in.txt /user/user01/Lab1.3/OUT
Note: With hadoop commands such as this, you do not need to include the /mapr/<cluster name> prefix, since you are dealing directly with the cluster file system and not using Direct Access NFS.
5. Check the output of the wordcount application:
$ cat /mapr/<cluster name>/user/user01/Lab1.3/OUT/part-r-00000
Submit a screen capture of this output.

Run wordcount against a set of text files

1. Create a directory in your home directory, and also create a set of text files, as follows:
$ mkdir -p /mapr/<cluster name>/user/user01/Lab1.3/IN2
$ cp /etc/*.conf /mapr/<cluster name>/user/user01/Lab1.3/IN2 2>/dev/null
2. Determine how many files are in that directory:
$ ls /mapr/<cluster name>/user/user01/Lab1.3/IN2 | wc -l
3. Run the MRv2 version of the wordcount application against the directory:
$ hadoop2 jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0-mapr-1602.jar wordcount \
  /user/user01/Lab1.3/IN2 /user/user01/Lab1.3/OUT2
4. Check the output of the wordcount application:
$ wc -l /mapr/<cluster name>/user/user01/Lab1.3/OUT2/part-r-00000
$ more /mapr/<cluster name>/user/user01/Lab1.3/OUT2/part-r-00000
Submit a screen capture of this output.

Run wordcount Against a Binary File

1. Create a directory in your home directory as follows:
$ mkdir -p /mapr/<cluster name>/user/user01/Lab1.3/IN3
2. Create a binary file in that directory as follows:
$ cp /bin/cp /mapr/<cluster name>/user/user01/Lab1.3/IN3/mybinary
3. Verify the file is a binary:
$ file /mapr/<cluster name>/user/user01/Lab1.3/IN3/mybinary
4. See if there is any readable text in the binary:
$ strings /mapr/<cluster name>/user/user01/Lab1.3/IN3/mybinary | more
5. Run the MRv1 version of the wordcount application (using the MRv2 client) against the input file. This will show that binaries compiled for MRv1 will run in an MRv2 framework.
$ hadoop2 jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar wordcount \
  /user/user01/Lab1.3/IN3/mybinary /user/user01/Lab1.3/OUT3
6. Check the output of the wordcount application:
$ more /mapr/<cluster name>/user/user01/Lab1.3/OUT3/part-r-00000
7. Cross-reference the frequency of the "word" ATUH in the binary and in the wordcount output:
$ strings /mapr/<cluster name>/user/user01/Lab1.3/IN3/mybinary | grep -c ATUH
$ egrep -ac ATUH /mapr/<cluster name>/user/user01/Lab1.3/OUT3/part-r-00000
Submit a screen capture of this output.

Run wordcount against a set of text files via the HU cloud environment.
- Use the same set of text files.
- Compare the execution time on the MapR Sandbox and on the HU cloud environment by providing screen captures.
- If you have any issue or problem with the HU cloud environment, please report it.

Write MapReduce Programs

Lab Overview

The lab for this lesson covers how to make some modifications to an existing MapReduce program, compile it, run it, and examine the output.
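Before modifying an existing MapReduce program, it may help to recall the shape of the computation you ran in Task 1. The following is a minimal, Hadoop-free sketch of the wordcount logic in plain Java; it is an illustration only (the class and method names are ours, not part of the lab files), and the lab's actual jobs run the packaged examples jar on the cluster.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the wordcount computation: the "map" phase emits a
// (word, 1) pair for each word, and the "reduce" phase sums the
// counts for each distinct word. Here both phases are merged into
// a single loop for brevity.
public class WordCountSketch {
    public static Map<String, Integer> wordCount(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.split("\\s+")) {   // map: split into words
            if (word.isEmpty()) continue;
            counts.merge(word, 1, Integer::sum);   // reduce: sum per key
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same input as the lab's in.txt: echo "Hello world! Hello"
        System.out.println(wordCount("Hello world! Hello"));
        // prints {Hello=2, world!=1}
    }
}
```

The part-r-00000 file you inspected in Task 1 holds exactly this kind of per-word tally, one word and count per line.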
The existing code calculates minimum and maximum values in the data set. You will modify the code to calculate the mean surplus or deficit. The data set we're using is the history of the United States federal budget from the year 1901 to 2012. The data was downloaded from the White House website and has been massaged for this exercise.

Here is a sample record from the data set:

1968 152973 178134 -25161 128056 155798 -27742 24917 22336 2581

The fields of interest in this exercise are the first and fourth fields (year and surplus or deficit). The second field is the total income derived from federal income taxes, and the third field is the expenditures for that year. The fourth field is the difference between the second and third fields. A negative value in the fourth field indicates a budget deficit and a positive value indicates a budget surplus.

Modify a MapReduce Program

Copy the Lab Files

1. Log into the cluster as user01.
2. Create a directory for the lab work, and position yourself in that directory:
$ mkdir /mapr/<cluster name>/user/user01/Lab3
$ cd /mapr/<cluster name>/user/user01/Lab3
3. Download and unzip the source code for the lab:
$ wget http://course-files.mapr.com/DEV3000/DEV300-v5.1-Lab3.zip
$ unzip DEV300-v5.1-Lab3.zip
This will create two directories: RECEIPTS_LAB, which contains the source files for the lab, and RECEIPTS_SOLUTION, which contains files with the solution correctly implemented. You can review the solution files as needed for help completing the lab.

Modify Code in the Driver

1. Change directory into the RECEIPTS_LAB directory:
$ cd RECEIPTS_LAB
2. Open the ReceiptsDriver.java source file with your favorite text editor:
$ vi ReceiptsDriver.java
3. Look for the string // TODO in the file, and follow the instructions to make the necessary changes.
4. Save the ReceiptsDriver.java file.

Compile and Run the Map-only MapReduce Program

1. Execute the rebuild.sh script to compile your code:
$ ./rebuild.sh
2. Execute the rerun.sh script to run your code:
$ ./rerun.sh
3. Examine the output from your MapReduce job. Note that you may need to wait a minute before the job output is completely written to the output files.
$ cat /mapr/<cluster name>/user/user01/Lab3/RECEIPTS_LAB/OUT/part*
Submit a screen capture of this output.

Here is partial output expected for this exercise:

summary 1901_63
summary 1902_77
summary 1903_45
summary 1904_-43
summary 1905_-23
summary 1906_25
summary 1907_87
summary 1908_-57
summary 1909_-89
summary 1910_-18
summary 1911_11

Once you obtain the correct intermediate results from the map-only code, proceed to the next section.

Implement Code in the Reducer

In this exercise, you will implement code in the reducer to calculate the mean value. The code has already been provided to calculate minimum and maximum values. Recall that the mapper code you ran above will produce intermediate results. One such record looks like this:

summary 1968_-25161

When you execute the code for this lab, there will be only one reducer (since there is only one key, "summary"). That reducer will iterate over all the intermediate results and pull out the year and surplus or deficit. Your reducer will keep track of the minimum and maximum values (as temp variables) as well as the years those values occurred. You will also need to keep track of the sum of the surplus or deficit and the count of the records in order to calculate the mean value.

1. Open the ReceiptsReducer.java source file with your favorite text editor:
$ vi ReceiptsReducer.java
2. Find the // TODO statements in the file, and make the changes indicated. Refer to the solution files as needed for help.
3. Save the ReceiptsReducer.java file.
4. Open the ReceiptsDriver.java source file with your favorite text editor. Find the line // TODO comment out the Reducer class definition. Recall that in the previous section you commented out the Reducer definition; in this section, you will need to uncomment it so it will be included again.
5. Save the ReceiptsDriver.java file.

Compile and Run Your Code

1. Execute the rebuild.sh script to compile your code:
$ ./rebuild.sh
2. Execute the rerun.sh script to run your code:
$ ./rerun.sh
3. Examine the output from your MapReduce job:
$ cat /mapr/<cluster name>/user/user01/Lab3/RECEIPTS_LAB/OUT/part*
Submit a screen capture of this output.

Here is the output expected for this exercise:

min(2009): -1412688.0
max(2000): 236241.0
mean: -93862.0

Compile and run your code via the HU cloud environment.
- Compare the execution time on the MapR Sandbox and on the HU cloud environment by providing screen captures.
- If you have any issue or problem with the HU cloud environment, report it in your submission.
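The reducer logic described above (tracking the minimum and maximum with their years, plus a running sum and count for the mean) can be sketched in plain Java, without the Hadoop Reducer API. The class and method names below are illustrative only; for the lab's actual implementation, refer to the RECEIPTS_SOLUTION files.

```java
import java.util.Arrays;
import java.util.List;

// Hadoop-free sketch of the reducer's per-record bookkeeping. Each
// intermediate value has the form "year_amount" (e.g. "1968_-25161");
// the single reducer iterates over all of them for the one key "summary".
public class ReceiptsSketch {
    public static String summarize(List<String> values) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE, sum = 0;
        String minYear = "", maxYear = "";
        int count = 0;
        for (String v : values) {
            String[] parts = v.split("_");          // "1968_-25161" -> year, amount
            String year = parts[0];
            long amount = Long.parseLong(parts[1]);
            if (amount < min) { min = amount; minYear = year; }
            if (amount > max) { max = amount; maxYear = year; }
            sum += amount;                           // accumulate for the mean
            count++;
        }
        double mean = (double) sum / count;
        return "min(" + minYear + "): " + min
             + " max(" + maxYear + "): " + max
             + " mean: " + mean;
    }

    public static void main(String[] args) {
        // A few records taken from the expected map-only output above.
        System.out.println(summarize(Arrays.asList("1901_63", "1904_-43", "1909_-89")));
        // prints min(1909): -89 max(1901): 63 mean: -23.0
    }
}
```

Run over the full 1901-2012 data set, this bookkeeping yields the min/max/mean lines shown in the expected output.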



