Description:
Let Hadoop For Dummies help harness the power of your data and rein in the information overload. Big data has become big business, and companies and organizations of all sizes are struggling to find ways to retrieve valuable information from their massive data sets without becoming overwhelmed. Enter Hadoop and this easy-to-understand For Dummies guide. Hadoop For Dummies helps readers understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.
- Explains the origins of Hadoop, its economic benefits, and its functionality and practical applications
- Helps you find your way around the Hadoop ecosystem, program MapReduce, utilize design patterns, and get your Hadoop cluster up and running quickly and easily
- Details how to use Hadoop applications for data mining, web analytics and personalization, large-scale text processing, data science, and problem-solving
- Shows you how to improve the value of your Hadoop cluster, maximize your investment in Hadoop, and avoid common pitfalls when building your Hadoop cluster

From programmers challenged with building and maintaining affordable, scalable data systems to administrators who must deal with huge volumes of information effectively and efficiently, this how-to has something to help you with Hadoop.
Other details
- Author: Dirk deRoos
- Edition: 1
- Publication date: 2014-03-21
- Pages: 408
- No printing restrictions
- No copying restrictions
- Format: ePub
- ISBN 13: 9781118652206
- Print ISBN: 9781118607558
- ISBN 10: 1118652207
Table of Contents
- Introduction
- About this Book
- Foolish Assumptions
- How This Book Is Organized
- Part I: Getting Started With Hadoop
- Part II: How Hadoop Works
- Part III: Hadoop and Structured Data
- Part IV: Administering and Configuring Hadoop
- Part V: The Part Of Tens: Getting More Out of Your Hadoop Cluster
- Icons Used in This Book
- Beyond the Book
- Where to Go from Here
- Part I: Getting Started with Hadoop
- Chapter 1: Introducing Hadoop and Seeing What It’s Good For
- Big Data and the Need for Hadoop
- Exploding data volumes
- Varying data structures
- A playground for data scientists
- The Origin and Design of Hadoop
- Distributed processing with MapReduce
- Apache Hadoop ecosystem
- Examining the Various Hadoop Offerings
- Comparing distributions
- Working with in-database MapReduce
- Looking at the Hadoop toolbox
- Chapter 2: Common Use Cases for Big Data in Hadoop
- The Keys to Successfully Adopting Hadoop (Or, “Please, Can We Keep Him?”)
- Log Data Analysis
- Data Warehouse Modernization
- Fraud Detection
- Risk Modeling
- Social Sentiment Analysis
- Image Classification
- Graph Analysis
- To Infinity and Beyond
- Chapter 3: Setting Up Your Hadoop Environment
- Choosing a Hadoop Distribution
- Choosing a Hadoop Cluster Architecture
- Pseudo-distributed mode (single node)
- Fully distributed mode (a cluster of nodes)
- The Hadoop For Dummies Environment
- The Hadoop For Dummies distribution: Apache Bigtop
- Setting up the Hadoop For Dummies environment
- The Hadoop For Dummies Sample Data Set: Airline on-time performance
- Your First Hadoop Program: Hello Hadoop!
- Chapter 4: Storing Data in Hadoop: The Hadoop Distributed File System
- Data Storage in HDFS
- Taking a closer look at data blocks
- Replicating data blocks
- Slave node and disk failures
- Sketching Out the HDFS Architecture
- Looking at slave nodes
- Keeping track of data blocks with NameNode
- Checkpointing updates
- HDFS Federation
- HDFS High Availability
- Chapter 5: Reading and Writing Data
- Compressing Data
- Managing Files with the Hadoop File System Commands
- Ingesting Log Data with Flume
- Chapter 6: MapReduce Programming
- Thinking in Parallel
- Seeing the Importance of MapReduce
- Doing Things in Parallel: Breaking Big Problems into Many Bite-Size Pieces
- Looking at MapReduce application flow
- Understanding input splits
- Seeing how key/value pairs fit into the MapReduce application flow
- Writing MapReduce Applications
- Getting Your Feet Wet: Writing a Simple MapReduce Application
- The FlightsByCarrier driver application
- The FlightsByCarrier mapper
- The FlightsByCarrier reducer
- Running the FlightsByCarrier application
- Running Applications Before Hadoop 2
- Tracking JobTracker
- Tracking TaskTracker
- Launching a MapReduce application
- Seeing a World beyond MapReduce
- Scouting out the YARN architecture
- Launching a YARN-based application
- Real-Time and Streaming Applications
- Admiring the Pig Architecture
- Going with the Pig Latin Application Flow
- Working through the ABCs of Pig Latin
- Uncovering Pig Latin structures
- Looking at Pig data types and syntax
- Evaluating Local and Distributed Modes of Running Pig scripts
- Checking Out the Pig Script Interfaces
- Scripting with Pig Latin
- Pumping Up Your Statistical Analysis
- The limitations of sampling
- Factors that increase the scale of statistical analysis
- Running statistical models in MapReduce
- Machine Learning with Mahout
- Collaborative filtering
- Clustering
- Classifications
- R on Hadoop
- The R language
- Hadoop Integration with R
- Getting Oozie in Place
- Developing and Running an Oozie Workflow
- Writing Oozie workflow definitions
- Configuring Oozie workflows
- Running Oozie workflows
- Scheduling and Coordinating Oozie Workflows
- Time-based scheduling for Oozie coordinator jobs
- Time and data availability-based scheduling for Oozie coordinator jobs
- Running Oozie coordinator jobs
- Chapter 11: Hadoop and the Data Warehouse: Friends or Foes?
- Comparing and Contrasting Hadoop with Relational Databases
- NoSQL data stores
- ACID versus BASE data stores
- Structured data storage and processing in Hadoop
- Modernizing the Warehouse with Hadoop
- The landing zone
- A queryable archive of cold warehouse data
- Hadoop as a data preprocessing engine
- Data discovery and sandboxes
- Say Hello to HBase
- Sparse
- It’s distributed and persistent
- It has a multidimensional sorted map
- Understanding the HBase Data Model
- Understanding the HBase Architecture
- RegionServers
- MasterServer
- Zookeeper and HBase reliability
- Taking HBase for a Test Run
- Creating a table
- Working with Zookeeper
- Getting Things Done with HBase
- Working with an HBase Java API client example
- HBase and the RDBMS world
- Knowing when HBase makes sense for you
- ACID Properties in HBase
- Transitioning from an RDBMS model to HBase
- Deploying and Tuning HBase
- Hardware requirements
- Deployment Considerations
- Tuning prerequisites
- Understanding your data access patterns
- Pre-Splitting your regions
- The importance of row key design
- Tuning major compactions
- Saying Hello to Hive
- Seeing How the Hive is Put Together
- Getting Started with Apache Hive
- Examining the Hive Clients
- The Hive CLI client
- The web browser as Hive client
- SQuirreL as Hive client with the JDBC Driver
- Working with Hive Data Types
- Creating and Managing Databases and Tables
- Managing Hive databases
- Creating and managing tables with Hive
- Seeing How the Hive Data Manipulation Language Works
- LOAD DATA examples
- INSERT examples
- Create Table As Select (CTAS) examples
- Querying and Analyzing Data
- Joining tables with Hive
- Improving your Hive queries with indexes
- Windowing in HiveQL
- Other key HiveQL features
- The Principles of Sqoop Design
- Scooping Up Data with Sqoop
- Connectors and Drivers
- Importing Data with Sqoop
- Importing data into HDFS
- Importing data into Hive
- Importing data into HBase
- Importing incrementally
- Benefiting from additional Sqoop import features
- Sending Data Elsewhere with Sqoop
- Exporting data from HDFS
- Sqoop exports using the Insert approach
- Sqoop exports using the Update and Update Insert approach
- Sqoop exports using call stored procedures
- Sqoop exports and transactions
- Looking at Your Sqoop Input and Output Formatting Options
- Getting down to brass tacks: An example of output line-formatting and input-parsing
- Sqoop 2.0 Preview
- SQL’s Importance for Hadoop
- Looking at What SQL Access Actually Means
- SQL Access and Apache Hive
- Solutions Inspired by Google Dremel
- Apache Drill
- Cloudera Impala
- IBM Big SQL
- Pivotal HAWQ
- Hadapt
- The SQL Access Big Picture
- Chapter 16: Deploying Hadoop
- Working with Hadoop Cluster Components
- Rack considerations
- Master nodes
- Slave nodes
- Edge nodes
- Networking
- Hadoop Cluster Configurations
- Small
- Medium
- Large
- Alternate Deployment Form Factors
- Virtualized servers
- Cloud deployments
- Sizing Your Hadoop Cluster
- Chapter 17: Administering Your Hadoop Cluster
- Achieving Balance: A Big Factor in Cluster Health
- Mastering the Hadoop Administration Commands
- Understanding Factors for Performance
- Hardware
- MapReduce
- Benchmarking
- Tolerating Faults and Data Reliability
- Putting Apache Hadoop’s Capacity Scheduler to Good Use
- Setting Security: The Kerberos Protocol
- Expanding Your Toolset Options
- Hue
- Ambari
- The Hadoop shell
- Basic Hadoop Configuration Details
- Chapter 18: Ten Hadoop Resources Worthy of a Bookmark
- Central Nervous System: Apache.org
- Tweet This
- Hortonworks University
- Cloudera University
- BigDataUniversity.com
- Planet Big Data Blog Aggregator
- Quora’s Apache Hadoop Forum
- The IBM Big Data Hub
- Conferences Not to Be Missed
- The Google Papers That Started It All
- The Bonus Resource: What Did We Ever Do B.G.?
- Chapter 19: Ten Reasons to Adopt Hadoop
- Hadoop Is Relatively Inexpensive
- Hadoop Has an Active Open Source Community
- Hadoop Is Being Widely Adopted in Every Industry
- Hadoop Can Easily Scale Out As Your Data Grows
- Traditional Tools Are Integrating with Hadoop
- Hadoop Can Store Data in Any Format
- Hadoop Is Designed to Run Complex Analytics
- Hadoop Can Process a Full Data Set (As Opposed to Sampling)
- Hardware Is Being Optimized for Hadoop
- Hadoop Can Increasingly Handle Flexible Workloads (No Longer Just Batch)
- About the Authors
- Cheat Sheet
- More Dummies Products
ABOUT E-BOOKS ON HEIMKAUP.IS
Your bookshelf is your own space, and that is where your books are stored. You can access your bookshelf anywhere, anytime, on a computer or smart device. Simple and convenient!
E-book to own
An e-book you own must be downloaded to the devices on which you want to use it within one year of purchase.
Access your books anywhere
You can reach all your e-books (including e-textbooks) in an instant, anywhere and anytime, from your bookshelf. No bag, no e-reader, and no hassle (let alone excess baggage).
Easy to browse and search
You can move between pages and chapters however suits you best, and jump straight to specific chapters from the table of contents. The search finds words, chapters, or pages in a single click.
Notes and highlights
You can highlight passages in different colors and write notes as you like in the e-book. You can even see the notes and highlights of classmates and teachers if they allow it. All in one place.
You decide how the page looks
You adapt the page to your needs. Enlarge or shrink images and text with multi-level zoom to view the page however best suits your studies.
More benefits
- You can print pages from the book (within the limits set by the publisher)
- Option to link to other digital and interactive content, such as videos or questions about the material
- Easy to copy and paste content/text for homework, essays, and the like
- Supports assistive technology for students with visual or hearing impairments