Smoke Signals Coming From Your Hadoop Cluster

by Sean Suchter
February 8, 2016
in Data Science

As Hadoop gains traction among companies of all sizes, many are discovering that getting a cluster to run optimally is a daunting task. In fact, it’s impossible for any human to respond in real time to all the changing conditions across multiple nodes and fix the problems causing bottlenecks or performance dips. Yet it’s exactly this performance degradation that’s critical to confront, especially for large-scale deployments. After all, if your cluster doesn’t run smoothly and efficiently, you can’t count on Hadoop to deliver business-critical results on time, which wastes both time and money. So what should you be paying attention to in order to ensure your clusters are operating at full capacity? Here are three warning signs to keep in mind.

Table of Contents

  • Warning #1: You think you’re out of capacity
  • Warning #2: Your high-priority jobs are not finishing on time
  • Warning #3: Your cluster grinds to a halt periodically
  • Seeing the writing on the wall before it’s written

Warning #1: You think you’re out of capacity

Most companies put considerable effort into capacity planning when designing a Hadoop deployment. You probably made painstaking calculations to ensure that enough resources — specifically CPU, memory, network throughput, and disk I/O and storage — were provisioned for your cluster’s anticipated workload. But once a cluster is brought online, the true litmus test is whether all your jobs run efficiently and complete on time. Yet sometimes it appears that you’re out of capacity when you know you’re not: you try to run more applications, and they simply won’t start.

Naturally, you start by using Ganglia or some other monitoring tool to root through various cluster metrics, looking for anomalies. You might check CPU usage but find your processors aren’t even close to being 100% utilized. Your 10Gb/s network is peaking at only 50Mb/s, so that’s not the problem either. What else can you do?
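
If you want to double-check the raw numbers yourself, a minimal sketch like the one below (assuming a Linux worker node; the interface name "eth0" is an assumption and may differ in your environment) samples /proc/stat and /proc/net/dev twice and reports CPU utilization and network throughput, confirming that neither is anywhere near saturation:

```python
#!/usr/bin/env python
# Rough node-level sanity check: sample CPU and NIC counters over 5 seconds.
# Not tied to any monitoring product; assumes Linux and an "eth0" interface.
import time

def cpu_times():
    """Return (idle, total) jiffies from the aggregate cpu line of /proc/stat."""
    with open("/proc/stat") as f:
        fields = [int(x) for x in f.readline().split()[1:]]
    return fields[3], sum(fields)

def net_bytes(iface="eth0"):
    """Return (rx_bytes, tx_bytes) for the given interface from /proc/net/dev."""
    with open("/proc/net/dev") as f:
        for line in f:
            if ":" not in line:
                continue
            name, data = line.split(":", 1)
            if name.strip() == iface:
                fields = data.split()
                return int(fields[0]), int(fields[8])  # rx_bytes, tx_bytes
    raise ValueError("interface %s not found" % iface)

idle1, total1 = cpu_times()
rx1, tx1 = net_bytes()
time.sleep(5)
idle2, total2 = cpu_times()
rx2, tx2 = net_bytes()

cpu_util = 100.0 * (1 - (idle2 - idle1) / float(total2 - total1))
rx_mbps = (rx2 - rx1) * 8 / 5e6   # megabits per second over the 5 s window
tx_mbps = (tx2 - tx1) * 8 / 5e6
print("CPU utilization: %.1f%%" % cpu_util)
print("NIC throughput:  %.1f Mb/s in, %.1f Mb/s out" % (rx_mbps, tx_mbps))
```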

While most monitoring tools can show you that a resource is busy, they can’t always show you why it’s busy. This is because the tool either doesn’t give sufficiently granular detail into the inner workings of your cluster’s activities, or doesn’t flag certain areas worth troubleshooting (e.g. Hadoop configuration settings). The first challenge in these instances is to identify the root cause of your problem.


The issue can often be traced to the YARN architecture. In fact, Hadoop clusters are frequently YARN-bound, meaning that their performance is limited by the way YARN itself allocates resources.

YARN allocates units of work (containers) to nodes, and it knows only whether a container completes or not. When jobs are scheduled, YARN statically assigns node resources, and once jobs are running, no further resource adjustments are made to those containers. As a result, YARN can’t react quickly to changing conditions, and it must be configured to accommodate worst-case scenarios. In fact, it’s very likely that you have unused resources tied up in these statically assigned containers, resources that the jobs will never actually use.
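
One quick way to gauge the scheduler’s view of the cluster is the ResourceManager REST API. The sketch below (the ResourceManager address is an assumption, and field names can vary slightly between Hadoop versions) pulls the cluster-level metrics and compares the memory YARN has promised to containers against what is still schedulable; a cluster can look full here even while the containers themselves are using far less than they reserved.

```python
#!/usr/bin/env python
# Rough sketch: compare memory allocated to containers with what remains
# schedulable, via the YARN ResourceManager REST API. The RM address below
# is an assumption; field names may vary by Hadoop version.
import json
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

RM_URL = "http://resourcemanager.example.com:8088"   # assumed RM address

raw = urlopen(RM_URL + "/ws/v1/cluster/metrics").read()
metrics = json.loads(raw.decode("utf-8"))["clusterMetrics"]

allocated_mb = metrics["allocatedMB"]   # memory promised to running containers
available_mb = metrics["availableMB"]   # memory the scheduler can still hand out
total_mb = metrics["totalMB"]

print("Allocated to containers: %d MB (%.0f%% of %d MB total)"
      % (allocated_mb, 100.0 * allocated_mb / total_mb, total_mb))
print("Still schedulable:       %d MB" % available_mb)
```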

Warning #2: Your high-priority jobs are not finishing on time

Not all jobs run on a cluster are of equal importance. This is especially true in multi-tenant, multi-workload situations where several disparate cluster users are trying to run different applications simultaneously. If there are critical jobs that have a finite window of time to complete, you’ll want to ensure they meet their deadlines. But what if one of these high-priority jobs is suddenly taking too long to finish and missing its SLA?

You could start by checking whether a parameter or configuration setting has recently changed. Barring that, you can email other cluster users to ask whether they’ve recently changed their applications or settings in a way that’s affecting overall cluster performance. This is a time-consuming approach, though, and prone to inadequate disclosure by end users. In any case, the most likely culprit is resource contention between low- and high-priority jobs, and up-front planning and tuning often cannot prevent that kind of contention.
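
Automating the check is more reliable than emailing around. As a minimal sketch of an SLA watchdog (the ResourceManager address, queue name, and deadline below are assumptions for illustration), you can list the running applications in a high-priority queue through the ResourceManager REST API and flag any that have exceeded their window:

```python
#!/usr/bin/env python
# Minimal SLA watchdog sketch: flag running applications in a high-priority
# queue that have exceeded their deadline. RM address, queue name, and the
# deadline are assumptions; field names can vary by Hadoop version.
import json
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

RM_URL = "http://resourcemanager.example.com:8088"   # assumed RM address
QUEUE = "critical"                                    # assumed high-priority queue
SLA_MINUTES = 60                                      # assumed completion window

url = RM_URL + "/ws/v1/cluster/apps?states=RUNNING&queue=" + QUEUE
raw = urlopen(url).read()
apps = json.loads(raw.decode("utf-8")).get("apps") or {}

for app in apps.get("app", []):
    elapsed_min = app["elapsedTime"] / 60000.0        # elapsedTime is in ms
    if elapsed_min > SLA_MINUTES:
        print("SLA at risk: %s (%s) has been running %.0f minutes"
              % (app["id"], app["name"], elapsed_min))
```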

Warning #3: Your cluster grinds to a halt periodically

For this warning sign, let’s look at a common, real-life example: on a multi-tenant cluster used by hundreds of developers, you notice the cluster nearly grinds to a halt on a regular basis. You see heavy disk usage but can’t identify the root cause without a visualization tool that operates on the right input data.

You can use a node-monitoring tool (such as Ganglia or Cloudera Manager) that will show the disks getting busy. But these tools cannot explain why the disks are busy. The main drawback is that node monitoring tools cannot give you visibility down to the task-, user-, or job-level as your applications run — they merely provide node-level summaries.

To isolate the cause of the problem using these traditional node-monitoring tools, you could log into individual nodes and use a tool such as iostat to monitor every process with significant disk usage. But with this method you would have to know exactly when the problem would occur in order to catch the spike in disk usage. That is impossible to do if you rely on human intervention alone; technology must play a part.
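
Per-process I/O accounting can close that gap. The following rough sketch (assuming a Linux node and permission to read /proc/<pid>/io, which typically means running as root) attributes disk I/O to individual YARN containers by sampling per-process counters and grouping on the container ID found in each process’s command line, which is exactly the task-level attribution that node summaries can’t provide:

```python
#!/usr/bin/env python
# Rough sketch: attribute disk I/O to YARN containers by sampling
# /proc/<pid>/io twice and grouping on the container ID in each process's
# command line. Assumes Linux and permission to read /proc/<pid>/io.
import os
import re
import time
from collections import defaultdict

CONTAINER_RE = re.compile(r"container_\w+")

def io_by_container():
    """Sum read_bytes + write_bytes per YARN container currently running."""
    totals = defaultdict(int)
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/cmdline" % pid) as f:
                match = CONTAINER_RE.search(f.read())
            if not match:
                continue  # not a YARN container process
            with open("/proc/%s/io" % pid) as f:
                for line in f:
                    key, value = line.split(":")
                    if key in ("read_bytes", "write_bytes"):
                        totals[match.group(0)] += int(value)
        except (IOError, OSError):
            continue  # process exited or permission denied; skip it
    return totals

before = io_by_container()
time.sleep(10)
after = io_by_container()

# Report the containers doing the most disk I/O over the sampling window.
deltas = {c: after[c] - before.get(c, 0) for c in after}
for container, nbytes in sorted(deltas.items(), key=lambda kv: -kv[1])[:10]:
    print("%-45s %8.1f MB in 10 s" % (container, nbytes / 1e6))
```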

Seeing the writing on the wall before it’s written

In each of these cases, the warning signs of faulty performance were difficult — or impossible — to troubleshoot with human intervention alone, especially without the proper view into cluster activity. Since symptoms can be misleading, the result is many wasted man-hours spent experimenting with various remedies. A more efficient solution is to invest in tools that can make corrections automatically at the first sign of a contention problem, even when jobs are running.

We’ll continue to see the proliferation and adoption of Hadoop across companies of all sizes and industries. Unfortunately, however, human ability alone is not sufficient to guarantee optimally running clusters. To maximize the value of your Hadoop deployment, you need the ability to anticipate problems, react quickly, and make decisions in real time. Pay close attention to these three warning signs to help pinpoint areas for improvement.


Tags: CPU, Hadoop, Pepperdata, Troubleshoot

