Outages ITOps professionals would like to prevent

As we settle into the time of year when we reflect on what we’re thankful for, we tend to focus on important basic things like health, family, and friends.

But on a professional level, IT operations (ITOps) professionals are thankful when they can avoid catastrophic outages that cause confusion, frustration, lost revenue, and damaged reputations. The very last thing ITOps, Network Operations Center (NOC), or Site Reliability Engineering (SRE) teams want while eating their turkey and enjoying time with family is to be called in about an outage. Outages can be extremely expensive: $12,913 per minute, in fact, and up to $1.5 million per hour for larger organizations.

To appreciate the peace of mind that comes with avoiding downtime, however, you need to have experienced the pain and anxiety of downtime firsthand. Here are a handful of horror stories ITOps pros are eager to avoid this season.

A case of janky command structure

An experienced IT professional was on duty with three others at 7 p.m. when the crew received an alert about a problem with the front-end user interface of the Global Traffic Manager appliance. Fortunately, there was a runbook for it in a database, so it looked like the problem would be solved quickly. One of the team members saw two things to type in: a command and a secondary input. He typed in the command and, based on how the runbook looked, waited for the command line to prompt him for input, such as “what do you want to reboot?”

The way the command was set up, if it received no input, the device would reboot itself. He typed what he thought was the right command, “bigstart restart”, and the entire front-end Global Traffic Manager went down.

As a reminder, this took place in the early evening. The client was a finance company, and the system went down right around the time businesses were closing for the day and trying to do their accounting and other financial tasks. Terrible timing, to say the least.

Five minutes into the outage, the ITOps team realized what had happened: the tool they were using for their runbook wrapped text by default, so what appeared to be two separate commands was actually just one. Although the outage was relatively brief, it came at a critical time and set off a chain reaction of headaches. Lesson learned? Make sure your command structure is clear and unambiguous.
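One way to guard against this kind of mishap is to have tooling refuse a broad command that arrives without an explicit target. The Python sketch below is illustrative only: the `check_step` helper, the `NEEDS_TARGET` table, and the `httpd` target in the usage note are assumptions made for this example, not part of the team's runbook or of any vendor tooling.

```python
import shlex

# Hypothetical table: command prefixes that, per the story above, act on
# everything when no target follows them.
NEEDS_TARGET = {("bigstart", "restart")}


def check_step(step: str) -> list[str]:
    """Split one runbook step into argv and reject it if a target is missing."""
    argv = shlex.split(step)
    if tuple(argv[:2]) in NEEDS_TARGET and len(argv) < 3:
        raise ValueError(
            f"refusing to run {step!r}: no target given, "
            "so the command would apply to everything"
        )
    return argv
```

With this guard in place, `check_step("bigstart restart httpd")` returns the parsed arguments, while `check_step("bigstart restart")` raises a `ValueError` instead of being handed to a shell. Storing each runbook step as a single string also means that display wrapping cannot make one command look like two.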

When Google is your best friend in the middle of the night

For an IT veteran of more than 15 years, what seemed like a quiet night shift quickly turned into a nightmare. “I've never panicked as quickly as when the remote terminal I was in suddenly went dark,” he said.

He had been trying to restart a service while working on a remote computer, but accidentally disabled the network adapter in the process. Calling someone and waking them up in the middle of the night to say that he had “destroyed” a network adapter wasn’t ideal, so he and his teammates did some digging.

After what he calls “not an insignificant amount of Googling”, he was able to find his way to a Dell server and reboot the network adapter from there. It took longer than it should have, but the problem was eventually solved.

His pro tip: “Don’t disable the network adapter on a computer you’re remotely controlling in the middle of the night.” That may sound obvious, but the underlying lesson is to have a contingency plan in case something goes horribly wrong.
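One common form of contingency plan is a "dead man's switch": apply the risky change, then undo it automatically unless someone confirms they can still reach the machine. The sketch below shows the idea in Python under stated assumptions. The function name `apply_with_rollback` and its three callbacks are hypothetical, and in practice the script would have to run on the remote machine itself, detached from the operator's session, so that it keeps going when the connection drops.

```python
import time
from typing import Callable


def apply_with_rollback(
    apply: Callable[[], None],
    rollback: Callable[[], None],
    still_connected: Callable[[], bool],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
) -> bool:
    """Apply a change; undo it unless connectivity is confirmed in time.

    Returns True if the change was kept, False if it was rolled back.
    """
    apply()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if still_connected():
            return True  # operator can still reach the box: keep the change
        time.sleep(poll_s)
    rollback()  # nobody confirmed in time: restore the previous state
    return False
```

Here `apply` might disable or reconfigure an adapter, `rollback` would re-enable it, and `still_connected` could check for a confirmation file that the operator creates after reconnecting. All three are placeholders for whatever fits the environment.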

ITOps: Leaning on email was great, until it wasn’t

When email was the primary way NOC teams received alerts, a longtime IT pro recalls having a teammate whose only job was essentially dispatch: checking emails and creating tickets, some for incidents that needed immediate attention and others for those the team could turn to later. The system worked well enough, but at a large multinational company it was a time bomb waiting to go off.

That fear became reality when the company’s entire data center went down.

That was a serious problem in itself, but the incident generated so many email alerts that it also crashed the corporate Outlook server. “At that point you are really blind,” this IT pro recalls.

The incident happened to occur in the middle of the night, so the team on duty reluctantly had to wake up their teammates. Once the problem was finally solved, the team developed a sense of humor about it. As they recalled, “We used to joke that we DDoSed ourselves with our own alert noise. Good times!”
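A common defense against this kind of self-inflicted flood is to throttle repeated alerts, so that one incident produces one notification per time window instead of thousands. The following Python sketch illustrates the idea. The `AlertThrottle` class, its five-minute default window, and the alert keys are assumptions made for this example rather than a description of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class AlertThrottle:
    """Allow at most one notification per alert key per time window."""

    window_s: float = 300.0
    _last_sent: dict[str, float] = field(default_factory=dict)
    suppressed: dict[str, int] = field(default_factory=dict)

    def should_notify(self, key: str, now: float) -> bool:
        """Return True if an alert with this key should be sent at time `now`."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_s:
            # Same alert again inside the window: count it, do not send it.
            self.suppressed[key] = self.suppressed.get(key, 0) + 1
            return False
        self._last_sent[key] = now
        return True
```

A caller would derive `key` from whatever identifies the underlying incident (for instance a site or service name) and pass a timestamp in seconds as `now`. The `suppressed` counts can then be summarized in a single follow-up message, so no information is lost while the mail server is spared.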

Ultimately, the overarching moral of the story is this: every time a hand touches a keyboard, there’s a risk that something could go wrong. Of course, this is sometimes unavoidable, but teams that can automate and simplify their IT processes as much as possible are giving themselves the best chance of avoiding costly outages so they can enjoy their Thanksgiving feasts undisturbed.

Mohan Kompella is vice president of product marketing at BigPanda.

Shreya Christina (http://ukbusinessupdates.com)
Shreya has been with ukbusinessupdates.com for 3 years, writing copy for client websites, blog posts, EDMs and other mediums to engage readers and encourage action. By collaborating with clients, our SEO manager and the wider ukbusinessupdates.com team, Shreya seeks to understand an audience before creating memorable, persuasive copy.