
Atlassian Outage - Thoughts on What to Do When Your Provider Goes Down

[With comments from Holger Mueller]

Update: 4/29/2022

Atlassian hired a new CTO, Rajeev Rajan, from Meta (ex-Microsoft). While this in itself is not the complete answer, it is a solid first step: they are bringing in someone with a strong enterprise engineering background to address some of their issues. As I stated in the conclusion of this blog earlier, "Atlassian cloud has growing pains." I hope he is their answer, and I hope they continue to address the issue at hand to instill confidence in their customers.

--------------------------------------------------------

The latest Atlassian outage goes to show that every cloud provider is prone to unplanned downtime sooner or later. While every company strives to achieve that unicorn status of zero downtime, it is almost impossible to achieve that in the face of “Unknown Unknowns.” Especially with the need and demand for “always-on,” there are more opportunities than ever for things to break, and incidents do not wait for a convenient time.

What actually happened?

On April 4th, a small portion of Atlassian customers (about 400 of roughly 226,000) experienced an outage across a number of Atlassian Cloud sites for Jira Software, Jira Work Management, Jira Service Management, Confluence, Opsgenie, Statuspage, and Atlassian Access. While the number of customers affected was low, those customers completely lost access to all of their Atlassian services, and the number of affected end users could be in the hundreds of thousands. If an enterprise depends on the Atlassian cloud suite for its DevOps, Enterprise Service, or Incident Management, it was at a standstill until the issues were resolved. (The issues were finally resolved on 4/18/22, after 2 weeks, per Atlassian).

The outage, which started on April 4th, lasted almost 2 weeks for some customers. The timing couldn’t have been any worse: their executive team was pitching how great their cloud services are, and how they are putting in place an enterprise sales/service model for large enterprise customers, at their Atlassian Team ’22 conference in Vegas during April 5-7. All this was happening while Atlassian executives were on stage talking about how they are building a resilient cloud bar none.

Why is it bad?

This outage from Atlassian is pretty bad for a few reasons.

  1. They were putting on a big show in Vegas, where they were talking about their strategy, direction, vision, and mission for their cloud services when their cloud services were down. I was attending the conference in person, and some of my tweets and many others’ tweets were met with hostile customer replies with angry comments. Bad publicity.

  2. Atlassian as a company has a primary solution set that is mainly focused on helping customers prepare for such unplanned outages: Agile development, DevOps cycle, issue/bug tracking, incident management, ITSM, incident response, Statuspage, etc.

  3. The issue is self-inflicted. The damage was not due to misconfiguration, hacking, or failures in other providers' services that they depend on. As Atlassian CTO stated, two critical errors were committed. First, instead of deactivating a specific app, the entire cloud site for certain customers, with all apps, was deactivated due to a communication gap. Second, the scripts deleted the data “permanently” (the mode used for compliance reasons) rather than as “temporarily recoverable.” The combination of these two errors led to the colossal mishap.

  4. They took a long time to respond with something meaningful. Granted, they were all busy with the big show in Vegas. Atlassian claims they figured out the issue and root cause within hours, but the cryptic messaging to the customers and their status pages were not very clear on the situation. Until then, only the cryptic message of “your service is down” with no ETA was relayed to the affected customers. Only about a week later, Atlassian CTO wrote a detailed post on what went wrong. Until then, customers were scrambling to figure out what went wrong and were doing patchwork with spreadsheets, Word docs, and other collaboration tools like Slack to manage the gap.

  5. Atlassian announced that they will no longer sell new licenses for on-prem Server installations (but will continue their Data Center offerings) and will discontinue support for on-prem Server in 2024 for all existing customers, effectively forcing Server customers to move to the cloud within 2 years. It makes sense on their part to maintain only SaaS and Data Center versions instead of having a fragmented solution set.

  6. Incidentally, Atlassian CTO stated in his last blog (before this fiasco): "At our engineering Town Hall meeting, I announced that we were in a "Five-Alarm Fire" due to poor reliability and cloud operations. Our customers needed to trust that we could provide the next level of reliability, security, and operational maturity to support our business transition to the cloud in the coming years." That call specifically named reliability, security, and operational maturity. With this incident, they unfortunately failed in two of those three categories.

  7. Finally, more importantly, Atlassian claims an SLA of 99.99% for Premium and 99.95% for enterprise cloud products. They also claim a 6-hour RTO (recovery time objective) for tier 1 customers. Unfortunately, neither held up this time.
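To put the SLA and RTO figures in item 7 in perspective, here is a rough back-of-the-envelope sketch. It assumes availability is measured over a 30-day window and that the outage ran the full 14 days for a given customer. Both are simplifying assumptions; actual SLA terms and measurement windows may differ.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Downtime budget implied by an availability target over a period."""
    return (1 - sla_percent / 100) * period_days * MINUTES_PER_DAY


def achieved_availability_percent(outage_days: float, period_days: int = 30) -> float:
    """Availability actually delivered if the service was down for outage_days."""
    return (1 - outage_days / period_days) * 100


if __name__ == "__main__":
    # SLA targets quoted in the text; the 30-day window is an assumption.
    for sla in (99.99, 99.95):
        budget = allowed_downtime_minutes(sla)
        print(f"{sla}% SLA allows about {budget:.2f} minutes of downtime per 30 days")

    print(f"A 14-day outage yields {achieved_availability_percent(14):.2f}% availability")
    print(f"14 days is {14 * 24 / 6:.0f} times a 6-hour RTO")
```

Under these assumptions, the downtime budget is a matter of minutes per month, while a two-week outage leaves availability for that month at a little over 53%.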

Why did it happen?

Rather than me trying to paraphrase, you can read Atlassian CTO’s blog that explains what happened in detail here.

What now?

Accept that no cloud service is invincible. High-profile outages are becoming more and more common. Even the mighty AWS had their US-East region down for many hours recently. Unplanned downtimes are expected and will happen at the most unfortunate time – holidays, nights, weekends, or during flagship events. The following steps, while not a complete solution, can help mitigate the situation somewhat:

  1. SLAs: Most cloud-based SaaS SLAs are written with either four or five 9s (such as 99.999%). While those contracts won’t stop incidents from happening, they will at least give some financial recourse when such events happen. While it might be preferable to write large SaaS contracts with business outage costs rather than technical outage and data loss costs, most SaaS vendors may not accept such language in contracts. The higher the penalty for such incidents, or for long resolution times, the faster they get attended to. In events like this, where vendors restore a few customers per batch, you want to be first in line, and your contract needs to reflect that.

  2. Have a backup option. Ideally, it might be better if you have a backup solution that is either by a different provider or on a different cloud for such occasions. But that can be expensive. Multi-cloud solutions are easier said than done. If your business is that critical, those options must be considered.

  3. Have a plan for such extended downtime. When such long outages happen, part of the issue is lost productivity for your employees, partners, and services. Your business cannot be at a standstill because your service provider is down. Whether it is a backup service or document-based notes, there has to be a plan in place ahead of time to act on.

  4. A lot of Atlassian competitors were using this opportunity to pitch their solutions on Twitter, LinkedIn, and other social media, claiming this would NEVER happen to their product lines. Don’t jump from the frying pan into the fire in the hour of immediate need just because of this incident. However, it is time to evaluate whether other worthy offerings might fit your business model better.

  5. As discussed in my Incident Management report, “Break Things Regularly” and see how your organization responds. Most digital enterprises make a lot of assumptions about their services and breaking things regularly is a great exercise for validating those assumptions. A couple of options discussed in my report involved either breaking things and seeing how long it will take for support/SRE/resiliency teams to fix it (Chaos monkey theory-based), or creating game-day exercises (from AWS well-architected principles) to make teams react to a controlled exercise to create a “muscle memory” to react fast in such situations. Assumption is a dangerous thing in the digital economy. You are one major incident away from disaster, which can happen anytime.

  6. Measure what matters. If you are just checking “health” of your services and your provider services, your customers will unearth a lot of incidents before your SRE team can. I discuss a lot of instrumentation, observability, and customer real situation monitoring ideas in my Incident Management report.

  7. Review the SaaS vendor’s resiliency, backup, failover, restoration, architecture, data protection, and security measures in detail. A claim of x hours of restoration time is not good enough on its own. If you have architected a reliable solution on-prem or on another cloud, make sure that the SaaS vendor’s plan and design at the very least match your capabilities.

  8. When deleting "permanently," make sure it is a staged delete, even if it is for compliance reasons. Allow a gestation period of 24 or 48 hours to make sure the deletion didn't do more damage than intended.

  9. Before you execute a script that does mass operations, test it many times first to make sure it produces the intended results. Triple-check mass scripts, delete operations, and any major modifications.

  10. Finally, as discussed in my report, take ownership and communicate well. Customers do appreciate that. While such incidents do happen occasionally, what matters more is how the vendor communicates with customers, how soon they fix the incident, how detailed their postmortem is, and, most importantly, what they do so such incidents don’t occur in the future.
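To make items 8 and 9 a bit more concrete, here is a minimal sketch of a staged delete with a dry-run default. It is illustrative only and is not how Atlassian or any particular vendor implements deletion. The `SiteStore` class, its method names, and the 48-hour `GRACE_PERIOD` are all hypothetical, and the in-memory dictionaries stand in for a real data store.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List

# Assumption: 48 hours, the upper end of the 24 to 48 hour window suggested above.
GRACE_PERIOD = timedelta(hours=48)


@dataclass
class SiteStore:
    """Hypothetical in-memory stand-in for a store of customer sites."""
    sites: Dict[str, dict] = field(default_factory=dict)
    pending: Dict[str, datetime] = field(default_factory=dict)  # site_id -> purge time

    def mark_for_deletion(self, site_ids: List[str], now: datetime,
                          dry_run: bool = True) -> List[str]:
        """Stage sites for deletion; with dry_run=True, only report what would be staged."""
        targets = [s for s in site_ids if s in self.sites and s not in self.pending]
        if not dry_run:
            for site_id in targets:
                self.pending[site_id] = now + GRACE_PERIOD
        return targets

    def restore(self, site_id: str) -> bool:
        """Cancel a staged deletion; returns True if there was one to cancel."""
        return self.pending.pop(site_id, None) is not None

    def purge_expired(self, now: datetime) -> List[str]:
        """Permanently remove only the sites whose grace period has elapsed."""
        expired = [s for s, due in self.pending.items() if due <= now]
        for site_id in expired:
            del self.pending[site_id]
            del self.sites[site_id]
        return expired


if __name__ == "__main__":
    store = SiteStore(sites={"site-a": {}, "site-b": {}, "site-c": {}})
    start = datetime(2022, 4, 4, 12, 0)

    # Dry run first: review the target list before anything changes.
    print("would stage:", store.mark_for_deletion(["site-a", "site-b"], start))

    store.mark_for_deletion(["site-a", "site-b"], start, dry_run=False)
    store.restore("site-b")  # a mistake caught inside the grace period

    print("purged after 24h:", store.purge_expired(start + timedelta(hours=24)))
    print("purged after 48h:", store.purge_expired(start + timedelta(hours=48)))
    print("remaining sites:", sorted(store.sites))
```

The design choice worth noting is that the destructive step (`purge_expired`) is separate from the request to delete, and that the bulk operation does nothing unless `dry_run=False` is passed explicitly.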

Bottom line

Atlassian cloud has growing pains. It may be a tough pill to swallow, but they need to go back to the drawing board and reassess the situation. They need to take a hard look not only at their cloud architecture, processes, and operations but, more importantly, at their mandate to move all customers to the cloud or Data Center by 2024. It is such a shame, as I like their suite of products: a solid line of products that has performed well for large enterprise customers with the on-prem Data Center version for many years.

They also need to automate a lot of their cloud operations such as restoring deleted customer sites in one batch rather than painful small batches. They should have fully automated rollbacks for any changes whether it is configuration, functions, features, or code changes. If something didn’t work, they should be able to roll back to a previous version in an automated fashion quickly in a matter of hours – not days or weeks.

This happens when companies grow too soon, too fast. An added complexity in the case of Atlassian is the list of acquisitions they did which they are trying to bring together in one cloud platform.

This too shall pass. How they respond to this event, by putting in place newer processes, controls, approvals, automation, and, more importantly, automated rollbacks, will tell whether they can be trusted going forward. Once the picture is clear, enterprises can decide whether it is worth continuing with Atlassian or whether to look at some worthy alternatives.

It is too soon to tell at this point.

PS: I had a call with their head of engineering (Mike Tria) on 4/19/2022, who addressed some of these concerns and explained in detail some of the measures they are doing to fix this issue so it won’t happen again. It included items like staged permanent deletes, operational quality, mass auto rollbacks, customer restoration across product lines, etc. He also discussed at length about what they are doing to the customers who cleverly instantiated instances in parallel while waiting for the issues to be resolved and how they can be merged into their main service.

I was assured by Atlassian that this incident will be reviewed in detail and the measures they are taking going forward will be addressed in detail in a PIR (Post-mortem Incident Report) that is scheduled to be released soon (before the end of the month per Atlassian).

 

 

 

 


SEC Moves Toward Mandatory Climate Reporting: Are You Prepared?

Draft rules from the SEC will require public firms to report carbon emissions and climate risks. Software vendors are gearing up to help you comply.

Thousands of public companies are already reporting carbon emissions and climate risks as part of their environmental, social and governance initiatives. But what was once voluntary may soon be mandatory. Is your organization ready to follow the leaders?

The mandate is coming from the U.S. Securities and Exchange Commission (SEC), which on March 21 issued draft regulations that will require public companies to report their carbon emissions and climate risks. The rules are not final, but the SEC already gathered industry feedback in 2021 before coming up with the draft rules. From here there will be a 60-day public review period. The SEC is expected to issue final rules by the end of 2022, with a three-year phase-in period that would start in 2023.

So are you ready to report, let alone plan and strategize around emissions and climate-risk data? If your company is among the more than 2,600 companies that have embraced the voluntary framework put forth by the Task Force on Climate-Related Financial Disclosure (TCFD), you are in luck. Based on last year’s industry feedback, the SEC has modeled its disclosure requirements in large part on the TCFD disclosure framework, as well as the also-popular Sustainability Accounting Standards Board (SASB) industry standards and materiality guidance.

As noted in the video report posted above, some aspects of the SEC’s draft rules are likely to face legal challenges, particularly Scope 3 (indirect) emissions reporting and guidance on what is material to a company’s financial performance. But Scope 1 and Scope 2 (direct) emissions and climate-risk disclosures are likely to be instituted as soon as next year. You can also expect global requirements, as the IFRS Foundation, which guides financial reporting in more than 140 countries, is expected to introduce its own draft emissions disclosure requirements this year.

Industry leaders, fast followers, and even cautious adopters are already reporting this data. In fact, 92% of Fortune 500 companies already disclose data tied to climate issues and publish environmental reports. The problem is that they do so using a variety of voluntary formats.

The good news in regulation is that it will provide uniform requirements that will simplify reporting for business. The mandate will also ensure complete and consistent measures that will help investors and the public spot greenwashing and selective disclosures.

Technology will be crucial to not just meeting reporting requirements, but to harnessing data for strategic and operational planning and business differentiation. The vendors that have been ahead of the curve with ESG-supporting software – companies including Cervest, Envizi, Honeywell, Insight Software, Planetly, Watershed, and Workiva – will retain first-mover advantages, as they know how to gather and report data based on the leading voluntary frameworks. They know where customers are likely to struggle when it comes to compiling data and, in some cases, moving the needle on ESG goals. They will be in a good position to guide companies preparing to meet the SEC’s coming reporting requirements.

In some ways, however, new requirements level the playing field for vendors and end-users that haven't spent years working on climate and social reporting. In recent weeks I’ve had briefings and updates with several vendors that are clearly gearing up to support ESG initiatives, including Anaplan, C3.ai, Onestream, Salesforce, and SAP.

The bottom line in the SEC’s draft regulations is that it’s time to prepare – if you haven’t already – for an era in which climate-impact and climate-risk reporting will be mandatory for public companies. From my perspective, it’s best to take a proactive approach and use technology to turn requirements to your own advantage.


SEC Moves Toward Mandatory Climate Reporting

Draft rules introduced by the US Securities and Exchange Commission call for mandatory emissions and climate risk reporting. Tech vendors are gearing up to support corporate environmental initiatives.

Video: SEC Moves Toward Mandatory Climate Reporting (https://vimeo.com/695101455), from Constellation Research on Vimeo.

Reclaiming Marketing’s Mojo: Lessons from Adobe Summit 2022

I’m about to say the quiet stuff out loud: “Modern marketing” has meant marketers accepting that sometimes, we couldn’t get “there” from “here.” We have had to sacrifice our mojo in the name of modernization—sacrifice that secret sauce that turns wildly creative moments into monetization.

For those open to hearing it, an interesting thread emerged during this year’s Adobe Summit (summit.adobe.com): Marketing is actively reclaiming its mojo. And make no mistake, integration is at the center of getting our groove back.

In Adam Grant’s main stage session discussing organizational psychology and the dynamics of a team, he explained that when you get a “room of smart people together,” instead of coming together to make the room smarter, individuals spent time proving they were THE SMARTEST of the smart. To quote Grant, “often, the whole was less than the sum of its parts.” He went on to share that failed… anything…wasn’t because of a lack of smart people thinking smart things. Rather, failure was the byproduct of not RE-thinking fast enough.

This resonated with me as a marketer…profoundly. I’ve BEEN IN that smart room watching the smartness wars demolish the best strategies and best intentions. But I’ve also understood that we marketers were forced to rethink everything in the earliest days of the global COVID-19 pandemic. We needed to quickly see the writing on the walls that something massive, disruptive and devastating could be on the way. We started to run…and being the smartest in the room didn’t matter as much as being able to be the most connected and most aligned across the organization.

Smart was nice. But smart didn’t save opportunities, recovery or growth. Smart was a bonus. Integrated was a requirement. Integrations, from teams to technologies helped us survive…and it will be integrations that will help marketing to continue to thrive and disrupt the status quo.

If there is anything that Adobe Summit 2022 showed us, it is that integration is the path forward for marketing. Be it integration between systems, channels, data and applications or integration across teams, people and partners, the biggest opportunities will be realized and optimized thanks to collaboration, connection and integration.

Integration has taken center stage. Don’t believe me? Let’s break down three of the big Summit announcements that immediately caught my eye:

  • New unified workflows between Adobe Workfront, Adobe Creative Cloud Enterprise, and Adobe Experience Manager integrate teams along the pathways of how the work of engagement gets done. This isn’t just integrating the work of approvals or collaboration, but fundamentally integrating along the continuum of storytelling and how that work forms the cornerstone of durable, profitable customer relationships.
  • Data integrations align the speed of the customer with the speed of personalization as Adobe Real-Time CDP and Adobe Target come together. It is one thing to aggregate and normalize data; it is another to fully integrate fundamentally different intentions behind data sets to craft a more holistic view of the intersection between customer, opportunity and brands. By bringing Adobe Target and Real-Time CDP together, the intentions of growth via precise delivery meet the opportunity of growth via precise personalization and contextual content.
  • Integration took center stage again through key partnership announcements including commerce-centric partnerships with Walmart, FedEx and PayPal to expand a customer’s control of their buying, paying and delivery journey. Yet another partnership integration, with IBM’s The Weather Channel business, integrates a new layer of contextual intelligence bringing weather, climate and environmental contextualization to personalization.

Each announcement focused on the integration to facilitate and accelerate the work being done. Each focused on expanding the ecosystem of work to embrace the customer by ingesting the behavioral and contextual signals most directly impacting how their consumption of engagements could be shifted thanks to a more personal approach.

(Side note: There were a TON of announcements from enhancements and expansions of Adobe Sensei, Adobe’s AI, to the introduction of Adobe Experience Cloud for Healthcare and Adobe’s new cloud-based digital learning platform, Adobe Learning Manager. Check out this Adobe news announcement for a round up: https://news.adobe.com/news/news-details/2022/Adobe-Summit-2022-Make-the-Digital-Economy-Personal/default.aspx)

That isn’t to say there wasn’t plenty of that future-forward aspirational dreaming Adobe Summit is known for. This year, it wasn’t just announcements of cross-cloud integration demonstrating that connection between Creative, Experience and Document clouds…it was hearing from Scott Belsky, the Chief Product Officer and EVP of Creative Cloud sharing Adobe’s vision of the metaverse.

Shared immersive experiences visualized in a 3-D world will be core to the metaverse economy. The vision outlined at Summit revolved around bringing the metaverse to life with Adobe Substance 3D Modeler (now in beta), which aims to bring the capacity to create AND collaborate on 3D assets out of the specialized (and complex) world of design and into the immersive, collaborative, real-time workstyle of the metaverse-ready enterprise. Engagement across metaverses will not wait for brands to get up to speed with 3D, let alone the economies, commerce and experiences. Customers ready to co-create and collaborate in these shared worlds will simply move on.

I typically call Adobe Summit the annual celebration of marketing…but I’ve rethought (thanks Adam) that position and instead will leave you with this: Adobe Summit 2022 was a celebration of the work and the integrations that accelerate and empower how experiences translate into revenue. Thankfully, that work is marketing’s work. And what gets done is growth in an economy based on the power of getting personal…and THAT is marketing’s mojo. Time to go reclaim it!


Adobe Summit 2022: CR Event Report Showcase

Two days. Three videos. One showcase to view what got Liz Miller thinking after Adobe Summit 2022.

On cx_convos: https://vimeo.com/showcase/9402702/embed

ConstellationTV Episode 30

On Episode 30 of CRTV, Constellation analysts Holger Mueller and Liz Miller interview special guests Maggie Hulce, Executive Vice President of Indeed, and Simon Harrison, CMO of Avaya. Maggie shares 2022 talent acquisition trends from the perspective of Indeed, and Simon breaks down the topic of composability.

On ConstellationTV: ConstellationTV Episode 30 (https://player.vimeo.com/video/691028517?h=c449053d44&badge=0&autopause=0&player_id=0&app_id=58479)

Omnichannel. We Meet Again

Are we the problem with omnichannel? Is our need to retain the last shreds of command-and-control engagement holding back a new age of commerce and selling? Liz Miller takes on one of her favorite topics, Omnichannel. 

Be sure to check out Liz's blog on Catching Up with Omnichannel on PROS.com to keep the rethinking and retooling going!

On cx_convos: Rethinking Omnichannel Commerce (https://player.vimeo.com/video/679350654?h=fe972b82b6&title=0&byline=0&portrait=0&speed=0&badge=0&autopause=0&player_id=0&app_id=58479)

CREventReport: Adobe Summit 2022 Day 2

And just like that, Day 2 of Adobe Summit 2022 is in the books. Liz Miller shares her hot take on the ever-popular SNEAKS! and officially requests her title be changed thanks to a Day 2 speaker. To check out Day 2 of Summit for yourself, visit summit.adobe.com

On cx_convos: CREventReport: Adobe Summit Day 2 (https://player.vimeo.com/video/689394303?h=4b8d58aabc&badge=0&autopause=0&player_id=0&app_id=58479)