Chad Robinson – BMC Software | Blogs

How to Hire for the Multi-Cloud and Avoid a Skill Shortage

Chad Robinson — Thu, 16 Aug 2018 00:00:46 +0000

Hiring for any position is always stressful because so much is at stake, and technical roles are particularly challenging given high applicant numbers for many titles, often-optimistic proficiency lists on resumes, and the need for careful testing for required skills. The end result is a process that can be time-consuming, expensive, and even risky: the methodologies for performing this testing are hard to standardize, so a great deal of subjectivity is often introduced into what is meant to be an objective, merit-based process.

At first glance, the term “multi-cloud” doesn’t seem to add much here: it is simply the act of using more than one cloud vendor. However, the reasons for doing so are usually tied to business needs that require more sophistication than just managing a few servers and load balancers in one cloud. To be successful, employees entering these teams usually require specialized knowledge and experience.

There are typically several roles found within multi-cloud teams: cloud architects and engineers, DevOps, site reliability engineers (SREs), and managers. Since no one individual can credibly wear all these hats, no one job candidate will have all of these skills. Thus, building multi-cloud teams may end up feeling like fitting pieces together to solve a puzzle. It helps to start by mapping out what the pieces are, then breaking them down into specific skill areas:

Knowledge

A candidate’s skill set is an obvious place to start an evaluation, but in multi-cloud environments, skills usually considered “nice to haves” may become “must haves.” For example, many technical interviews focus on programming languages or vendor offerings and touch only briefly on configuration management or monitoring tools. The best multi-cloud teams make these skill areas high priorities.

Cloud and DevOps processes and solutions

Moving physical servers to the cloud is a “baby step”: the simplest form of cloud architecture on the road to multi-cloud. Multi-cloud environments are an order of magnitude more complex, so it is imperative that teams include skills coverage in areas such as tools and platforms (e.g. Terraform, Chef, or Kubernetes).

These skills are not always easily transferable between products, and while it is common to see “buzzword bingo” in resume skills lists, such as “Ansible, Chef, Puppet” in a single bullet, mastering these tools takes time and they work in very different ways. It is rare to find a candidate above the 7/10 mastery level in more than one or two, so interviewers should ensure they thoroughly evaluate for competency in the desired tools.

Likewise, companies seeking candidates with a specific area of mastery should avoid casting too broad a net. A common object of scorn among DevOps job seekers is a requirements list that reads “Chef, Puppet, Ansible.” Experienced candidates will treat this as a red flag indicating either that the company does not know what it wants or that it has a mess on its hands.

Vendor Offerings

At the time of this writing, Amazon alone offered 110 distinct, branded services, and the number nearly triples as the additional top 3-4 vendors are added to the mix. Certifications may provide some hint of a candidate’s qualifications, but these programs rarely cover more than a small subset of a vendor’s offerings, and many qualified individuals never bother with them at all.

It helps to avoid over-specialization in the team’s skills mix. Vendor offerings evolve quickly, so the ability to quickly master new technologies is generally more valuable than knowledge of a specific product. Also, a candidate with deep experience in just one vendor’s offerings may be tempted to choose the familiar product even if a competitor has recently introduced a better one. This does not mean product-specific knowledge is not important, simply that breadth often beats depth in multi-cloud teams.

Tooling

As mentioned above, experience with “tooling” may be so important in multi-cloud environments as to outweigh language or platform skills. When architectures span multiple cloud vendors, these tools may simply be the only feasible way to build, manage, and monitor those environments at scale. It is imperative that a candidate’s background includes some experience in this area.

In many environments, low- (or zero-) cost tools such as Terraform may be sufficient, and because they are so common, many candidates may have used these products in the past. However, organizations with more sophisticated needs may choose to leverage enterprise-class solutions such as BMC Helix Discovery and/or TrueSight Orchestration. Experience in smaller environments may not easily translate to enterprise-class infrastructures, and interviewers should evaluate for this aptitude during the hiring process.

Experience

Beyond the prerequisite technical skills, to succeed in multi-cloud environments candidates will need at least a good working knowledge of two or more vendors’ service offerings. All of the top cloud service vendors offer similar compute, storage, and networking resources. But in areas such as “Big Data,” serverless computing, IoT, and machine learning, the offerings diverge rapidly. A candidate with experience working with the same vendors the organization uses will have an advantage here.

Experience in the same industry and with companies of the same size as the hiring organization is also important. For example, nearly all large organizations must meet regulatory requirements, some (e.g. financial services or healthcare) to a great degree. It is very common for cloud architects and DevOps engineers to assist in meeting these requirements, and they may be called upon to determine (sometimes with “must not fail” accuracy) factors such as:

Where data is stored,
Where the customer or end-user resides,
Where the customer or end-user is located at the time of use,
How data is secured and securely transferred between sites and/or applications, and
Who has access to manage these components, and how this is done.

Hands-on experience in actual multi-cloud environments may be best evaluated by asking practical questions, such as “If Vendor B offers a given service at a lower price than Vendor A, will it actually be cheaper once cross-site bandwidth costs, management costs, and reliability factors are calculated in?” These types of questions are common, even daily, challenges for multi-cloud engineers. Even without a pricing table at hand, a candidate should be able to articulate a process for estimating this answer (which should typically start by asking more questions back about the hypothetical application and how it behaves).
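One way a candidate might structure such an estimate is as a simple total-cost comparison. The Python sketch below is only an illustration: the cost categories mirror the question above, but the class name, field names, downtime model, and every figure are assumptions made up for the example, not real vendor pricing.

```python
from dataclasses import dataclass


@dataclass
class VendorEstimate:
    """Hypothetical monthly cost inputs for running one service at a vendor."""
    name: str
    service_cost: float             # sticker price of the service, per month
    egress_gb: float                # data sent to other sites, GB per month
    egress_cost_per_gb: float       # assumed cross-site bandwidth price
    management_hours: float         # extra staff hours to operate it
    hourly_rate: float              # assumed loaded cost of one staff hour
    expected_downtime_hours: float  # assumed downtime per month
    downtime_cost_per_hour: float   # assumed business cost of an hour down

    def total_monthly_cost(self) -> float:
        bandwidth = self.egress_gb * self.egress_cost_per_gb
        management = self.management_hours * self.hourly_rate
        reliability = self.expected_downtime_hours * self.downtime_cost_per_hour
        return self.service_cost + bandwidth + management + reliability


# Made-up scenario: Vendor B has the lower sticker price, but the data it
# needs lives at Vendor A, so choosing B adds cross-site traffic and effort.
vendor_a = VendorEstimate(
    name="Vendor A", service_cost=1000.0,
    egress_gb=0.0, egress_cost_per_gb=0.09,
    management_hours=2.0, hourly_rate=100.0,
    expected_downtime_hours=0.5, downtime_cost_per_hour=200.0,
)
vendor_b = VendorEstimate(
    name="Vendor B", service_cost=700.0,
    egress_gb=5000.0, egress_cost_per_gb=0.09,
    management_hours=6.0, hourly_rate=100.0,
    expected_downtime_hours=0.5, downtime_cost_per_hour=200.0,
)

cheapest = min((vendor_a, vendor_b), key=lambda v: v.total_monthly_cost())
print(cheapest.name)
```

With these invented inputs, Vendor B's lower sticker price is outweighed by bandwidth and management costs. The numbers matter less than the shape of the reasoning: each input is a question the candidate should be asking about the application.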

Vision

It may be said that multi-cloud architecture is a process, not a destination. Within the act of choosing to leverage more than one cloud vendor is the act of justifying this decision in some way. It certainly adds complexity and therefore risk, so there must be some return to make it worthwhile.

This area is a common struggle for technologists. Adopting a new cloud service cannot be done solely on the basis of a new feature or better price. The offering must fit the organization’s multi-cloud strategy as well. Equally important, these teams may be called upon to communicate and evangelize their efforts more than others. A candidate with a vision for why multi-cloud is an important concept and how best to take advantage of it, combined with the communication skills to advocate for that strategy, may be a better fit than one with product-specific knowledge but no passion for “selling” others on the team on its use.

Execution

Once the other pieces of the puzzle are in place, a multi-cloud team must still deliver on the overall vision, and this goes far beyond “the application runs.” For instance, to determine if cost savings are being achieved, costs themselves must be measured. But, while all cloud vendors provide some type of cost reporting and analysis tool, no vendor today integrates its own data with that of competitors. Either a third-party tool must be used or the team must develop its own.
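As a rough illustration of what a home-grown approach might involve, the Python sketch below merges cost records from two vendors into a single per-team report. The record layouts, field names, and figures are all hypothetical; real billing exports differ by vendor and are considerably more detailed.

```python
from collections import defaultdict

# Hypothetical exports: each vendor reports costs in its own shape.
vendor_a_rows = [
    {"service": "compute", "team_tag": "checkout", "usd": 120.50},
    {"service": "storage", "team_tag": "search", "usd": 30.00},
]
vendor_b_rows = [
    {"product": "vm", "labels": {"team": "checkout"}, "cost_cents": 8000},
    {"product": "queue", "labels": {}, "cost_cents": 1250},
]


def normalize_a(row):
    """Map a hypothetical Vendor A row to (vendor, team, cents)."""
    return ("A", row["team_tag"], round(row["usd"] * 100))


def normalize_b(row):
    """Map a hypothetical Vendor B row to (vendor, team, cents)."""
    return ("B", row["labels"].get("team", "untagged"), row["cost_cents"])


def cost_by_team(records):
    """Sum normalized costs per team, in integer cents."""
    totals = defaultdict(int)
    for _vendor, team, cents in records:
        totals[team] += cents
    return dict(totals)


records = [normalize_a(r) for r in vendor_a_rows]
records += [normalize_b(r) for r in vendor_b_rows]
report = cost_by_team(records)
print(report)
```

In this sketch, spend without a team label is reported under its own "untagged" key rather than dropped, so unowned resources stay visible in the combined report.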

Candidates with experience in one or more of these execution-related areas would bring additional value to a multi-cloud team:

Cross-vendor cost tracking, reporting, and monitoring,
Application build, test, and delivery processes in multi-cloud environments,
Cross-cloud backup, redundancy, and failover handling mechanisms,
Application and data security, particularly how to manage secure communications between multiple “walled gardens,” and
Inventory management and reporting, a.k.a. controlling cloud service “sprawl.”

Consider making a playbook for how the team will achieve its multi-cloud goals, then aligning searches for new hires to expand the team’s skills in areas that help execute against that playbook. As a corollary: if there is no playbook today, the next hire should have the skills required to make one!

Finally, finding the best candidates for any role is just the first step in the process. Employees must be motivated, empowered, and pointed in the right direction to achieve the organization’s goals. Personal growth is an excellent motivator in nearly any role. Consider including in the evaluation those individuals with the required skills but mismatched titles, such as “system administrator” or “software architect”. Remember, “cloud” itself is a buzzword invented by the industry to sell services in a new way. Skills and competence should be emphasized over past titles in this case, and the opportunity to grow into this new specialty may be a strong motivator for retaining a new hire longer than the industry average.

Anti-Patterns vs Patterns: What is an Anti-Pattern?

Chad Robinson — Wed, 25 Jul 2018 00:00:54 +0000

Jargon permeates the software development industry. Best practices. Artifacts. Scope Creep. Many of these terms are so common as to be called overused, and it is easy to assume we understand them because they seem so obvious. Still, we sometimes find new depth when we examine them closely. In this post, let us muse on the “Pattern,” and its somewhat lesser known counterpart, the “Anti-Pattern.”

Patterns

We all know what patterns generally are in common language, but to understand their importance in software engineering it’s important to first discuss algorithms. An algorithm is simply a way of performing a common task, such as sorting a list of items, storing data for efficient retrieval, or counting occurrences of an item within a data set.

Algorithms are one of the oldest, most fundamental concepts in software engineering. Indeed, on this author’s desk sits a copy of what is considered by many to be one of the most seminal works on the subject, “Fundamental Algorithms” by Donald Knuth. The First Edition of this small tome of just over 600 pages was first copyrighted in 1968, 50 years ago.

The text would be nearly unrecognizable to a modern programmer, as it mainly emphasizes Calculus-based proofs of its solutions and its only code examples are provided in obscure, outdated languages such as Algol or MIX Assembly. Despite this, much of what was covered is still used today: singly- and doubly-linked lists, trees, garbage collection, etc. The details are often buried in convenient libraries, but the concepts are the same. These algorithms have remained valid solutions to common software engineering problems for more than five decades and are still going strong.

A “pattern” can be considered a more general form of an algorithm. Where an algorithm might focus on a specific programming task, a pattern might consider challenges beyond that realm and into areas such as reducing defect rates, increasing maintainability of code, or allowing large teams to work more effectively together. Some common patterns include:

Factories – An evolution of early object-oriented programming concepts that eliminated the need for the creator of an object to know everything about it ahead of time. A flowchart application might support extensible stencil libraries by focusing on creating and organizing “shapes,” allowing the stencils themselves to manage the details of creating a simple square vs. a complex network router icon.
Pub/Sub – A mechanism for “decoupling” applications. Rather than having a sender directly send messages to a receiver, the sender “publishes” the messages to a topic or queue. One or more receivers can “subscribe” to receive those messages, and the message queue handles details such as transmission errors or resending messages. This simplifies both the sending and receiving applications.
Public-key Cryptography – A mechanism by which two parties can communicate securely and without interception, yet without the need to pre-arrange an exchange of secret encryption keys. Each party maintains a pair of keys (public and private), and the public key can often be obtained as needed rather than exchanged in advance.
Agile – A philosophy that encapsulates a set of guiding principles for software development that emphasize customer satisfaction, embrace the need for flexibility and collaboration, and promote the adoption of simple, sustainable development practices.
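To make the first item in the list concrete, here is a minimal Python sketch of a factory along the lines of the flowchart example. The class names and the registry design are assumptions chosen for illustration; this is one of several common ways to implement the pattern, not a canonical one.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """All the flowchart application needs to know: a shape can be drawn."""

    @abstractmethod
    def draw(self) -> str:
        ...


class Square(Shape):
    def draw(self) -> str:
        return "square"


class RouterIcon(Shape):
    def draw(self) -> str:
        # A real stencil would assemble a far more complex drawing here.
        return "router icon"


class ShapeFactory:
    """Registry that lets stencil libraries add shapes the app never hard-codes."""

    def __init__(self):
        self._creators = {}

    def register(self, name, creator):
        self._creators[name] = creator

    def create(self, name) -> Shape:
        creator = self._creators.get(name)
        if creator is None:
            raise ValueError(f"unknown shape: {name}")
        return creator()


factory = ShapeFactory()
factory.register("square", Square)      # from a hypothetical basic stencil
factory.register("router", RouterIcon)  # from a hypothetical network stencil

# The application asks for shapes by name and never touches their internals.
canvas = [factory.create(name).draw() for name in ("square", "router")]
print(canvas)
```

The application code at the bottom depends only on the factory and the Shape interface, so a new stencil can be added by registering another creator without changing the application.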

These are just four of the many common patterns in the industry, and even in this mix we can see how they range from highly technical to broader, more process-oriented points. Factories are a very code-oriented pattern, while pub/sub is more architectural in nature. And while public-key cryptography has broad implications, libraries to support its operations are available for nearly every programming language in common use today, making it generally straightforward to implement.
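The decoupling that pub/sub provides can also be shown in a few lines. The following Python sketch is a toy, in-process stand-in for a message queue; the class name, topic name, and error handling are assumptions for illustration, and a production broker would add persistence, retries, and network transport.

```python
class MessageBroker:
    """Toy broker: publishers and subscribers only share topic names."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, handler):
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        """Deliver to every subscriber; return how many handlers succeeded."""
        delivered = 0
        for handler in self._subscribers.get(topic, []):
            try:
                handler(message)
                delivered += 1
            except Exception:
                # A real broker might retry or dead-letter the message; this
                # sketch only keeps one failing receiver from blocking others.
                continue
        return delivered


broker = MessageBroker()
audit_log, emails = [], []

# Two independent receivers; the sender knows nothing about either.
broker.subscribe("user.registered", audit_log.append)
broker.subscribe("user.registered", lambda msg: emails.append(f"welcome {msg}"))

count = broker.publish("user.registered", "alice")
print(count, audit_log, emails)
```

Adding a third receiver requires only another subscribe call; the publishing code does not change, which is the simplification the pattern is meant to deliver.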

At the other end of the spectrum, “Agile” remains somewhat elusive: simultaneously a rallying point and an instrument of divisiveness among developers, project managers, and other stakeholders about exactly what it means and how it should be implemented. It is a great example of an overused yet poorly understood term. Seeing the terms “Waterfall” or “Stand ups” in the same sentence as “Agile” is almost always an example of misuse. Agile is a philosophy, not a software development methodology, so it cannot be directly compared to Waterfall, nor does it directly spell out process components such as stand ups. (Those are a component of Scrum, a methodology that implements Agile principles, but does not represent Agile itself.)

Narrow or broad, technical or process-oriented, a good working knowledge of these patterns is an essential component in a technologist’s toolbox.

What is an Anti-Pattern?

If a “pattern” is simply a known-to-work solution to a common software engineering problem, wouldn’t an “anti-pattern” simply be the opposite? A non-Agile development methodology, or a tightly-coupled application?

Actually, anti-patterns do not just incorporate the concept of failure to do the right thing; they also include a set of choices that seem right at face value, but lead to trouble in the long run. Wikipedia defines the term “Anti-pattern” as follows:

“An anti-pattern is a common response to a recurring problem that is usually ineffective and risks being highly counterproductive.”

Note the reference to “a common response.” Anti-patterns are not occasional mistakes; they are common ones, and are nearly always followed with good intentions. As with regular patterns, anti-patterns can be broad or very specific, and in the realms of programming languages and frameworks, there may be literally hundreds to consider. Here are just a few of this author’s high-level, personal favorites:

Whiteboard programming challenges in software interviews

David Heinemeier Hansson, creator of Ruby on Rails and the Founder and CTO of Basecamp, once tweeted “Hello, my name is David. I would fail to write bubble sort on a whiteboard. I look code up on the internet all the time. I don’t do riddles.” The anti-pattern here is evaluating the wrong metrics during an interview: for example, a typical task assignment will be “Add zip code lookup during registration,” but the interview questions sound like “Sort this array in pseudocode using functional programming concepts.”

Remember the “good intentions” aspect of anti-patterns? It seems as if we are testing the candidate on a valuable principle: knowledge of fundamentals. However, programming is often a ruthlessly pragmatic practice, and this focus on theoretical knowledge over practical skills and experience might cause us to choose a candidate that meets our cultural ideals, but lacks the actual skills required to be successful in the position.

Put another way: if StackOverflow will be a regular resource used by the developer in the position, it should be available (and used) during the interview. Homework assignments and pair programming challenges may also be worth exploring.

All patterns and anti-patterns have valid exceptions. A developer whose job will be to make libraries of algorithms for others to use may very well need to know the Calculus behind a mechanism. The error here is applying this expectation universally, even to developers who will not be doing so.

Moral Hazard

In economics, Moral Hazard is the separation of individuals from the consequences of their decisions. This sounds like an obvious behavior to avoid, but this anti-pattern is the root cause of many SDLC inefficiencies.

Consider the traditional QA process, in which “tickets” are addressed by developers, then passed to QA for review before being deployed. There are two problems here. First, staffing ratios are almost never “1 developer to 1 QA analyst,” and even a handful of developers can easily exceed the capacity of the QA team. Second, this insulates developers from the consequences of their mistakes by making it another individual’s responsibility to find them before they are released – a moral hazard.

The effects of this anti-pattern can be subtle: if the QA team is effective, it may not directly lead to lower quality output. It is more likely to show up in other areas such as complaints about estimation accuracy and missed targets. Quality and estimation accuracy suffer because developers instinctively focus on “getting things through QA” rather than shipping high quality software. Even with a modest defect rate of 20-30% (a number that might even be optimistic in many organizations), the churn this produces can significantly impact team productivity.
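A back-of-the-envelope model gives a sense of the scale. Assume, purely for illustration, that every trip through QA fails independently with the same probability; the expected number of trips per ticket is then 1 / (1 - defect rate). The function name and the independence assumption belong to this sketch and are not measurements from any team.

```python
def expected_qa_passes(defect_rate: float) -> float:
    """Expected QA round trips per ticket under a simple geometric model.

    Assumes each submission fails QA independently with probability
    `defect_rate`, so the number of submissions until one passes has
    mean 1 / (1 - defect_rate).
    """
    if not 0 <= defect_rate < 1:
        raise ValueError("defect_rate must be in [0, 1)")
    return 1 / (1 - defect_rate)


# Extra round trips per 100 tickets at the defect rates quoted above.
extra_trips = {
    rate: round(100 * (expected_qa_passes(rate) - 1))
    for rate in (0.20, 0.30)
}
print(extra_trips)
```

Under this simplified model, a 20% defect rate implies about 25 extra QA round trips per 100 tickets, and a 30% rate about 43, before counting any cost of switching context between tickets.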

Additional anti-patterns often arise in the attempt to solve the problem. In Scrum, it may be tempting to make sprints longer or hold them open. But a sprint is meant to be a measure of time, not a measure of output. This act reverses that nature, which destroys the value of other tools such as “velocity” metrics that are based upon it. It is also common to see longer sprint planning or pre-planning meetings to more deeply review tickets. But this attempts to convert an instinctive process into a scientific one, forgetting that the purpose for implementing a methodology like Scrum was to acknowledge this impossibility in the first place.

Two patterns that are often effective at resolving this issue include:

Embracing a culture of continuous improvement: “ship it when it’s better, not when it’s right.” (Also see “Polishing the Cannonball” below.) Developers encouraged and empowered to do this can make better decisions about how they address their tasks, and also experience a more tangible sense of personal accomplishment.
Making developers responsible for their work product all the way through to Production deployments. Facebook, Google, and other industry titans have all reported success with this approach.

Polishing the Cannonball

Sometimes also known as “gold plating” or “boiling the ocean,” trying to ship perfect products often significantly increases project timelines and costs without actually increasing the value delivered. A closely related anti-pattern is the “zombie ticket,” the plaque on the arterial walls of the Backlog. Zombie tickets are never a high enough priority to get cleaned out, but are never closed for fear of losing the documentary record of the task.

The problem with both habits is that the metrics that support them are phantoms. Unshipped features have zero value to customers, and tasks that do not cause enough pain to become priorities may never be worth addressing. It is almost always better to focus available resources on regularly delivering new, valuable features rather than on constantly looking backward on small issues that affect very few users.

The “pattern” counterpart here is the minimum viable product (MVP), which often ends up being a bit of a phantom itself. (MVPs are almost never as small as planned or hoped for.) However, the act of attempting to ship an MVP is itself often an antidote to the problems listed above, so even if some slippage does occur it is still worth the effort. Iterative development processes also address this by emphasizing regular, predictable delivery of incremental value, reinforced by feedback from actual end users.