Focusing on the basics of architecting the core components for your cloud foundation
In my previous blog post in this series on the journey to cloud, I examined how to lay the organisational foundations for cloud by implementing the right operating model. In this latest instalment, I move on to the key step of architecting the cloud platform.
It is imperative that you get the cloud platform right, as it becomes the technology backbone of your future organisation when you migrate to cloud.
However, this is made more challenging by the wide range of factors that your platform architecture will depend on. These include your cloud strategy, choice of cloud provider(s), security policies, investment capacity, existing investments and – of course – your ambitions and approach to leveraging the power of cloud.
Reflections on establishing your cloud platform programme
Our experience shows that companies migrating to cloud tend to realise the best outcomes by mobilising an up-front platform enablement stream of work. This effort should focus on enabling the core elements of the platform, including ensuring all standards and policies are clearly defined and that appropriate security and regulatory governance is established. We often find that selecting a set of “trailblazer” applications – each with its own resilience and security requirements – and accelerating them onto your cloud platform is the best way to prove the effectiveness of the operational, regulatory and security procedures that have been put in place.
Of course, this often means up-front investment, but in our experience the benefits far outweigh the costs. Not only does this approach accelerate the release of value from cloud – it also helps avoid the remediation effort for technical debt that is likely to arise from a rushed, poorly planned and/or poorly architected cloud platform.
Cross-functional teams are also critical here. Mobilising a team comprising engineers, architects, security specialists and operations staff is essential for ensuring all the relevant technology disciplines feed into the platform design and enablement process.
Pulling together and managing this diverse cross-functional team is often a challenge – so why not consider running platform parties to generate a bit of excitement and team spirit? Not only is this an exceptionally effective way to fast-track design decisions and resolve inter-team disagreements; we often find it also leads to lasting collaboration and improved relationships between technology teams. But don’t forget the pizza and beer (…or suitable soft drinks)!
So, what should be included in your cloud platform’s foundations?
As you start to think about your cloud platform, it is important to maintain a laser focus on architecting your core cloud foundations correctly. As we highlighted above, this will help to avoid the potential need for costly remediation of technical debt in the future.
There are five broad layers to your core platform – and each layer has several important architectural considerations that your enterprise must get right.
- Landing zone – A multi-account structure and design, which complies with best practices and standards, will pave the way for building the right access controls, network topology, logging and monitoring capabilities, and financial management capabilities.
- Compute, storage and container services – Clearly defined compute and storage requirements based on your enterprise’s architectural strategy and upcoming workload requirements. It’s important to define your principles for utilisation of the different forms of compute (such as virtual machines, containers or serverless) and storage (such as object storage, block storage or shared storage) to ensure you have a common approach across the enterprise.
- Shared services – A clearly defined tooling strategy for DevOps, SecOps, RunOps and FinOps, together with a clear understanding of how these tools will interface with your core cloud platform and services. This will ensure all operational capabilities are established up front.
- Data platform and governance – Selection of the different database types needed to cater to various data domains and structures within your enterprise (relational, NoSQL, graph and so on), and the appropriate security controls to meet the data classification, jurisdictional and regulatory requirements.
- Security services – Definition of security policies for network segregation, access control, user management and key management, and ensuring the right tooling is in place to protect your network boundaries. As you adopt each new cloud-native service, you must also ensure the appropriate guardrails are defined and implemented to secure the service end-to-end.
Addressing compliance considerations…
The first step towards achieving cloud compliance is to be aware of the standards and regulations that apply within your industry as well as those standards established by your own enterprise. A thorough understanding of all of these will provide an invaluable set of architectural parameters for your platform design.
It is important that your enterprise adopts appropriate compliance monitoring tooling, so that your teams have real-time visibility of any activity taking place on the platform. These tools can be configured to alert your teams whenever suspicious activity occurs anywhere on the platform.
As you design and enable services on your cloud platform, it is also vital to ensure that they are underpinned by a robust set of security policies that are codified into “infrastructure as code” (IaC) and automated. Automation of compliance policies is a critical step towards preventative compliance – and is a particularly powerful tool in enabling continuous compliance across multi-cloud technology estates.
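To make the idea of codified, automated policies a little more concrete, here is a minimal sketch in Python. It is not tied to any cloud provider or IaC tool: the resource types, field names (`encrypted`, `public_access`, `public_ip`) and policy names are all hypothetical, and a real platform would normally use its provider's or IaC tool's own policy engine. The sketch only shows the shape of a preventative check that runs against planned resources before anything is deployed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    """One codified rule. All names and fields here are illustrative assumptions."""
    name: str
    applies_to: str                  # hypothetical resource type label
    check: Callable[[dict], bool]    # returns True when the resource is compliant


# Hypothetical enterprise policies expressed as code rather than as a document.
POLICIES = [
    Policy("storage-encrypted-at-rest", "object_storage",
           lambda r: r.get("encrypted") is True),
    Policy("storage-not-public", "object_storage",
           lambda r: r.get("public_access") is False),
    Policy("vm-no-public-ip", "virtual_machine",
           lambda r: not r.get("public_ip")),
]


def evaluate(resources: list[dict]) -> list[tuple[str, str]]:
    """Return (resource name, policy name) for every violation found."""
    violations = []
    for resource in resources:
        for policy in POLICIES:
            if policy.applies_to == resource["type"] and not policy.check(resource):
                violations.append((resource["name"], policy.name))
    return violations


# Example input: a planned deployment described as plain dictionaries.
planned = [
    {"name": "audit-logs", "type": "object_storage",
     "encrypted": True, "public_access": False},
    {"name": "scratch", "type": "object_storage",
     "encrypted": False, "public_access": True},
    {"name": "web-01", "type": "virtual_machine", "public_ip": "203.0.113.10"},
]
```

Calling `evaluate(planned)` in a deployment pipeline and failing the build when the returned list is non-empty is one simple way to turn a written policy into a preventative control.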
From a data compliance perspective, we recommend that you define your service control policies to meet all of your enterprise’s data classification, jurisdictional and regulatory requirements.
…while architecting for security…
Under the “shared responsibility” model for cloud services, responsibility for security is shared between the cloud provider and the customer. The key to designing and implementing security under this model is to adopt a “defence in depth” strategy. This means defining the technologies to be implemented within each layer of your network, such as the perimeter control layer, a DMZ layer, a third-party access layer, a user access layer and a private subnet layer.
Implementation of the chosen technologies should be carried out in conjunction with the enforcement of security policies within each layer. A further important consideration is the need to architect accounts and private networks for logical segmentation to reduce the “blast radius” of an incident and ensure the impact is limited to segregated infrastructure.
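As a small illustration of logical segmentation, the sketch below uses Python's standard `ipaddress` module to confirm that the address ranges planned for each network layer do not overlap. The layer names follow the ones mentioned above, but the CIDR ranges are invented for the example, and non-overlapping ranges are only one precondition for segmentation; routing rules, firewalls and account boundaries still have to enforce it.

```python
from ipaddress import ip_network
from itertools import combinations

# Hypothetical address plan: one CIDR range per network layer.
SEGMENTS = {
    "perimeter": "10.0.0.0/24",
    "dmz": "10.0.1.0/24",
    "third-party-access": "10.0.2.0/24",
    "user-access": "10.0.3.0/24",
    "private": "10.0.16.0/20",
}


def overlapping_segments(segments: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of segment names whose address ranges overlap."""
    networks = {name: ip_network(cidr) for name, cidr in segments.items()}
    return [
        (first, second)
        for (first, net_a), (second, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]
```

An empty result from `overlapping_segments(SEGMENTS)` means the plan keeps each layer in its own range; any pair returned points to two layers that would share addresses and so could not be cleanly isolated.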
…and engineering for resilience
Never forget that computers are machines and machines go wrong. It’s imperative that you establish enterprise-level resilience standards based on risk assessments across the enterprise – and that as you build cloud workloads you conduct workload-level risk assessments to ensure the appropriate standards are applied. With this in mind, there are several layers involved in achieving resilience, which you can choose to leverage depending on your enterprise’s risk appetite and the requirements of the workloads. These layers are:
- Multi-region – Architectures that span multiple geographic areas that are isolated from each other, providing protection in the case of a catastrophic event. This approach should be reserved for your most critical workloads, given the related architectural complexity, engineering effort, data replication considerations, and the need to read and write data across specific regions.
- Multiple availability zones (AZs) – Architectures that use different isolated zones within the same region. Deploying an application’s virtual machines across multiple AZs will reduce the likelihood of an individual AZ acting as a single point of failure.
- Placement groups – Grouping the workloads based on their criticality definitions (recovery time objective, RTO, and recovery point objective, RPO), and then deploying them as cohesive placement groups, can also help to improve resilience.
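To give a rough feel for why spreading a workload across isolated zones or regions helps, the back-of-the-envelope sketch below estimates combined availability. It rests on a strong simplifying assumption that is not always true in practice: each zone fails independently, and the workload stays up as long as at least one zone is healthy. The function name and the figures used are illustrative only, not provider commitments.

```python
def combined_availability(zone_availability: float, zones: int) -> float:
    """Estimate availability of a workload replicated across several zones.

    Assumes independent zone failures and that one healthy zone is enough
    to serve the workload (both are simplifying assumptions).
    """
    if not 0.0 <= zone_availability <= 1.0:
        raise ValueError("zone_availability must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # The workload is down only when every zone is down at the same time.
    return 1.0 - (1.0 - zone_availability) ** zones
```

Under these assumptions, two zones that are each available 99% of the time give a combined figure of about 99.99%. Correlated failures, failover time and data replication lag will all reduce that number in a real deployment, which is why workload-level risk assessments still matter.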
To reduce the risk of vendor lock-in and enhance resilience still further, your enterprise could consider multi-cloud-enabled deployments. However, be mindful of the architectural complexity, the engineering effort and the operating model considerations that will be involved in supporting the deployment of multiple cloud platforms.
What’s next?
In my next post in this series, I’ll drill down into the role and importance of the cloud Centre of Excellence (CoE) – and describe how to reap the greatest benefits from it. Watch this space!
I would like to thank my colleagues Sam Gunn, Ketan Garde and Adam Scaffardi who contributed to this blog.