Trust but verify – A new way to think about Cloud Management
Cloud management platforms (CMPs) are widely used to manage cloud servers and applications and have been adopted by small and large enterprises alike. In datacenter (DC) management, decades of practice have produced a sprawl of systems management tools. The common wisdom in both models is to control access at the gates, through the CMP or the DC tools, much as forts were once protected by moats and gates.
However, with the increasing focus on agility and on delivering business value to customers faster, developers and application release teams need far more flexibility in working with the cloud than previously imagined. Developers want full control over the tools and APIs they use to interact with the cloud, instead of being stopped at the gates and handed a single prescribed entry point. Application owners want to allow this freedom but still need cloud workloads to be managed, compliant, secure and optimized.
This freedom, and the business drive for agility behind it, is leading to a reimagined cloud 2.0 that does not stop you at the gates but lets you in while continuously checking policies to ensure that you behave well in the cloud. The ability to create and apply policies will play a key role in this emerging model of governance, where freedom is tied to responsibility. We believe that the next-generation cloud operational plane will shape how workloads are deployed, operated, changed, secured and monitored in clouds. Enterprises should embrace policies at all stages of the software development lifecycle and of operations, for datacenters both in the cloud and on-prem. Creating and evaluating policies, and taking corrective action based on them, will be a strategic enabler for all enterprises in the new cloud 2.0 world.
Defining Cloud Operational Plane
In this new cloud management world, you are not stopped at gates but checked continuously. Trust but verify is the new principle of governance in Cloud 2.0. Now, let us review the 5 key areas for a cloud operational plane and how policies will play a critical role in governance.
- Provisioning and deployment of cloud workloads
  - Are my developers or app teams provisioning the right instance types?
  - Is each app team staying within its allocated quota of cloud resources?
  - Is the workload or configuration change being deployed secure and compliant?
  - How many pushes are happening per hour, per day and per week?
  - Are any failing, and why?
- Configuration changes
  - Is this change approved?
  - Is it secure and compliant?
  - What changes are happening in my cloud?
  - Can I audit these changes to know who did what, and when?
  - How can I apply changes to my cloud configurations and resources, upgrade to new machine images, etc.?
- Security and compliance
  - Continuously verify that my cloud is secure and compliant.
  - Alert me on security or compliance events instantly, or daily/weekly.
  - Remediate these automatically or with my approval.
- Optimization
  - Are my resources being used optimally? Do they have the right capacity? Do I have scaling where I need it?
  - Showback of my resources.
  - Where am I wasting resources?
  - How can I cut costs and waste?
- Monitoring, state and health
  - Is my cloud workload healthy?
  - What are the key monitoring events? The unhealthy events?
  - Remediate these automatically or with my approval.
How Can the Cloud Operational Plane Be Enabled Through Policies?
The following table compares the old and new worlds of cloud management. In the old world of cloud management platforms (CMPs), we block without trust. In the new world, since the gates are open, it becomes necessary to manage the cloud with policies as the central tenet of cloud operations. This is the cloud operational plane (COP).
| Capability | CMP: Block without Trust | COP: Trust but Verify | Recommended Practices |
|---|---|---|---|
| Deployment to multi-cloud | Single API across all clouds, forced to use this; catalog-driven provisioning | Various tools and no single point of control; no single API; no single tool; use the best API/tool for each cloud | DevOps: your choice |
| Manage/start/stop your resources | Single tool | Various tools and no single point of control | DevOps/cloud tool: your choice |
| DevOps continuous deployment | Hard to integrate; the CMP's API is a hindrance to adoption | Embraces this flexibility; allows changes through any toolset | Policies for DevOps process for compliance |
| Unapproved config changes | Block if not approved | Usually allow, or block if more control is desired | Change policies |
| Config changes API | Single API | No single API | DevOps tool |
| Audit config changes | Yes | Yes | Audit: capture all changes |
| Rollback changes | No | Yes, advanced tools for blue-green, canary, etc. | DevOps tool |
| Change monitoring | No | Yes | Change monitoring |
| Change security | No | Yes | Policy for change compliance/security |
| Security & Compliance | | | |
| Security in DevOps process | N/A | Yes | Policy for DevOps security |
| Monitor, scan for issues, get notified | N/A | Continuously monitor for compliance & security | Multi-tool integrations |
| Prioritize issues | N/A | Yes, multiple manual and automated prioritization | Policy-based prioritization |
| Security and compliance of middleware and databases | N/A | Yes | Compliance and security policies for middleware and databases |
| Quota & decommissioning | Block deployment if out of quota; decommission on lease expiry | Allow but notify, or remove later with resource usage policies; decommission on lease expiry | Policies for quota and decommissioning |
| Optimization | N/A | Yes | Policies for optimization and control |
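To make the quota row of the table concrete, here is a minimal Python sketch of the two behaviors. The function name `admit`, the mode names and the result fields are all assumptions made for illustration and are not taken from any CMP or COP product.

```python
def admit(current_usage, requested, quota, mode):
    """Decide a deployment request under one of two hypothetical modes.

    mode="block":  gate-style, reject anything that would exceed the quota.
    mode="verify": let the request through, but flag it for notification
                   so a resource usage policy can follow up later.
    """
    if mode not in ("block", "verify"):
        raise ValueError(f"unknown mode: {mode}")
    over_quota = current_usage + requested > quota
    if not over_quota:
        return {"allowed": True, "notify": False}
    if mode == "block":
        return {"allowed": False, "notify": False}
    return {"allowed": True, "notify": True}


# A team already using 8 of its 10 instances asks for 4 more.
print(admit(8, 4, 10, mode="block"))   # {'allowed': False, 'notify': False}
print(admit(8, 4, 10, mode="verify"))  # {'allowed': True, 'notify': True}
```

The only difference between the two modes is what happens once the quota is exceeded: the first stops the request at the gate, while the second records a violation and leaves the follow-up to policy.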
Policies in Enterprises
As enterprises move into a world of freedom and agility with cloud and DevOps, it becomes increasingly important to use policies to manage cloud operations. An illustrative diagram below shows how policies can be used to manage everything from the DevOps process to on-prem and cloud environments, production environments, cloud infrastructure, applications, servers, middleware and databases.
For agile DevOps, policy checks can be embedded early in the process (shift-left), or wherever they are needed, to catch compliance, security or cost violations in source code and libraries. For example, consider a DevOps process starting with a continuous integration (CI) tool such as Jenkins®. Developers and release managers can trigger OWASP (Open Web Application Security Project) checks to scan the libraries used by the source code and block the pipeline if any insecure libraries are found.
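As a rough illustration of such a pipeline gate, the Python sketch below takes a list of scan findings and decides whether the build may proceed. The finding format, the library names and the severity threshold are assumptions invented for this example; a real scanner's report will look different.

```python
# Hypothetical scan findings; a real scanner's report format will differ.
def gate(findings, blocked_severities=("HIGH", "CRITICAL")):
    """Return (passed, offending) for a list of library findings."""
    offending = [f for f in findings if f["severity"] in blocked_severities]
    return (not offending, offending)


findings = [
    {"library": "example-lib-a", "version": "1.2.0", "severity": "LOW"},
    {"library": "example-lib-b", "version": "0.9.1", "severity": "CRITICAL"},
]

passed, offending = gate(findings)
for f in offending:
    print(f"blocking: {f['library']} {f['version']} ({f['severity']})")

# A CI step would typically turn a failed gate into a non-zero exit code,
# which is what stops the pipeline.
exit_code = 0 if passed else 1
```

Keeping the threshold as a parameter lets a team start by blocking only the most severe findings and tighten the policy over time.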
Production environments have applications consisting of servers, middleware, databases and networks hosted in clouds such as AWS and Azure. All of these need to be governed by policies, as shown above. For example, RHEL servers in the cloud are governed by four policies: a cost control policy, a patch policy, a compliance policy and a vulnerability remediation policy. Similarly, there are security, compliance, scale and cost policies for other cloud resources such as databases and middleware. Finally, the production environment itself is governed by change, access control and DR policies.
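One simple way to picture this layering is a lookup from resource type to the policies that govern it. The Python sketch below encodes the examples from the paragraph above; the key names and the `ungoverned` helper are hypothetical and only serve the illustration.

```python
# Policy sets per resource type, taken from the examples in the text.
# The identifiers themselves are illustrative, not a product schema.
POLICY_SETS = {
    "rhel_server": ["cost_control", "patch", "compliance",
                    "vulnerability_remediation"],
    "database": ["security", "compliance", "scale", "cost"],
    "middleware": ["security", "compliance", "scale", "cost"],
    "production_environment": ["change", "access_control",
                               "disaster_recovery"],
}


def policies_for(resource_type):
    """Return the policies that govern a resource type (empty if none)."""
    return POLICY_SETS.get(resource_type, [])


def ungoverned(inventory):
    """Return the sorted resource types in an inventory with no policies."""
    return sorted({r["type"] for r in inventory if not policies_for(r["type"])})


inventory = [
    {"name": "web-1", "type": "rhel_server"},
    {"name": "orders-db", "type": "database"},
    {"name": "queue-1", "type": "message_queue"},
]
print(policies_for("rhel_server"))
print(ungoverned(inventory))  # ['message_queue']
```

A check like `ungoverned` is useful in a trust-but-verify model because resources can appear through any tool, so gaps in policy coverage have to be detected rather than prevented.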
In the modern cloud 2.0, all of these policies will be expressed as code. Sample policies such as the following can be written in a format such as JSON or YAML:
- If an S3 bucket is open to the public, then it is non-compliant.
- If a firewall security group is open to the public, then it is non-compliant.
- If the environment is DEV and the instance type is m4.xlarge, then the environment is non-compliant.
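The three rules above could be encoded roughly as follows. This is a minimal Python sketch under stated assumptions: the JSON schema (`id`, `resource_type`, `when`) and the resource attribute names are invented for illustration and do not correspond to any particular policy engine.

```python
import json

# Hypothetical policy document; the field names are illustrative only.
POLICIES_JSON = """
[
  {"id": "s3-no-public", "resource_type": "s3_bucket",
   "when": {"public": true}},
  {"id": "sg-no-public", "resource_type": "security_group",
   "when": {"open_to_public": true}},
  {"id": "dev-no-m4-xlarge", "resource_type": "instance",
   "when": {"environment": "DEV", "instance_type": "m4.xlarge"}}
]
"""


def violations(resource, policies):
    """Return the ids of all policies the resource violates."""
    found = []
    for policy in policies:
        if policy["resource_type"] != resource.get("type"):
            continue
        # A rule matches when every attribute in "when" equals the
        # corresponding attribute of the resource.
        if all(resource.get(k) == v for k, v in policy["when"].items()):
            found.append(policy["id"])
    return found


policies = json.loads(POLICIES_JSON)
bucket = {"type": "s3_bucket", "name": "logs", "public": True}
dev_box = {"type": "instance", "environment": "DEV",
           "instance_type": "m4.xlarge"}
prod_box = {"type": "instance", "environment": "PROD",
            "instance_type": "m4.xlarge"}

print(violations(bucket, policies))    # ['s3-no-public']
print(violations(dev_box, policies))   # ['dev-no-m4-xlarge']
print(violations(prod_box, policies))  # []
```

Because the rules are plain data, they can be versioned, reviewed and updated like any other code, while the small evaluator stays unchanged.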
Using policy-as-code will ensure that these policies are created, evaluated, managed and updated in a central place and all through APIs. Additionally, enterprises will choose to remediate resources and processes on violation of certain policies to ensure that cost, security, compliance and changes are governed.
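Remediation can follow the same pattern, with some fixes applied automatically and others held for approval. The Python sketch below is only an illustration: the policy ids, the fix functions and the in-memory resource records are assumptions, and no cloud API is called.

```python
def make_bucket_private(resource):
    """Hypothetical fix: flip a flag on an in-memory record."""
    resource["public"] = False


def stop_instance(resource):
    """Hypothetical fix: mark an in-memory instance record as stopped."""
    resource["state"] = "stopped"


# policy id -> (fix function, whether it may run without approval)
REMEDIATIONS = {
    "s3-no-public": (make_bucket_private, True),
    "dev-no-m4-xlarge": (stop_instance, False),
}


def remediate(resource, violated_policy_ids, approved=()):
    """Apply fixes that are automatic or approved; return (applied, pending)."""
    applied, pending = [], []
    for policy_id in violated_policy_ids:
        if policy_id not in REMEDIATIONS:
            continue  # no remediation defined; the violation is only reported
        fix, automatic = REMEDIATIONS[policy_id]
        if automatic or policy_id in approved:
            fix(resource)
            applied.append(policy_id)
        else:
            pending.append(policy_id)
    return applied, pending


bucket = {"name": "logs", "public": True}
print(remediate(bucket, ["s3-no-public"]))   # (['s3-no-public'], [])

box = {"name": "dev-1", "state": "running"}
print(remediate(box, ["dev-no-m4-xlarge"]))  # ([], ['dev-no-m4-xlarge'])
print(remediate(box, ["dev-no-m4-xlarge"], approved=("dev-no-m4-xlarge",)))
```

Separating "automatic" from "needs approval" per policy lets an enterprise choose, policy by policy, how much it trusts automated correction.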
Cloud management is changing from a “block on entry” model to a “trust but verify” model. Some enterprises that wish to govern with absolute control at the gates will continue to use cloud management platforms extensively and effectively. However, many enterprises are beginning to move to a new cloud 2.0 model in which the agility and flexibility of DevOps tools and processes are critical to their success. Instead of prescribing a single choke point or a single “CMP tool” for working with the cloud, we allow everybody in with their own tools and processes, but continuously verify that policies for deployment, resource usage, quota, cost, security, compliance and changes are tracked, monitored and corrected. Simple, effective, API-based policy-as-code definition, management, evaluation and remediation will be a central capability that enterprises need to run new clouds effectively.
Full disclosure: I work for BMC Software and my team has built a cloud-native policy SaaS service. Check out the 2-minute video here: https://www.youtube.com/watch?v=hSFP5-kzbT0
Acknowledgement: A few of my colleagues at work, JT and Daniel, proposed the analogy of forts and the cloud operational plane, which is fascinating and cool. This motivated me to write this blog to show how cloud management itself is evolving from a guard-at-the-gates model to a trust-but-verify model.