Three Cloud Data Security best practices against Ransomware

Ransomware is becoming a global menace – last week’s Wannacry ransomware attack as well as ransomware attacks earlier this year on MongoDB and Elasticsearch clusters have become common headlines in recent times.  Hundreds of thousands of servers, and databases were hacked in 2017 as a result of ransomware.   Clearly, as indicated by this tweet, the immediate response to Wannacry ransomware is to patch the Windows servers to remediate a vulnerability that Microsoft patched two months ago.  Are there better ways to address data security more proactively?  Why were thousands of servers un-patched when a patch was released by Microsoft 2 months ago?  We describe data security process and three key best practices for protecting data against ransomware.

Wannacry

Data Security Process

Data security process has four key stages as shown below: discover and classify, prevent, detect, and respond.

data-security-process

The first step is to discover data in the enterprise such as databases and datastores and classify all your data into different levels such as sensitive or personal information data that requires strong security.

The second step is to apply prevention techniques to secure your data such as using proactive policies-as-code to reduce your security attack blast radius, building layers of protection with defense-in-depth, continuously taking backups and continuously auditing the security of datastore as well as security of all compute and network resources along potential attack paths with access to these datastores.   Having a policy management process and management tool for automating policy checks for data security, compliance and backups would ensure that continuous automated audits happen with auto remediations.  Another prevention technique is having a vulnerability and patch management process and remediation time SLAs for critical patches.

Finally, you need to continually monitor and detect data breaches and data security issues and respond to them as reactive steps to mitigate such issues.  Having a vulnerability management process and server management tools would enable quick identification of vulnerable servers that indicates risks to the business and  patching to remediate them.

Let us next break down these 3 practices for securing your data.

I. Prevention – Policies to continuously audit

Ransomware and other data loss can be prevented with many defensive techniques by proactively checking configurations in systems, network, servers and databases.   Four key policies need to be defined and enforced through a policy management process and tool.  These policies form the defense-in-depth layers that will protect the data from hackers or ransomware attacks.

data-security-in-cloud

  1. Compliance policy – There are a number of DB CIS configuration checks that must be followed to ensure that all database configurations are secured.  Many default database settings such as MongoDB or Elasticsearch leave them wide open to internet.  These security and compliance checks need to be continuously evaluated and corrected as one of the first measure of prevention.
  2. Data security policy – Data can be secured by ensuring that we do encryption-at-rest as well as customer provided encryption keys as standard practice for all databases.  Default credentials should be changed as well as appropriate backups must be done and verified.  All these checks can be easily defined as policies and continuously evaluated for any misconfigurations.
  3. Network security policy – Databases, cloud servers and networks along access paths to databases must be secured through use of appropriate firewall or security group routing rules as well as having isolated VPCs and subnets hosting the databases.  Whitelisting of all database accessible servers must be done as well to limit network access.  These policies can be evaluated and enforced through enterprise policies continuously.
  4. Server compliance policy – Databases run on servers that need to have hardened OS images and a vulnerability and patch management process.  The next two best practices describe the vulnerability and patch management process and tools.  All servers within the enterprise should also follow these practices as lateral movement can make database servers insecure if one of the other servers in the cloud becomes insecure.

II. Prevention – Patch SLA monitoring and continuous patching

Enterprises should define a vulnerability and patch management process with objectives on “time to remediate critical security issues”, RTO – Remediation Time Objective (not to be confused with backup RTO).  Enterprises should have RTO SLA policies in place that specify “Number of days all critical vulnerabilities such as with CVSS severity scores of 9 or 10 will be remediated”.  An RTO SLA of 15 to 30 days is quite common for patching critical security vulnerabilities.  This SLA needs to be continuously monitored and any violations need to be notified and corrected.  In the latest WannaCry attack, thousands of computers remain un-patched for more than 60 days even after a patch for this vulnerability was released two months ago.  This could have been avoided with continuous SLA monitoring and remediation of RTO.

Many enterprises could go one step further by not just monitoring and actively managing RTO SLA but also automating the detection and patching.  As soon as critical vulnerabilities are identified through periodic scans and patches are available from the vendors, the management tools must be able to automatically update their patch catalogs and patch servers and network devices in a zero-touch approach.

  1. Continuously scan environment for detecting vulnerabilities
  2. Select critical vulnerabilities for automated patching
  3. Continuously look for patches from vendors such as Microsoft, download critical patches for vulnerabilities and keep patch catalog contents updated automatically
  4. Automatically apply patches for critical vulnerabilities when they are discovered based on policies for RTO.

With these SLA monitoring and patching controls in place, enterprises can achieve a high degree of data security through proactive prevention.

III. Response – Vulnerability and Patching

Even after preventive controls discussed in I and II, there is still a need to detect and respond as all security attacks cannot be always prevented.   In the detect and response situation, once ransomware or other data exfiltration and security threats have been identified, it is important to have the ability to identify the vulnerable servers and patch them as soon as possible.  A reactive vulnerability and patch management system must have the ability to select specific CVE,  assess the servers that require patching and with a few clicks be able to apply patches and configuration changes to remediate that critical CVE.

Recommendations

Data security starts with an enterprise data security process consisting of data discovery, prioritization, prevention, detection and response stages.  The first best practice for data security is prevention where the datastores such as MongoDB, Elasticsearch as well as servers and networks are continuously audited for security and compliance through policies.  A policy management tool will be a critical enabler for achieving this audit and preventive checks.  These tools can be thought of as ways to “detect” and “harden” all places and paths along which data is stored, moved and accessed thereby achieving defence-in-depth.  The second best practice for data security is to define a “Remediation Time Objective”-SLA and implementing a vulnerability lifecycle management process.  A vulnerability management tool continuously scans for vulnerabilities, gives visibility into critical vulnerabilities with SLA violations, and automatically keeps environment patched with zero touch.  Many enterprises that followed a 30 day RTO SLA were not impacted by Wannacry ransomware as they had patched systems in March soon after patch was released. The third best practice is to have an ability to assess a vulnerability and remediate it during emergencies or as a part of security incident response such as the weekend Wanncry ransomware threat.  All the above three proactive and reactive practices and tools can keep the data secure and avoid costly and reputation damaging Ransomware attacks on enterprises.

BMC Software has three products: Bladelogic server automation, SecOps Response and Policy cloud services that can keep your applications, servers, networks and data safe from ransomware attacks.  Wannacry ransomware was a non-event for these customers as they were proactively implementing vulnerability and patch management processes through our tools.

Full disclosure:  I work at BMC Software.  Check out http://www.bmc.com/it-solutions/bladelogic-server-automation.html,  http://www.bmc.com/it-solutions/secops-response-service.html and https://www.youtube.com/watch?v=hSFP5-kzbT0.

Policies Rule Cloud and Datacenter Operations – Cloud 2.0

Trust but verify – A new way to think about Cloud Management

Cloud management platforms (CMPs) are very popular to manage cloud servers and applications and have been widely adopted by small and large enterprises. For datacenter management (DC) spanning over decades before, there has been a sprawl of systems management tools to manage datacenters. The common wisdom in both these models is to control access to the cloud at the gates by CMPs or DC tools just like in the historic days forts were protected and access controlled with moats and gates. However, with the increasing focus on agility and delivering faster business value to customers, developers and application release teams require a much greater flexibility in working with cloud than previously imagined. Developers want full control and flexibility on tools and APIs to interact with cloud instead of being stopped at the gates and prescribed a uniform single gate to use cloud. Application owners want to allow this freedom but still want cloud workload to be managed, compliant, secure and optimized. This freedom and business driver for agility is creating a new way to reimagine cloud 2.0 which does not stop you at the gates but allows you to come in while continuously checking policies to ensure that you behave well in cloud. Ability to create and apply policies will play a key role in this new emerging model of governance where freedom is tied to responsibility. We believe that the next generation cloud operational plane will drive the future vision on how workloads will be deployed, operated, changed, secured, and monitored in clouds. Enterprises should embrace policies at all stages of software development lifecyle and operations for datacenters in cloud and on-prem. Creating, defining and evaluating policies and taking corrective actions based on policies will be a strategic enabler for all enterprises in the new cloud 2.0 world.

Defining Cloud Operational Plane

In this new cloud management world, you are not stopped at gates but checked continuously. Trust but verify is the new principle of governance in Cloud 2.0.  Now, let us review the 5 key areas for a cloud operational plane and how policies will play a critical role in governance.

  • Provisioning and deployment of cloud workload
    • Are my developers or app teams provisioning right instance types?
    • Is each app team using within their allocated quota of cloud resources?
    • Is the workload or configuration change being deployed secure and compliant?
    • How many pushes are going on per hour, daily and weekly?
    • Are any failing and why?
  • Configuration changes
    • Is this change approved?
    • Is it secure and compliant?
    • Tell me all the changes happening in my cloud?
    • Can I audit these changes to know who did what when?
    • How can I apply changes to my cloud configurations, resources, upgrade to new machine images etc.?
  • Security and compliance
    • Continuously verify that my cloud is security and compliant
    • Alert me on security or compliance events instantly or daily/weekly
    • Remediate these automatically or with my approval
  • Optimization
    • Are my resources most optimally being used? Does it have the right capacity? Do I have the scaling where I need it?
    • Showback of my resources
    • Tell me where am I wasting resources?
    • Tell me how I can cut down costs and waste?
  • Monitoring, state and health
    • Is my cloud workload healthy?
    • Tell me what are key monitoring events? Unhealthy events?
    • Remediate these automatically or with my approval

How Cloud Operational Plane can be enabled through Policies?

The following table compares the new and old world cloud management.  In the old world of cloud management platforms (CMP), we block without trust.  In the new world of cloud operational plane, since gates are open, it becomes necessary to manage the cloud through policies as the central tenet for cloud operations.  This is the cloud operational plane (COP).

  CMP – Block without Trust COP –Trust but Verify Recommended Practices
Deployment      
Deployment to multi-cloud Single API across all clouds, forced to use this.
Catalog driven provisioning
Various tools + No single point of control

No single API

No single Tool

Use best API/tool for each cloud

No catalog

DevOps – your choice

 

Manage/start/stop your resources Single tool Various tools + No single point of control DevOp/Cloud tool – your choice
DevOps continuous deployment Hard to integrate, API of CMP is a hindrance to adoption Embraces this flexibility, allow changes through any toolset Policies for DevOps process for compliance
Config Changes      
Unapproved config changes Block if not approved Allow usually or block if more control desired Change Policies
Config changes API Single API No single API DevOps tool
Audit config changes Yes Yes Audit – capture all changes
Rollback changes No Yes, advanced tools for Blue-Green, Canary etc. DevOps tool
Change monitoring No Yes Change Monitoring
Change security No Yes Policy for change compliance/security
Security & Compliance      
Security in DevOps process N/A Yes Policy for DevOps security
Monitor, scan for issues, get notified N/A Continuously monitor for compliance & security Multi-tool integrations
Prioritize issues N/A Yes, multiple manual and automated prioritization Policy based prioritization
Security and Compliance of middleware and databases

 

 

N/A Yes Compliance and security policies for middleware and databases
Optimization      
Quota & decommissioning Block deployment if out of quota

Decommission on lease expiry

Allow but notify or remove later with resource usage policies.

Decommission on lease expiry

Policies for quota and decommissiong
Optimization N/A Yes Policies for optimization and control

Policies in Enterprises

As enterprises move into a world of freedom and agility with Cloud and DevOps, it becomes increasingly important to use policies to manage cloud operations.  An illustrative diagram below shows how policies can be used to manage everything from DevOps process, on-prem and cloud environments, production environments, cloud infrastructure, applications, servers, middleware and databases.

Policies rule the world

For agile DevOps, policy checks can be embedded early or as needed in the process to catch compliance, security, cost or shift-left violations in source code and libraries. For example, consider a DevOps process starting with a continous integration (CI) tool such as Jenkins®. Developers and release managers can trigger the OWASP (Open Web Application and Security Project) checks to run a scan against source code libraries and block the pipeline if any insecure libraries are found.

Production environments have applications consisting of servers, middleware, databases and networks hosted in clouds such as AWS and Azure.  All these need to be governed by policies as shown above.  For example, RHEL servers in cloud are governed by 4 policies – cost control, patch policy, compliance policy and a vulnerability remediation policy.  Similarly there are security, compliance, scale and cost policies for other cloud resources such as databases and middleware.  Finally, the production environment itself is governed by change, access control and DR policies.

All these policies in the modern cloud 2.0 will be encoded as code.  A sample policy as code can be written in a language such as JSON or YAML:

  • If s3 bucket is open to public, then it is non-compliant.
  • If a firewall security group is open to public, then it is non-compliant.
  • If environment is DEV and instance type is m4.xlarge, then environment is non-compliant

Using policy-as-code will ensure that these policies are created, evaluated, managed and updated in a central place and all through APIs.  Additionally, enterprises will choose to remediate resources and processes on violation of certain policies to ensure that cost, security, compliance and changes are governed.

Recommendations

Cloud management is changing from a “block on entry” to “trust but verify” model.  Some enterprises who wish to govern with an absolute control at gates will continue to use cloud management platforms extensively and effectively.  However, many enterprises are beginning to move to a new cloud 2.0 model where agility and flexibility of DevOps tools and processes are critical for their success.  Instead of prescribing a single entry choke point or a single “CMP tool” to work with cloud, we allow everybody in with their own tools and processes, but continuously verify that policies for deployment, resource usage, quota, cost, security, compliance and changes are continuously tracked, monitored and corrected. Simple effective API based policy as code definition, management, evaluation and remediation will be a central capability that enterprises will need to run new clouds effectively.

Full disclosure:  I work for BMC Software and my team has built a cloud native policy SaaS service, check out the 2 minute video here: https://www.youtube.com/watch?v=hSFP5-kzbT0

Acknowledgement: A few of my colleagues at work, JT and Daniel proposed the analogy of forts and cloud operational plane that is fascinating and cool.  This motivated me to write this blog to show how cloud management itself is evolving from guard at gates to trust but verify model.