A few months ago I wrote a blog post comparing Mode 2 and Mode 1 application development. Gartner defines Bimodal IT as an organizational model that segments IT services into two categories based on application requirements, maturity and criticality: “Mode 1 is traditional, emphasizing scalability, efficiency, safety and accuracy. Mode 2 is non-sequential, emphasizing agility and speed”. Over the last six months, we built a Mode 2 cloud-native application on the AWS cloud, and this got us thinking about two questions. First, what were the unique enablers that led to a successful Mode 2 cloud app? Second, is Gartner right to keep the Mode 1 and Mode 2 segments separate, and what can Mode 1 change and learn from Mode 2? Let us analyze the first question here by discussing key best practices for building cloud-native Mode 2 applications.
Focus – do one thing – know your MVP – Mode 2 cloud-native applications require a maniacal focus on the customer problem, on market requirements and on delivering value to the customer through one or two use cases. We spent very little time in needless meetings deciding what to build or how to build it. I have seen many projects take 2-6 months just to define the product. We knew exactly what to build and believed in it without compromise. Startups usually have this kind of solid vision and focus on doing one thing right, usually called the “Minimum Viable Product” (MVP). We acted like a startup charged with solving a customer problem.
Leverage cloud-native higher-level services – We decided to go “all in” on the AWS cloud, using disruptive higher-level PaaS services such as Kinesis, Lambda, API Gateway and NoSQL databases. We also went serverless, with AWS Lambda as the key architectural pattern for our microservices, and used servers only when absolutely necessary. No more discussions about the “platform”, “lock-in”, portability across multiple clouds or datacenters, supporting both “on-prem” and “SaaS”, or other utopian goals. We wanted to get to market fast, and AWS-managed higher-level services were our best bet. We avoided endless comparisons of which open source to use, debates and paralysis by analysis at all costs. We also worked secretly, avoiding cross-organizational discussions that could slow us down, and operated almost as a skunkworks: a fully autonomous team charged with making our own decisions and shaping our own destiny. After six months on AWS, we continue to be delighted and amazed by the power of the platform. AWS does all the undifferentiated heavy lifting: infrastructure automation such as routing, message buses, load balancers, server monitoring, running clusters and servers, patching and maintaining them, and so on. These are not our core business strengths. We want to focus on the business problems that matter and on building “apps”, not managing “infra”, which is best left to AWS.
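To make the serverless pattern a little more concrete, here is a minimal sketch of a Python Lambda handler consuming a batch of Kinesis records. It assumes the standard event shape Lambda delivers for a Kinesis event source (a Records list whose kinesis.data field is base64-encoded) and that producers write JSON; process_event and the "type" field are hypothetical placeholders, not our actual service logic.

```python
import base64
import json
from typing import Any


def process_event(payload: dict[str, Any]) -> str:
    # Hypothetical business logic: just report the event type.
    return payload.get("type", "unknown")


def handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Lambda entry point for a Kinesis event source mapping (sketch).

    Lambda delivers Kinesis records in batches; each record's payload
    is base64-encoded under record["kinesis"]["data"].
    """
    results = []
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        payload = json.loads(raw)  # assumes producers write JSON
        results.append(process_event(payload))
    return {"processed": len(results), "types": results}


if __name__ == "__main__":
    # Local run with a synthetic event; no AWS account needed.
    data = base64.b64encode(json.dumps({"type": "order_created"}).encode()).decode()
    print(handler({"Records": [{"kinesis": {"data": data}}]}, context=None))
```

Because the handler is a plain function, it can be exercised locally with synthetic events long before it is wired to a real stream.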
Practice “everything as code” – From the kick-off, we followed “as-code” concepts to define all our configuration, infrastructure, security, compliance and even operations. Each of these aspects is declaratively specified as code and version-controlled in a Git repository. We have all heard of infrastructure-as-code; we put it into practice, so that we can build and deploy our entire application and infrastructure stack from AWS CloudFormation (CFN) templates with a single click. We can even deploy customized stacks for dev, QA, perf and production, all driven from a single CFN template configured differently for each environment. This was a major achievement, and a decision that saves us time by keeping all our environments consistent and repeatable. Today, we have a CFN template for each microservice and a few common infrastructure templates that describe not only the components but also their configuration (such as the amount of memory), security (such as HTTPS/SSL and IAM permission roles) and compliance, all as code. Leveraging the AWS ecosystem for all this has vastly reduced our time to market, since everything works seamlessly and is well integrated.
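One way to get a single template that behaves differently per environment is CloudFormation's Parameters section combined with Mappings and Fn::FindInMap. The sketch below builds such a template as a Python dict and prints it as JSON. The environment names come from the text above; the memory sizes, the OrdersFunction resource, the runtime and the LambdaExecutionRoleArn parameter are illustrative assumptions, not our actual stacks.

```python
import json

# Illustrative per-environment settings (assumed values). Mapping values
# are strings; CloudFormation converts them for integer properties.
ENV_SETTINGS = {
    "dev": {"MemorySize": "256"},
    "qa": {"MemorySize": "256"},
    "perf": {"MemorySize": "1024"},
    "prod": {"MemorySize": "1024"},
}


def build_template() -> dict:
    """Return one CloudFormation template parameterized by environment."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "Environment": {
                "Type": "String",
                "AllowedValues": list(ENV_SETTINGS),
                "Default": "dev",
            },
            # Hypothetical: the role is assumed to come from a shared infra stack.
            "LambdaExecutionRoleArn": {"Type": "String"},
        },
        # Mappings give each environment its own values from one template.
        "Mappings": {"EnvConfig": ENV_SETTINGS},
        "Resources": {
            "OrdersFunction": {  # hypothetical microservice function
                "Type": "AWS::Lambda::Function",
                "Properties": {
                    "Runtime": "python3.12",
                    "Handler": "index.handler",  # inline ZipFile code lives in index.py
                    "Role": {"Ref": "LambdaExecutionRoleArn"},
                    "MemorySize": {
                        "Fn::FindInMap": ["EnvConfig", {"Ref": "Environment"}, "MemorySize"]
                    },
                    "Code": {"ZipFile": "def handler(event, context):\n    return 'ok'\n"},
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

The same file could then be deployed per environment with something like aws cloudformation deploy --template-file template.json --stack-name orders-qa --parameter-overrides Environment=qa LambdaExecutionRoleArn=..., so only the parameter values differ between stacks.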
Security, operations and monitoring in cloud services – As a SaaS cloud service, we leveraged the security, operations, metering and monitoring frameworks of the AWS cloud platform and ensured that our microservices were architected and developed for SaaS. Security threat modeling and standardized AWS blueprints for NIST helped us save time and start from standardized, well-tested security building blocks such as VPCs, security groups and the IAM permission model. Thinking about operations, health and monitoring while building SaaS components was a key new insight for us as we moved into our third and fourth sprints. For example, log messages containing “Exceptions” or other “unique patterns” can generate “alerts” sent by email or SMS. AWS CloudWatch provides powerful alerting and log aggregation capabilities for this, and we used it extensively to monitor our app through log patterns and custom application metrics. Finally, most of our operations work is “app ops”, since we own and manage almost no servers, databases or infrastructure. Being truly serverless has removed an entire set of infrastructure concerns, and we are starting to see the benefits as we get into operations. Even our operations team talks about application issues, not about rebooting servers or patching for security updates.
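For custom application metrics, one option is CloudWatch's Embedded Metric Format (EMF): a Lambda function prints a specially structured JSON log line, CloudWatch extracts a metric from it, and an alarm on that metric can notify by email or SMS through an SNS topic. For the log-pattern alerts, a CloudWatch Logs metric filter matching a term such as Exception plays a similar role. The sketch below builds an EMF record by hand; the MyApp namespace, the OrdersFailed metric and the Service dimension are hypothetical names.

```python
import json
import time


def emf_record(namespace: str, metric: str, value: float, unit: str,
               dimensions: dict[str, str]) -> dict:
    """Build one CloudWatch Embedded Metric Format (EMF) log record."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # a single dimension set
                "Metrics": [{"Name": metric, "Unit": unit}],
            }],
        },
        metric: value,  # the metric value is a top-level member
    }
    record.update(dimensions)  # dimension values are top-level members too
    return record


def emit_metric(namespace: str, metric: str, value: float,
                unit: str = "Count", **dimensions: str) -> None:
    # In Lambda, stdout goes to CloudWatch Logs, where EMF records
    # are turned into metrics asynchronously.
    print(json.dumps(emf_record(namespace, metric, value, unit, dimensions)))


if __name__ == "__main__":
    emit_metric("MyApp", "OrdersFailed", 1, Service="orders")
```

Emitting metrics through logs keeps the hot path free of extra API calls, which suits short-lived serverless functions.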
DevOps, Automation and Agility Kool-Aid – Just as with our AWS “all in” architecture decision, in our first week we picked a DevOps tool that a successful cloud company had used before and went “all in” with it. We built our pipelines, one per microservice, with all the automation and stages from commit through dev, QA, staging and production. Once the pipelines were built, they were still “red” all over, because builds failed many of the automated tests. This is when we realized that culture plays a very important part in agile software development. Just having the right technology doesn’t cut it. We instilled a new set of developer responsibilities: owning not just the code but also the automation, the pipeline and the operation of that code in production. This mindset shift is absolutely essential in moving to agile software delivery, where we can now push software changes to production a couple of times a week. We are, of course, not Facebook or Amazon yet, but we have achieved success that would have been unimaginable in a traditional packaged software company. We also follow one- to two-week sprints to support this. What we are learning fast is that the more frequently you deliver to production, the lower the risk.
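Stripped of any particular tool, the gating logic of such a pipeline is simple: stages run in order, and a red stage stops the change from going any further. Here is a minimal, tool-agnostic sketch; the Stage type and the lambda checks are hypothetical stand-ins for real build and test steps, not the DevOps tool we chose.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    check: Callable[[], bool]  # True when the stage's automated tests pass


def run_pipeline(stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order and stop at the first failure ("red")."""
    log = []
    for stage in stages:
        ok = stage.check()
        log.append(f"{stage.name}: {'green' if ok else 'red'}")
        if not ok:
            return False, log  # later stages never see a broken change
    return True, log


if __name__ == "__main__":
    stages = [
        Stage("commit", lambda: True),
        Stage("dev", lambda: True),
        Stage("qa", lambda: False),  # simulated failing integration tests
        Stage("staging", lambda: True),
        Stage("production", lambda: True),
    ]
    print(run_pipeline(stages))
```

The point of the sketch is that the pipeline itself enforces the rule; keeping it green is the part that depends on developers owning their automation.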
Pushing to production – Pushing to production is an activity where we stumbled a couple of times before learning how to get it right. Our goal is a 30-60 minute production push: a quick release-readiness discussion with developers, testers, product managers and release engineers, followed by the push, a canary and a synthetic test, after which we decide whether to continue or roll back. This is a journey for us, and we continue to build operational know-how and supporting tools for smoother deployments and the occasional rollback.
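The continue-or-rollback decision after a canary can be written down as an explicit rule, which keeps the push discussion short. A minimal sketch, assuming we can count requests and errors for both the baseline and the canary and that the synthetic test reports pass or fail; the one-percentage-point tolerance and the 100-request minimum are illustrative thresholds, not values from our process.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: Metrics, canary: Metrics, synthetic_passed: bool,
                    max_increase: float = 0.01, min_requests: int = 100) -> str:
    """Return 'continue', 'rollback' or 'wait' for a canary release (sketch)."""
    if not synthetic_passed:
        return "rollback"  # a failed synthetic test stops the push immediately
    if canary.requests < min_requests:
        return "wait"  # not enough canary traffic to judge yet
    if canary.error_rate > baseline.error_rate + max_increase:
        return "rollback"  # canary is noticeably worse than the baseline
    return "continue"


if __name__ == "__main__":
    baseline = Metrics(requests=10_000, errors=50)  # 0.5% errors
    print(canary_decision(baseline, Metrics(requests=500, errors=3), True))
```

Comparing against the live baseline rather than a fixed error budget keeps the rule meaningful even when background error rates drift.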
Think production when developing a feature – Pushing to production also plays a central part in designing new features and capabilities. Recently, we started planning how to roll out a major infrastructure change and a major schema change, keeping in mind that production holds huge amounts of data and that every push requires zero downtime. These considerations are now a regular part of every conversation about a new feature, not just an afterthought during a push to production. Production scale and resiliency are other aspects to watch continuously when building apps, since the scale factor with a growing customer base will be far larger than for typical on-prem packaged apps. Resiliency is critical because any call, message, server, service or infrastructure component can fail or become slow, and apps have to handle this gracefully. Finally, we build applications that follow the Twelve-Factor App principles.
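A common way to deal gracefully with slow or failing dependencies is to combine timeouts with retries using exponential backoff and jitter. A minimal sketch, assuming the wrapped call is idempotent and that TimeoutError and ConnectionError mark transient failures; the attempt count and delays are illustrative defaults, not tuned values. AWS SDKs already retry throttled requests on their own, so a wrapper like this is mainly for an application's own calls.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 4,
                      base_delay: float = 0.1, max_delay: float = 2.0,
                      retry_on: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying transient failures with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")
```

Non-transient errors (anything outside retry_on) propagate immediately, and the injectable sleep makes the behavior easy to test without real waiting.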
Cost angle – One area we started managing very closely is the cost of AWS services. We track daily and monthly bills and look for ways to optimize our use of AWS resources. Now every architecture decision also involves a cost discussion, something we never had when building enterprise packaged Mode 1 applications. This is healthy for us, because it also drives innovation: neat solutions that optimize not just performance, scale and resiliency but also cost.
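To illustrate how an architecture choice turns into a cost discussion, here is a back-of-the-envelope Lambda estimate. Lambda bills per request plus per GB-second of compute (allocated memory × duration). The default prices below are assumptions based on published us-east-1 x86 on-demand rates; verify them against current AWS pricing. The free tier is ignored.

```python
def lambda_monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int,
                        price_per_gb_second: float = 0.0000166667,
                        price_per_million_requests: float = 0.20) -> float:
    """Estimate monthly Lambda compute + request cost in USD (sketch).

    Default prices are assumptions; check current AWS pricing for your
    region and architecture. Free tier and tiered discounts are ignored.
    """
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute = gb_seconds * price_per_gb_second
    requests = invocations / 1_000_000 * price_per_million_requests
    return round(compute + requests, 2)


if __name__ == "__main__":
    for memory in (512, 1024):
        print(memory, lambda_monthly_cost(10_000_000, 200, memory))
```

For example, 10 million invocations a month at 200 ms and 512 MB come to 1,000,000 GB-seconds: about $16.67 of compute plus $2.00 of requests, roughly $18.67. Doubling memory to 1024 MB doubles the compute part (about $35.33 in total) unless the extra memory, which also brings more CPU, shortens the run time proportionally. That is the kind of trade-off a cost discussion has to weigh.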
Startup small team thinking – Finally, we are a small, two-to-three-pizza team. Having complete ownership not just of the code but of all decisions about individual microservices, tools and language choices (we use Node.js, Java and, very soon, Python), and doing the right thing for the product, felt great and excited everyone on the team. Feeling like a startup inside a traditional enterprise software firm gives all developers and testers the freedom to innovate, try new things and get things done.
That’s it for now. We are on a journey to become cloud-native and are already pushing features into production on a weekly, and occasionally daily, schedule without sacrificing stability, safety or security. In a second, follow-up post, I will cover the Gartner debate: how the Mode 2 learnings above can be applied to make traditional Mode 1 applications more like cloud-native Mode 2 apps.