Oren Michels | Co-Founder & GM
April 11, 2012

No Fake Clouds: The Perils and Pitfalls of Scaling Modern API Management With Old School Software

 
This is the final post in a series of three by Mashery co-founder and CEO Oren Michels that examines the advantages of using a genuine SaaS solution to manage your API, and the hidden (or obvious) costs and challenges of deploying API infrastructure using the model of appliances and licensed software. Part III examines the “risks” of deploying a SaaS solution with regard to security, scalability, latency and reliability.

In Part II, we took a detailed look at the total cost of ownership (TCO) of a genuine multitenant SaaS API management solution compared with that of a traditional hardware/virtual appliance/licensed software model. It wasn't hard to conclude that the hidden costs of the traditional model overwhelm the software license fee, which is but the tip of the TCO iceberg.

But cost isn't everything. The right choice for API management is the one that meets the enterprise's technical and business needs as well. Unless it wins there, even the most economical solution will not deliver the results needed to meaningfully grow the business with a secure, scalable, high-performance and reliable API program. Today we'll take a look at how a multitenant SaaS model compares to a traditional appliance/software model against these criteria.

But before we get there, let's consider another key factor: time to market. Presumably an enterprise is looking at an API infrastructure solution because of a compelling business need to make its services available to the partners and developers who will accelerate growth. With such a business imperative, it makes little sense to wait months for hardware to be purchased; networks to be provisioned and configured; software to be installed; integration to happen; redundancy and failover to be provided for; and the whole thing to be tested for function, stability and load capacity at significantly higher than anticipated traffic. While some of this can be reduced by deploying virtual machines on a public or private cloud, most of the time saved is likely to be consumed by the added complexity of configuring the network, multiple geographies and failover. All told, this will often take months. Only then, after implementation is complete, can you begin learning and experimenting with how to configure it for your specific APIs. Thus begins another cycle of trial implementations and testing until it's finally ready to launch.

In a multitenant SaaS API solution, all of this has already been completed. The platform, network, configuration and failover implementation has already been proven in high-traffic production operation. A new account can usually be provisioned in minutes, and a services team already knowledgeable and experienced in the platform can begin to configure your APIs. It's not left to you to suffer through the frustration of learning a new system. There’s no wondering why the version you have doesn't match the documentation, or whether the problems you are having are a result of not understanding the system or of some obscure configuration being off. You worry about the important part - how to launch your API program right away, since it will likely be ready to go in a matter of a week or two...or less.

Along the same lines as time to market is the need for flexibility and agility. Seriously - we're in the API management business; we should insist that important features be available through a RESTful API, so you can do whatever integrations you need and easily adapt to identity or entitlement systems.
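For example, automating a routine management task can be a single REST call. The sketch below is purely illustrative - the host, endpoint, fields and token are hypothetical, not Mashery's actual API surface - but it shows the kind of integration an API-first management platform makes possible:

```python
import requests  # third-party HTTP client: pip install requests

# Hypothetical management API host, endpoint and token -- illustrative only,
# not Mashery's actual API surface.
BASE_URL = "https://management.example.com/v1"
API_TOKEN = "replace-with-a-real-token"


def issue_developer_key(developer_email: str, plan: str) -> dict:
    """Provision an API key for a partner developer with one REST call."""
    resp = requests.post(
        f"{BASE_URL}/keys",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"developer": developer_email, "plan": plan},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"key": "...", "status": "active"}


if __name__ == "__main__":
    print(issue_developer_key("dev@partner.example", "gold"))
```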
A well-architected, multitenant SaaS platform, like Mashery's, includes a wide range of APIs, so administrators can automate management tasks and share data with other systems.

Security

Ah yes, the favorite of SaaS FUD. But let's take a look at a common example: PCI DSS certification. With a software or virtual appliance solution, you can implement your software or hardware in an environment that has been certified as PCI compliant, but you then need to update your procedures to include this new equipment or software and add it to your audit. That takes time and money. With a SaaS solution that is fully certified as PCI DSS compliant and has a third-party Report on Compliance (ROC) to prove it, you can rely on the ROC for certification and have nothing to add to your own audit or procedures. And let's not forget that EDI VANs, service brokers, integration hubs and the like have all had to prove security controls for a long, long time - even before on-premise XML gateways existed.

In addition, with the appliance model you are faced with the responsibility and cost of making sure that you are always on the latest version, with applying all the patches and upgrades needed to stay there, and with the uncertainty of how each update might interact with your underlying infrastructure. More cost, more disruption. More risk of getting it wrong, or of leaving a critical hole unpatched. In cases where security requirements dictate that data must stay in the data center or be specially encrypted before it leaves, a local or virtualized traffic manager can be deployed inside the data center (as described below).

Performance

This is where a modern hybrid system really shows its advantage. In this day and age, few enterprises have all their data in a single place. Some is in a data center they control, some is in a private cloud, and some is distributed among a range of cloud providers. Today's successful IT organization optimizes its architecture around each kind of data or service it delivers, and an API management platform that doesn't adapt to this - and doesn't provide a single means of controlling ALL APIs across the organization - is going to fall short. Thus the hybrid model - such as Mashery Cloud and Mashery Local - is the only architecture flexible enough to accommodate today's hybrid IT architecture.

Your ultimate goal is to make the response time to the user as short as possible. You do this by getting the data as close to the application that consumes it as you can, and by making the path to that data as direct as possible. In the cloud, using Mashery's global API Distribution Network™, you can selectively cache API data close to your user, vastly reducing the latency inherent in a solution deployed in only one location. And for internal or security-sensitive applications, you can seamlessly add Mashery Local where needed while retaining the benefits of the cloud. No enterprise has all of its API data in a single place, after all. Want to add geographic routing rules that make sure each call goes through the optimal traffic management location, whether cloud or on-premise? Just ask. It can be set up the same day.
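To make "selectively cache API data close to your user" concrete: edge caching of this kind typically rides on standard HTTP cache directives set by the API origin, which a distribution tier can then honor. The Flask routes and TTL below are a minimal, hypothetical sketch - not a Mashery configuration - showing how an API can mark which responses are safe to cache at the edge and which must never be:

```python
from flask import Flask, jsonify  # pip install flask

app = Flask(__name__)


@app.route("/products/top")
def top_products():
    # Relatively static, non-sensitive data: allow shared caches (an edge
    # proxy or distribution network) to serve it for 60 seconds.
    resp = jsonify({"products": ["widget-a", "widget-b"]})
    resp.headers["Cache-Control"] = "public, max-age=60"
    return resp


@app.route("/account/balance")
def account_balance():
    # Sensitive, per-user data: never cache it outside the origin.
    resp = jsonify({"balance": 42.50})
    resp.headers["Cache-Control"] = "private, no-store"
    return resp


if __name__ == "__main__":
    app.run()
```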
Scalability

And what of the API program that starts modestly and suddenly takes off? Ask Klout, whose API has grown from minimal usage to over 10 billion queries per month from over 3,500 partners in less than a year. Klout was able to handle that growth while focusing on its core business, knowing that its API infrastructure could scale to meet the need.

Hardware and virtual appliance/software vendors love to talk about scalability, and why wouldn't they? When you need to scale, you do it by buying more of their boxes or software licenses - revenue they can recognize immediately, and a cost that, once incurred, can't be scaled back down when the spike passes. Does an e-commerce company really need the capacity required for Black Friday and Cyber Monday to be provisioned and paid for year-round? Obviously it is not economical to buy, maintain and configure enough infrastructure to meet your highest anticipated traffic spike, and few companies want to do that. But when that spike comes, if you can't handle the load, you are left with a slew of lousy options:

Option A - restrict the usage of your services (and therefore the revenue they bring).
Option B - temporarily route some or all API traffic around your management platform, going without security, rules and policy management, and analytics at the moment you need them most.
Option C - scramble to purchase and configure additional infrastructure at premium prices to try to stay a step ahead of the growing traffic.

Naturally, you need to trust that your SaaS provider can handle this sort of traffic through their network. Beware of solutions that claim that their "technology" handles a high volume of traffic while failing to explain how small a fraction of that traffic actually runs through their cloud-based network.

Reliability

Another favorite in the annals of SaaS FUD. "What if it goes down and you aren't in control?" "Can you really trust it to stay up?" As Amazon CTO Werner Vogels says, "everything goes down." There are many reasons outages occur. Sometimes it is the network. Sometimes it is a bug in the software. Sometimes it is a bad configuration. And on and on… For each of these kinds of outages, consider how they are handled in a multitenant SaaS environment as opposed to an appliance/software installation.

Network/Hardware/Connectivity Outage - Unless you have purchased and configured multiple appliances or licenses, deployed them in multiple geographies across multiple redundant networks, configured everything for automatic failover and tested that failover in production, it's unlikely that you have nearly as strong a failover capability as a well-run SaaS provider. And if you do, you have invested an incredible amount of time and money to get there. So if something goes down, it will be up to you to bring everything back up, potentially reinstall or reconfigure everything, and come back online. And unless your ops team is skilled in all the various quirks of the software, it is more than likely that there will be some hiccup along the way that they'll need support on. Not to mention that if there is a significant network or power outage at your data center, your team will likely be working on getting a lot of other systems back up and running, and API infrastructure is unlikely to be the first thing they tackle. That means a longer outage, and during that time anyone calling your API is left waiting for a long timeout rather than failing gracefully with an error message or, in some cases, being served a portion of the traffic from cached content.
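What "failing gracefully" can look like from the caller's side, as a minimal sketch: a short timeout plus a last-known-good cache, so API consumers get a fast answer (possibly stale) instead of a long hang. The endpoint, two-second timeout and in-process cache below are illustrative assumptions, not part of any particular product:

```python
import requests  # pip install requests

# Tiny in-process cache of last-known-good responses -- illustrative only.
_last_good = {}


def fetch_with_fallback(url: str) -> dict:
    """Call the API with a short timeout; degrade gracefully on failure."""
    try:
        resp = requests.get(url, timeout=2)  # fail fast instead of hanging
        resp.raise_for_status()
        data = resp.json()
        _last_good[url] = data  # remember the last successful payload
        return data
    except requests.RequestException:
        if url in _last_good:
            return _last_good[url]  # serve stale-but-usable cached content
        return {"error": "service temporarily unavailable"}  # clear error, no long hang
```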
Software bugs, configurations and unanticipated uses - Everyone's software has bugs - even ours. So when something stops working, especially if the outage is limited to one customer or use case, any partial or complete downtime will be significantly shorter if the people managing the implementation are the same ones who wrote the software. When there is a problem, our engineers can look at overall system behavior across multiple clients to quickly find the root cause and either fix it immediately or work around it until a proper fix can be pushed. If it is a configuration problem, those same engineers are great resources to help the ops team determine what needs to change. In Mashery's multitenant SaaS environment, our ops team monitors all customers' APIs 24/7/365 and begins working on any problem the moment an alarm goes off. In appliance/software land, your ops team first needs to rule out a long list of issues - hardware, configuration, whether you have all the latest updates and patches, and so on - before it can begin looking at other causes. And, of course, in the multitenant world, chances are that a particular issue reared its head for someone else before it reached you, and a fix was already pushed to production - so you won't have the outage in the first place.

With all the advantages of genuine multitenant SaaS, you'd think it would still be a great value at double or triple the cost of a traditional hardware/virtual appliance/software implementation. The fact that you can have a multitenant SaaS solution sooner, at a lower initial cost, and at a lower TCO than the old-school alternative is a remarkable thing indeed.

I like nostalgia as much as anyone, and I certainly prefer to drink old wine rather than the latest release. Give me an '89 Haut Brion or a '94 Harlan Estate over anything from this century anytime. But when it comes to the platform that runs something as critical to an enterprise as the APIs its partners will use to fuel growth, old school just doesn't measure up.

----- Read Part 1 | Read Part 2