As more and more applications are provided as “software as a service” and Web 2.0 starts to yield its riches, one must wonder whether there is a possibility of overshoot. Is there a chance that growing the cloud may simply leave us fogbound?
As I sit here tapping away on a web-based word processing application portalled into a blog site, I cannot in earnest say that applications in the Cloud are a bad idea. It’s just one that must have the light of scrutiny applied. The world is full of competing paradigms. The question is not whether one is good or bad, but simply which is better for a given situation. This blog is about how the cloud is constructed, and it starts with a separation of the client, the ISP, and then the rest… the internet.
Virtualisation is a solution for the consolidation of hardware to provide many applications. Grid computing is the distribution of hardware to support a single application. One is not better than the other; they are simply used to solve different problems.
As I come to terms with the mental gymnastics of virtual computer systems and their abstractions, or their antithesis, Grid computing, and its abstractions, I’m often astounded at how the internet itself is becoming abstracted as an application. Is the whole shooting match moving to Layer 7?
A browser is a typical desktop application; a web proxy is a typical Cloud application. Do you need a Cloud application for it all to work? No! So why have it? To save money, improve speed, or introduce controls. OK, so far it’s a good idea. So we have the diagram as shown, with our new web proxy in place.
The agenda for the ISP is to extend the benefit of their newly acquired cache to as many people as possible. This maximises the cache hits and therefore optimises their connection to the net, so that they pay less and can pass the savings on to their customers, making their service cheaper and more appealing.
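The economics are simple enough to sketch. With made-up numbers (the traffic volume, hit ratio, and upstream price below are all assumptions, not real ISP figures), the saving from a shared cache looks like this:

```python
# Hypothetical figures: a sketch of why cache hit ratio matters to an ISP.
monthly_traffic_gb = 50_000    # total traffic requested by clients (assumed)
hit_ratio = 0.35               # fraction of bytes served from cache (assumed)
upstream_cost_per_gb = 0.08    # what the ISP pays per GB upstream (assumed)

# Only cache misses travel over the paid upstream link.
upstream_gb = monthly_traffic_gb * (1 - hit_ratio)
saved = monthly_traffic_gb * hit_ratio * upstream_cost_per_gb

print(f"Upstream traffic: {upstream_gb:.0f} GB, monthly saving: ${saved:.2f}")
```

The more customers share the cache, the more overlap in what they fetch, so the hit ratio, and the saving, grows with the user base.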
So, having engaged more customers, their net starts to resemble the next diagram. The problem now becomes one of load. A single system can no longer accommodate it sustainably, the performance of the network degrades, and the unhappy customers depart.
The cache engine of the internet (Squid) then provides cache farm capabilities. Each of the systems communicates with the others to help retrieve previously fetched objects, and the “peer” nature of these systems makes it all look like one big system. This is shown in the next diagram with the cache-peered proxy farm.
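In Squid terms the peering is only a couple of lines of configuration. A hypothetical fragment for one node of a three-node sibling farm might look like this (the hostnames are placeholders; 3128 is Squid’s HTTP port and 3130 its ICP port):

```
# Sketch of a squid.conf fragment: ask the sibling caches before going upstream.
# "proxy-only" means objects fetched from a sibling are not stored again locally.
cache_peer proxy2.isp.example sibling 3128 3130 proxy-only
cache_peer proxy3.isp.example sibling 3128 3130 proxy-only
```

Each node carries the equivalent lines naming the other two, and ICP queries between them make the farm behave as one large cache.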
The next requirement is to add function. The cache-peered system works fine if all users have the same requirements; if the same filtering policy applies to everyone, for instance, there is no problem. Many sites, however, want a policy that is unique to them. Having a “farm” of systems identify a unique customer source is doable, but it will depend upon a source IP range or (better still) some form of authentication, which in turn requires a degree of knowledge about the users that exist at a site and propagation of user state across systems.
This user awareness (the identity) may come through LDAP synchronisation or manually crafted files that populate the database. It may require a trusted connection from the ISP back into the site, or the traversal of specially encrypted or (eek) cleartext username/password files. But there is something else missing here… reporting to the customer. Extraction of information from many systems is best done through log consolidation. This is not as trivial as it sounds but, if it is a requirement, it can be met.
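As a sketch of the simpler source-IP approach (the network ranges and policy names here are invented for illustration), a farm node might map a client’s address to its site’s policy like this:

```python
import ipaddress

# Hypothetical mapping of customer source ranges to filtering policies.
CUSTOMER_RANGES = {
    ipaddress.ip_network("203.0.113.0/24"): "school-strict",
    ipaddress.ip_network("198.51.100.0/24"): "business-default",
}

def policy_for(source_ip: str, default: str = "isp-default") -> str:
    """Return the filtering policy for a client, falling back to the ISP default."""
    addr = ipaddress.ip_address(source_ip)
    for net, policy in CUSTOMER_RANGES.items():
        if addr in net:
            return policy
    return default

print(policy_for("203.0.113.42"))   # prints "school-strict"
print(policy_for("192.0.2.1"))      # unknown source: prints "isp-default"
```

Every node in the farm needs the same table, which is exactly the state-propagation problem described above; authentication-based identity only raises the stakes.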
Tiers of management
Some agency, education for instance, may require this of the environment. It is possible for the agency to maintain the logging repository and issue canned reports back to the clients. This topology is represented in the “Cache peer and reporting” diagram. All of this is built so that end users do not need to maintain such systems themselves. The complexity of the central solution is high in order to keep the complexity for the end users low.
There is another thought. Moore’s Law states that computing power doubles every two years. Given a static-sized fleet, static broadband capacity, and no increase in the filtering requirements, the filtering engine can track the requirements of the fleet. More realistically, though: the broadband capacity doubles (or, usually, quadruples), the number in the fleet doubles (1:1 ratio policies and taxpayers’ $ at work), and the filtering requirement grows to demand a higher load on the filtering system (assume it doubles too). The demand is therefore the product of three doublings, an eight-fold increase, compounding rather than linear. If you had 10 systems in the cloud you will need 80 in two years, while Moore’s Law has only doubled the power of each. There are factors that will mitigate the demand (code efficiency, better standards, etc.), but the trend still outruns the hardware.
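Writing the compounding out directly shows the shape of the problem. The three growth factors below are the assumptions from the paragraph above, nothing more:

```python
# Three independent doublings compound multiplicatively.
systems_now = 10
bandwidth_growth = 2    # broadband capacity doubles (assumed; often quadruples)
fleet_growth = 2        # number of client devices doubles (assumed)
filtering_growth = 2    # filtering workload per request doubles (assumed)

systems_needed = systems_now * bandwidth_growth * fleet_growth * filtering_growth
print(systems_needed)   # eight-fold increase over two years
```

Meanwhile Moore’s Law has only made each system twice as capable, leaving a four-fold shortfall to be met by buying more hardware.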
OK, so where is all this heading? It seems logical to me that a distributed computing model is more appropriate to the problem of internet “management”, for two reasons:
- It disseminates the control processes to different regions/demographics
- It spreads the computing load.
So this model advocates the deployment of control systems at the client network. But what happens to the centralised processes, where accountability is the key? This is where a hybrid cloud model appears: placing devices at the client with their own logging and reporting capability, and yet further logging and reporting back in the cloud for monitoring the behaviour of the whole.
I really like a simple analogy:
When we discovered the internet neighbourhood wasn’t safe we put big gates at the end of the street and gave everyone a key. This was a complete pain in the arse: people left the gate open, the gap was sometimes too narrow, and sometimes it was simpler to just leave it open. Better for everyone to have their own garage door.