If you are building a web app and care about its performance (which you must, if you want to be successful), then you’ve thought at least a little about how and where you are going to deploy the app once it’s ready to go live. This is a quick comparison of outsourced hosting (to someone like Rackspace) vs doing it yourself. It’s meant for a quasi-technical audience: people who care about performance but may not have a lot of experience with the vagaries of telecom, datacenters, routers, switches, ISPs, etc. The core point is that if you care about performance and your business depends on scale to be successful (most web businesses do), you need to carefully consider hosting your own servers.
I strongly believe that the amount of time a user allocates for your site is finite and hard, but not impossible, to increase. There are definitely exceptions, and there is no doubt that a great application or great content does increase the user’s willingness to spend time on your site. But it is easier to increase the amount a user can get done in the time they are allotting than to increase the time itself. There’s plenty of existing research on how users react to a slow site, and you can summarize it easily: “badly”. The more quickly a site reacts to user clicks, the more the user stays engaged with the product and the more they can accomplish in the time they’ve allotted. Think about your own experience with a brand new, super fast Mac or PC vs a three or four year old computer.
Perhaps most important is the initial impression that your app makes on your users. As the old saying goes, “you only get one chance to make a first impression”. On the web that means having a great product, with all of the complexities therein, that loads really, really fast. Think of how fast Google loads. They’ve got an advantage in that they are loading a largely blank page (a small number of KB), but nearly anywhere in the world the page loads in a few hundred milliseconds. Remember Friendster and the pages that did not load? OK, that’s as bad as it gets, but we users are surprisingly sensitive to the difference between a web site that loads pages in under a second and the vast majority of sites, where a page takes multiple seconds to load.
If you care about performance, then one of the bigger decisions you will make, if not on day one then eventually, is whether to host your servers yourself (not at your office but in a collocation facility) or to buy slices of computing power or even virtual servers from the likes of Amazon or Rackspace. By the way, don’t host your servers at your office. That’s not what I’m talking about here when I say “hosting yourself”.
Outsourced Hosting (Rackspace, Pair Networks, Amazon Web Services, etc)
- You buy servers or units of performance (RAM, hard disk, processor performance)
- Vendor worries about data center functions including power allocation, cooling, internet (ISP) bandwidth allocation
- Vendor does all the IT work – they install the OS, install the machines in the rack, connect the machines to the internet, etc.
- For $5,600/yr Rackspace will sell you the power of 2 processors, 8GB of RAM, and redundant (RAID1) 10K RPM 73GB hard disk space plus 24TB of data transfer/yr.
- You focus on writing software and building your application, content or service instead of worrying about pesky little things like hardware, ISPs, routers, and switches.
Managing your own hosting
- You don’t install the servers at your office serving off a cable modem, T1, or DSL line.
- If you were planning on doing that, call someone like Rackspace or Amazon instead and learn about the wonders of virtual hosting. ASAP.
- You find a collocation provider like Level 3, Internap, Equinix or Switch and Data.
- A collocation provider is someone who runs a secure building, or a space within a building, where they have pre-arranged cooling, lots of power from the electric company (“the grid”) and power redundancy via batteries and diesel generators. These are sometimes called data centers, POPs (points of presence) or, if you are a telco person, telecom hotels. These are not the same as the central offices of phone companies (local monopoly phone companies like Verizon or Pacific Bell). In New York City, buildings that are primarily or entirely used for this purpose include 60 Hudson and 111 8th Ave.
- The collocation provider sells you rack space (a full rack, a half rack, multiple racks, etc.) and power.
- They cool and secure the facility and ensure that the power stays on no matter what happens. (At least theoretically: I’ve got a good story about how the uninterruptible power supply at Switch and Data NYC failed during the big East Coast blackout a few years ago and took tens of my servers down with it.)
- You buy bandwidth from the internet service provider (ISP) at the collocation center
- Typically you buy a given pipe size (say a Gigabit Ethernet connection or a 100 megabit/second (Mbps) Ethernet connection) and you can burst traffic up to the size of the entire pipe but only pay for what you use on average. There are different ways of measuring the average utilization but most use something called “95th percentile measurement”, which ignores the busiest 5% of usage samples in the billing period. You pay some amount per month for the pipe, independent of how much of it you use, and included in that price is some amount of average utilization. If you use more than that, you pay more, typically at a price per Mbps.
- Depending on the collocation provider, you either buy the ISP (internet bandwidth) service from the collocation provider directly or from one of the ISPs they have “on-net” (which simply means that the ISP is located in the same facility and is easy to connect to). Unless you are running a huge application with massive traffic (think YouTube, Google, MSN, the New York Times) you buy your internet bandwidth from one provider and let them take care of the redundancy requirements.
- You buy your own servers
- You put them in the rack and connect them to the power that the collocation facility sells you. You can get whatever kind of server you want with whatever ratio of CPU:hard disk:RAM you want. More on this in a few bullets.
- You buy your own switch or router.
- You connect your servers to the switch/router and then connect the router to the ISP that you’ve bought bandwidth from. You have to configure the router. You likely configure the router with more than one physical connection to the ISP so that if the connection fails (the wire gets cut or the port on your router or theirs fails) you are not out of business.
- The days of exponential innovation in routers/switches are behind us. While you can’t get fired for buying Cisco, there are other great and less expensive options from vendors like Force10 and HP. You’ll want an experienced and pragmatic network engineer to help you figure out what to buy so that you don’t massively overspend. You can waste a lot of money on hardware you won’t need for years to come (unless you rapidly become the next YouTube). While you won’t be able to use a $300 Linksys to connect to the switch/router of the ISP you are using, in all likelihood you’d be hard pressed to use more bandwidth than that Linksys can handle. Non-blocking gigabit Ethernet switches were modern and advanced five or six years ago. Today you buy them for a few hundred bucks. They’ve made office LANs dramatically faster, have enabled the easy deployment of VoIP-based PBXs and in general are no-brainer plug and play. Stay tuned for a future post on startup IT made simple, where I’ll discuss what you need to get started in IT for a startup with anywhere from 3 to 200 people and how to ensure you don’t waste a lot of money on stuff you don’t need but that IT vendors want to sell you.
- Cost to get started: Not including the cost of the servers and switch/router you will use, the cost to get in the game here is on the order of $1,000 to $1,500 per month, depending on what part of the country you are trying to locate your servers in.
- Collocation space in locations with expensive power and expensive real estate (e.g. Manhattan or Silicon Valley) costs more than in places where real estate is nearly free and power is cheap because of a nearby nuclear power plant or hydro-electric facility. It’s no accident that the biggest Google and Microsoft data centers are not in densely populated places and are typically near a large dam.
- You manage all of the IT issues yourself
- You configure the servers, the switch/router, the OS, etc.
- If you have a hardware failure, you deal with it yourself
- You control performance. This is the MOST important reason to do it yourself. Performant sites are not performant by accident.
- You don’t share your servers with anyone
- You control – for the most part – the quality of the internet bandwidth your servers are seeing. The ISP is still a bit of a black box but it can be measured pretty easily and there are good vendors like Internap that provide a performant service and do it well.
- Most importantly: You can control the ratio between CPU and RAM. Modern CPUs are astoundingly fast and are often not the bottleneck in applications. The speed at which data (content) gets to the CPU is often the bottleneck, and being able to take advantage of the revolutionary RAM density that you can now put into a single rack is something very special and one of the most fascinating aspects of modern application development.
- You know how people tell you to add more RAM to your PC to make it run faster? Well, the same concept applies to your servers and the applications that run on them. It’s not trivial to take advantage of it, but RAM is orders of magnitude faster than hard disk (or even SSD), so content delivered to an end user from a server that is primarily using RAM is going to be dramatically faster than from one that is dependent on hard disks. The laws of physics are strict about that. And while it’s not cheap (figure on the order of $200K), you can now get nearly a terabyte of RAM into a single rack.
- The likes of Amazon and Rackspace simply cannot afford to give you the prices they do and keep your data primarily in RAM. Disk is orders of magnitude cheaper: RAM is, all in, about $150/GB in a server, while hard disk, even using RAID1 for redundancy, is on the order of $1/GB.
- You can take advantage of faster large storage technologies like 15K RPM hard disks or solid state disks (SSDs) and allocate them in whatever ratio you want. The economic decisions are your own.
- You can take advantage of a content delivery network (CDN) like Akamai (this is not necessarily unique to doing it yourself)
- None of the above will fix poorly written software. If you don’t write software that can make use of the hardware, you won’t be performant.
- Lastly, the economics scale well. If you have your own rack, at first your servers will be incredibly lightly loaded and relative to outsourcing, you will be paying top dollar. BUT, your site will scream and the first users who come will have an amazing experience (remember that first impression thing). If your site is successful, that one rack will go a very very long way and depending on your application, will serve 100s of thousands or even millions of unique customers per year. It will likely be a very long time before you upgrade anything in it. Assuming you chose a business that makes money (it’s not a business unless it makes money) the cost of the hardware and the rack will be irrelevant to you – a real rounding error relative to the profit they generate. This is not true if you chose to build the next YouTube in which case your serving costs will continue to scale massively as you increase your users.
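The “95th percentile measurement” mentioned in the bandwidth bullets above is easier to see in code than in words. Here is a minimal Python sketch, assuming the ISP records a usage sample at regular intervals (commonly every five minutes), sorts the samples at the end of the month, throws away the top 5% and bills on the highest sample that remains. The function names and every price in the usage example are hypothetical, not anyone’s actual rate card, and real contracts differ in the details.

```python
def billable_mbps(samples_mbps):
    """Return the 95th-percentile rate using the nearest-rank method."""
    if not samples_mbps:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_mbps)
    # Integer ceiling of 0.95 * n, done without floating point.
    rank = (95 * len(ordered) + 99) // 100
    # Everything above this rank (the busiest ~5% of samples) is ignored.
    return ordered[rank - 1]


def monthly_bill(samples_mbps, base_fee, committed_mbps, overage_per_mbps):
    """Base fee covers the committed rate; usage above it is billed per Mbps.

    All three pricing parameters are hypothetical placeholders.
    """
    usage = billable_mbps(samples_mbps)
    overage = max(0, usage - committed_mbps)
    return base_fee + overage * overage_per_mbps


if __name__ == "__main__":
    # 100 samples: mostly 10 Mbps, with short bursts to 90 Mbps.
    short_bursts = [10] * 95 + [90] * 5
    long_bursts = [10] * 94 + [90] * 6
    print(monthly_bill(short_bursts, 1000, 10, 50))  # bursts fall in the ignored 5%
    print(monthly_bill(long_bursts, 1000, 10, 50))   # bursts now set the billable rate
```

The point of the scheme is that short bursts are free: as long as your spikes stay inside the busiest 5% of samples, you pay only for your normal load. Once they spill past that, the burst rate becomes your billable rate.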
So should I outsource or do it myself?
By looking at the comparison above, it would seem that outsourcing is the obvious choice, since doing it yourself seems incredibly complicated. Truth is, the likes of Rackspace have made it much easier to buy service from them than the collocation providers have, and simplicity of service and purchase is what they are selling you. The underlying elements (computer hardware, power, ISP, cooling, real estate) are all commodities which are sold at low margins. Rackspace sells the combined service to you at a better margin (for them) because of all the service that comes with it. How big a margin they make depends on how big or small you are.
If you are running a blog, like this one, that gets tens or even hundreds of thousands of page views per year, you can’t possibly justify the cost and complexity of managing it yourself. But if you are running a site that needs to be at reasonable scale to be successful (and most web sites can’t hope to be profitable without significant scale) and needs performance as a competitive differentiator, you need to consider insourcing, and using lots of RAM, very closely.
Given my own experience with mid-scale data center installations at Epana and our collective performance obsession at Oyster, we decided to make the rather large upfront investment and manage our own servers. We use Force10 switches, Internap collocation and ISP services, servers from Rackable (now SGI) and a ton of RAM (we have 100,000+ photos on the site) to get a site that is quite fast despite being photo intensive. For example, the Oyster Review of the Kahala Hotel and Resort, Oahu, Hawaii has 600+ photos.
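For anyone who wants to run the numbers, here is a rough back-of-envelope sketch in Python using only the figures quoted in this post: $5,600/yr for one outsourced unit, $1,000 to $1,500 per month to get started in a collocation facility, about $150/GB for RAM and about $1/GB for disk. The function names are made up for illustration, and the break-even deliberately leaves out the cost of your own servers and switch/router, so it understates what doing it yourself really costs. Treat it as a starting point and plug in your own quotes.

```python
# Figures quoted in the post; everything else here is an illustrative assumption.
OUTSOURCED_UNIT_PER_YEAR = 5_600  # 2 processors, 8GB RAM, RAID1 73GB, 24TB transfer/yr
COLO_PER_MONTH_LOW = 1_000        # rack space, power and bandwidth, cheaper locations
COLO_PER_MONTH_HIGH = 1_500       # same, in more expensive locations
RAM_PER_GB = 150                  # "all in", installed in a server
DISK_PER_GB = 1                   # hard disk with RAID1


def breakeven_units(colo_per_month, unit_per_year=OUTSOURCED_UNIT_PER_YEAR):
    """Smallest number of outsourced units costing at least as much per year
    as the collocation fees alone (hardware is NOT included)."""
    colo_per_year = colo_per_month * 12
    return -(-colo_per_year // unit_per_year)  # integer ceiling division


def storage_cost(gigabytes, price_per_gb):
    """Cost of holding a data set of the given size at a given price per GB."""
    return gigabytes * price_per_gb


if __name__ == "__main__":
    print(breakeven_units(COLO_PER_MONTH_LOW))   # units needed to match cheap colo fees
    print(breakeven_units(COLO_PER_MONTH_HIGH))  # units needed to match pricey colo fees
    print(storage_cost(1024, RAM_PER_GB))        # roughly a terabyte held in RAM
    print(storage_cost(1024, DISK_PER_GB))       # the same data held on disk
```

With these inputs the collocation fees alone equal the price of three to four outsourced units per year, and a terabyte held in RAM costs about 150 times what it costs on disk. That price gap is why an outsourced host can’t keep your data primarily in RAM at the prices it charges, and why controlling the ratio yourself matters.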