High Scalability

Syndicate content
Updated: 59 min 3 sec ago

What would you like to ask Justin.tv?

Thu, 03/11/2010 - 9:40am

It looks like I'll have the chance to interview someone tomorrow from Justin.tv about their architecture, which is pretty exciting given their leadership role in live broadcasting. They get 30 million uniques a month, can handle 1 million simultaneous broadcasts and hope to grow another magnitude in the near future. That must take some doing.

Here's your opportunity, especially if you think my questions suck, to ask your own sucky questions :-) What would you like to know about Justin.tv?

Categories: Networking Feed

Saying Yes to NoSQL; Going Steady with Cassandra at Digg

Wed, 03/10/2010 - 5:42pm

The last six months have been exciting for Digg's engineering team. We're working on a soup-to-nuts rewrite. Not only are we rewriting all our application code, but we're also rolling out a new client and server architecture. And if that doesn't sound like a big enough challenge, we're replacing most of our infrastructure components and moving away from LAMP.

Perhaps our most significant infrastructure change is abandoning MySQL in favor of a NoSQL alternative. To someone like me who's been building systems almost exclusively on relational databases for almost 20 years, this feels like a bold move.

What's Wrong with MySQL?

Our primary motivation for moving away from MySQL is the increasing difficulty of building a high performance, write intensive, application on a data set that is growing quickly, with no end in sight. This growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead.

Relational database technology can be a blunt instrument and we're motivated to find a tool that matches our specific needs closely. Our domain area, news, doesn't exact strict consistency requirements, so (according to Brewer's theorem) relaxing this allows gains in availability and partition tolerance (i.e. operations completing, even in degraded system states). We're confident that our engineers can implement application level consistency controls much more efficiently than MySQL does generically.

As our system grows, it's important for us to span multiple data centers for redundancy and network performance and to add capacity or replace failed nodes with no downtime. We plan to continue using commodity hardware, and to continue assuming that it will fail regularly. All of this is increasingly difficult with MySQL.

Read the Full Digg Blog

 

Categories: Networking Feed

How FarmVille Scales - The Follow-up

Wed, 03/10/2010 - 7:53am

Several readers had follow-up questions in response to How FarmVille Scales to Harvest 75 Million Players a Month. Here are Luke's response to those questions (and a few of mine).

How does social networking makes things easier or harder?

The primary interesting aspect of social networking games is how you wind up with a graph of connected users who need to be access each other's data on a frequent basis. This makes the overall dataset difficult if not impossible to partition.

What are examples of the Facebook calls you try to avoid and how they impact game play?

We can make a call for facebook friend data to retrieve information about your friends playing the game. Normally, we show a friend ladder at the bottom of the game that shows friend information, including name and facebook photo. 

Can you say where your cache is, what form it takes, and how much cached there is? Do you have a peering relationship with Facebook, as one might expect at that bandwidth?
Categories: Networking Feed

Sponsored Post: Job Openings - Squarespace

Tue, 03/09/2010 - 8:16am
Squarespace Looking for Full-time Scaling Expert

Interested in helping a cutting-edge, high-growth startup scale? Squarespace, which was profiled here last year in Squarespace Architecture - A Grid Handles Hundreds of Millions of Requests a Month and also hosts this blog, is currently in the market for a crack scalability engineer to help build out its cloud infrastructure. Squarespace is very excited about finding a full-time scaling expert.

Interested applicants should go to http://www.squarespace.com/jobs-software-engineer for more information.



If you would like to advertise your critical, hard to fill job opeinings on HighScalability, please contact us and we'll get it setup for you.

Categories: Networking Feed

Applications as Virtual States

Tue, 03/09/2010 - 8:07am

This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud.

As I was writing an article on the architecture of the Storm Botnet, I couldn't help but notice the deep similarity of how Storm works and changes we're seeing in the evolution of political systems. In particular, the rise of the virtual-state. As crazy as this may sound, I think this is also the direction applications will need follow to survive in a complex world of billions of compute devices.

You may have already heard of virtual corporations. Virtual corporations are companies with limited office space, a distributed workforce, and production facilities located wherever it is profitable to locate them. The idea is to stay lean and compete using the rapid development and introduction of new products into high value-added markets. If you spot a market opportunity with a small time window, building your own factories and hiring and engineering team simply isn't an option. Building factories is a bit old fashioned and is left to the select few. These days you get an idea for a product and contract out everything else you possibly can. It doesn't really matter where you are located or where any of your partners are located. If part of your product requires a specialized microprocessor, for example, you'll contract out the R&D and the design. The manufacture will be contracted out to a virtual fab, then the chip will be sent to a contract manufacturing service for integration. Look ma, no hands.

Futurists say land doesn't matter anymore. Nations don't matter anymore. Entire relationships are abstractly represented by flows of money, contracts, information, and products between all these different agents. Interestingly enough, what technology is the absolute master of managing flows? Applications! But we are getting ahead of ourselves here.

Categories: Networking Feed

Strategy: Planning for a Power Outage Google Style

Fri, 03/05/2010 - 8:11am

We can all learn from problems. The Google App Engine team has created a teachable moment through a remarkably honest and forthcoming post-mortem for February 24th, 2010 outage post, chronicling in elaborate detail a power outage that took down Google App Engine for a few hours.

The world is ending! The cloud is unreliable! Jump ship! Not. This is not evidence that the cloud is a beautiful, powerful and unsinkable ship that goes down on its maiden voyage. Stuff happens, no matter how well you prepare. If you think private datacenters don't go down, well, then I have some rearangeable deck chairs to sell you. The goal is to keep improving and minimizing those failure windows. From that perspective there is a lot to learn from the problems the Google App Engine team encountered and how they plan to fix them.

Please read the article for all the juicy details, but here's what struck me as key:

Categories: Networking Feed

How MySpace Tested Their Live Site with 1 Million Concurrent Users

Thu, 03/04/2010 - 8:50am

This is a guest post by Dan Bartow, VP of SOASTA, talking about how they pelted MySpace with 1 million concurrent users using 800 EC2 instances. I thought this was an interesting story because: that's a lot of users, it takes big cajones to test your live site like that, and not everything worked out quite as expected. I'd like to thank Dan for taking the time to write and share this article.

In December of 2009 MySpace launched a new wave of streaming music video offerings in New Zealand, building on the previous success of MySpace music.  These new features included the ability to watch music videos, search for artist’s videos, create lists of favorites, and more. The anticipated load increase from a feature like this on a popular site like MySpace is huge, and they wanted to test these features before making them live. 

If you manage the infrastructure that sits behind a high traffic application you don’t want any surprises.  You want to understand your breaking points, define your capacity thresholds, and know how to react when those thresholds are exceeded.  Testing the production infrastructure with actual anticipated load levels is the only way to understand how things will behave when peak traffic arrives. 

For MySpace, the goal was to test an additional 1 million concurrent users on their live site stressing the new video features.  The key word here is ‘concurrent’.  Not over the course of an hour or day… 1 million users concurrently active on the site. It should be noted that 1 million virtual users are only a portion of what MySpace typically has on the site during its peaks.  They wanted to supplement the live traffic with test traffic to get an idea of the overall performance impact of the new launch on the entire infrastructure.  This requires a massive amount of load generation capability, which is where cloud computing comes into play. To do this testing, MySpace worked with SOASTA to use the cloud as a load generation platform. 

Here are the details of the load that was generated during testing.

Categories: Networking Feed

Hot Scalability Links for March 3, 2010

Wed, 03/03/2010 - 9:44am
  • Getting Real about NoSQL and the SQL-Isn't-Scalable Lie by Dennis Forbes. Buoyed by Canada's Olympic success, Dennis is going for the gold in that least real of sports, the NoSQL vs SQL pursuit.
  • Design Patterns for Distributed Non-Relational Databases by Todd Lipcon. Great coverage of consistent hashing, consitency models, data models, storage layouts, log-structured merge trees, and gossip protocols.
  • Brewer's CAP Conjecture is False. Jim Starkey makes the case the CAP is crap.
  • Kaazing Pushes Web Sockets to Make Browsers Real Time. Bi-directional communication comes to the web, but shouldn't sockets be able to accept connections too?
  • 4 Months with Cassandra, a love story. Cloudkick likes its Linear scalability, Massive write performance, Low operational costs. We'll likely keep moving more data into Cassandra as we need to, but for some data the ability to write arbitrary SQL queries is still very useful
  • Categories: Networking Feed

    Using the Ambient Cloud as an Application Runtime

    Tue, 03/02/2010 - 9:34am
    This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud.

    The future looks many, big, complex, and adaptive:

    1. Many clouds.
    2. Many servers.
    3. Many operating systems.
    4. Many languages.
    5. Many storage services.
    6. Many database services.
    7. Many software services.
    8. Many adjunct human networks (like Mechanical Turk).
    9. Many fast interconnects.
    10. Many CDNs.
    11. Many cache memory pools.
    12. Many application profiles (simple request-response, live streaming, computationally complex, sensor driven, memory intensive, storage intensive, monolithic, decomposable, etc).
    13. Many legal jurisdictions. Don't want to perform a function on Patriot Act "protected" systems then move the function elsewhere.
    14. Many SLAs.
    15. Many data driven pricing policies that like airplane pricing algorithms will price "seats" to maximize profit using multi-variate time sensitive pricing models.
    16. Many competitive products. The need to defend your territory never seems to go away. Though what will map to scent-marking I'm not sure.
    17. Many and evolving resource gradients.
    18. Big concurrency. Everyone and everything is a potential source of real-time data that needs to processed in parallel to be processed at all within tolerable latencies.
    19. Big redundancy. Redundant nodes in an unpredictable world will provide cover for component failures and workers to take over when another fails.
    20. Big crushing transient traffic spikes as new mega worldwide social networks rapidly shift their collective attention from new shiny thing to new shiny thing.
    21. Big increases in application complexity to keep streams synchronized acrosss networks. Event handling will go off the charts as networks grow larger and denser and intelligent behaviour attaches to billions of events generated per second.
    22. Big data. Sources and amounts of historical and real-time data are increasing at increasing rates.

    This challenging, energetic, ever changing world is a very different looking world than today. It's as if Bambi was dropped into the middle of a Velociraptor pack.

    Categories: Networking Feed

    MySQL and Memcached: End of an Era?

    Fri, 02/26/2010 - 10:06am

    If you look at the early days of this blog, when web scalability was still in its heady bloom of youth, many of the articles had to do with leveraging MySQL and memcached. Exciting times. Shard MySQL to handle high write loads, cache objects in memcached to handle high read loads, and then write a lot of glue code to make it all work together. That was state of the art, that was how it was done. The architecture of many major sites still follow this pattern today, largely because with enough elbow grease, it works.

    This was a pre-cloud, relational database dominated world, built from parts scrounged from the remnants of enterprises and datacenters past. Twitter and Digg started in this era, but are evolving into something different, as scaling pressures increase and new purpose built technologies pop into being.

    With a little perspective, it's clear the MySQL+memcached era is passing. It will stick around for a while. Old technologies seldom fade away completely. Some still ride horses. Some still use CDs. And the Internet will not completely replace that archaic electro-magnetic broadcast technology called TV, but the majority will move on into a new era.

    Categories: Networking Feed

    Paper: High Performance Scalable Data Stores

    Thu, 02/25/2010 - 8:58am

    The world of scalable databases is not a simple one. They come in every race, creed, and color. Rick Cattell has brought some harmony to that world by publishing High Performance Scalable Data Stores, a nicely detailed one stop shop paper comparing scalable databases soley on the content of their character. Ironically, the first step in that evaluation is dividing the world into four groups:

    • Key-value stores: Redis, Scalaris, Voldmort, and Riak.
    • Document stores: Couch DB, MongoDB, and SimpleDB.
    • Record stores: BigTable, HBase, HyperTable, and Cassandra.
    • Scalable RDBMSs: MySQL Cluster, ScaleDB, Drizzle, and VoltDB.

    The paper describes each system and then compares them on the dimensions of Concurrency Control, Data Storage Replication, Transaction Model, General Comments, Maturity, K-hits, License Language.

    And the winner is: there are no winners. Yet. Rick concludes by pointing to a great convergence:

    I believe that a few of these systems will gain critical mass and key players, and will pull away from the others by next year.  At that point, open source contributors will likely migrate to those players.

    From the paper:

     

    Categories: Networking Feed

    Hot Scalability Links for February 24, 2010

    Wed, 02/24/2010 - 9:57am
  • Cassandra @ Twitter: An Interview with Ryan King. Great interview by Alex Popescu on Twitter's thought process for switching to Cassandra. Twitter chose Cassandra because it had more big system features out of the box. Is that Cassandra FTW?
  • I Had Downtime Today. Here’s What I’m Doing About It by Patrick McKenzie. Awesome deep dive into went wrong with Bingo Card Creator. Sh*t happens. How do you design a process to help prevent it from happening and how do you deal with problems with integrity when they do?
  • High Availability Principle : Request Queueing by Ashish Soni. Queue request to ride out traffic spikes: 1) Request Queuing allows your system to operate at optimal throughput. 2) Your users only experience linear degradation versus exponential degradation. 3) Your system experiences NO degradation.
  • pfffft twatter tweeter by Knowbuddy. The reason you should care [about NoSQL] is because now you have more options--you're not stuck trying to wedge your system into a relational model if you don't want to. And isn't /. all about freedom of choice?
  • Wordpress, Varnish and Edge Side Includes. Using Varnish to go from .63 requests per second to 537.44 requests per second.
  • Facebook’s Petabyte Scale Data Warehouse using Hive and Hadoop by Ashish Thusoo and Namit Jain. How does Facebook deal with 12 TB of compressed new data everyday? They get a bad case of the Hives.
  • Categories: Networking Feed

    Sponsored Post: Job Openings - Squarespace

    Tue, 02/23/2010 - 8:19am

    There was a bit of drama earlier when I posted a free job opening for Zynga. It caused unfortunate and just plain wrong accusations. It also caused a number of requests for more free job posts, which I should have anticipated, but obviously I can't let this blog become cluttered with that kind of stuff. Earlier I tried a job board type service, but that never really worked out. So what to do? Someone suggested a sponsored post approach and I think that's a good compromise. It minimizes the noise, let's people know about work, and brings in a little revenue. It works like an advertisement. If you are interested please let me know. When we have any job openings there will be a sponsored post like this one, that you can easily ignore or pay attention to, depending on your situation.

    Squarespace Looking for Full-time Scaling Expert

    Interested in helping a cutting-edge, high-growth startup scale? Squarespace, which was profiled here last year in Squarespace Architecture - A Grid Handles Hundreds of Millions of Requests a Month and also hosts this blog, is currently in the market for a crack scalability engineer to help build out its cloud infrastructure. Squarespace is very excited about finding a full-time scaling expert.

    Interested applicants should go to http://www.squarespace.com/jobs-software-engineer for more information.

    Categories: Networking Feed

    When to migrate your database?

    Tue, 02/23/2010 - 4:25am

    Why migrate your database? Efficiency and availability problems are harming your business – reports are out of date, your batch processing window is nearing its limits, outages (unplanned/planned) frequently halt work. Database consolidation – remove the costs that result from a heterogeneous database environment (DBAs time, database vendor pricing, database versions, hardware, OSs, patches, upgrades etc.). OK, so the driving forces for migration are clear,  what now?

    Read more on BigDataMatters.com

    Categories: Networking Feed

    Twitter’s Plan to Analyze 100 Billion Tweets

    Fri, 02/19/2010 - 8:41am

    If Twitter is the “nervous system of the web” as some people think, then what is the brain that makes sense of all those signals (tweets) from the nervous system? That brain is the Twitter Analytics System and Kevin Weil, as Analytics Lead at Twitter, is the homunculus within in charge of figuring out what those over 100 billion tweets (approximately the number of neurons in the human brain) mean.

    Twitter has only 10% of the expected 100 billion tweets now, but a good brain always plans ahead. Kevin gave a talk, Hadoop and Protocol Buffers at Twitter, at the Hadoop Meetup, explaining how Twitter plans to use all that data to an answer key business questions.

    What type of questions is Twitter interested in answering? Questions that help them better understand Twitter. Questions like:

    Categories: Networking Feed

    Seven Signs You May Need a NoSQL Database

    Tue, 02/16/2010 - 8:40am

    While exploring deep into some dusty old library stacks, I dug up Nostradamus' long lost NoSQL codex. What are the chances? Strangely, it also gave the plot to the next Dan Brown novel, but I left that out for reasons of sanity. About NoSQL, here is what Nosty (his friends call him Nosty) predicted are the signs you may need a NoSQL database...

    Categories: Networking Feed

    Scaling Ambition at StackOverflow

    Mon, 02/15/2010 - 1:54pm
    Joel Spolsky and Jeff Atwood are raising VC money for StackOverflow. This is interesting for three reasons: 1) Joel has always seemed like a keep it small and grow organically type of guy, so this is a big step in a different direction. 2) It means they think there's a very big market in the Q&A space and they mean to capture as much as the market as possible. 3) Most importantly for this blog, Joel gives some good advice on when to stay fresh and local and when it's time to jump for the brass ring, scale up your ambition, and go for VC money. Please see Joel's blog post for the details, but here's when to go VC:
    Categories: Networking Feed

    The Amazing Collective Compute Power of the Ambient Cloud

    Mon, 02/15/2010 - 8:11am

    This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud.

    Earlier we talked about how a single botnet could harness more compute power than our largest super computers. Well, that's just the start of it. The amount of computer power available to the Ambient Cloud will be truly astounding.

    2 Billion Personal Computers
    Categories: Networking Feed

    Hot Scalability Links for February 12, 2010

    Fri, 02/12/2010 - 9:20am
    1. My Life With Hbase by Lars George. The hardscabble tale of Hbase's growth from infancy to maturity. A very good introduction and overview of Hbase.
    2. NoSQL Alternatives -- Common Principles and Patterns for Building Scalable Applications. Explore the common principles behind the major NOSQL alternatives and how they compared with traditional database approach in terms of consistency, transaction and query semantics. We will also explore how we can make the transition between the two models smoothers through the support of standard interfaces such as JPA.
    3. Moore’s Law: The Future of Cloud Computing from the Bottom Up. Will Intel's 48 mega core chip change the world or be just another Spruce Goose?
    4. Rent or Own: Amazon EC2 vs. Colocation Comparison for Hadoop Clusters. It's much cheaper to own when you have a large relatively fixed size cluster and can find really cheap labor to maintain it all.
    5. A cloud in a plug - brilliant. A tiny, low-power, low-cost home server and NAS device powered by Tonido software that allows you to access your apps, files, music and media from anywhere.
    6. Seeking A Database That Doesn't Suck by Pixy Misa. Quick recap of databases that suck - or at least, suck for my purposes - and some that I'm still investigating.
    Categories: Networking Feed

    ElasticSearch - Open Source, Distributed, RESTful Search Engine

    Wed, 02/10/2010 - 8:02am

    ElasticSearch is an open source, distributed, RESTful search engine built on top of Lucene. Its features include:

    • Distributed and Highly Available Search Engine.
      • Each index is fully sharded with a configurable number of shards.
      • Each shard can have zero or more replicas.
      • Read / Search operations performed on either replica shard.
    Categories: Networking Feed