Steve Jobs – RIP
Posted on | October 5, 2011 | No Comments
What Steve Jobs achieved in his lifetime is nothing short of amazing. Coming from a blue-collar family and being in the first generation to go to college resonates with me. Be passionate about what you do and never quit. Ideas to live, work and die by. Rest in peace Mr. Jobs.
It’s Open Beta Baby!
Posted on | June 9, 2011 | No Comments
Open Beta is in full swing and we are thrilled to see the growth in the community. If you haven’t already, please sign up to kick the tires on rgbdaily.com. We really appreciate your feedback and ideas! Improvements are rolling out every day.
Can publishers survive a 30% hair cut?
Posted on | February 16, 2011 | No Comments
Couchbase: Can I get an Amen!
Posted on | February 8, 2011 | 1 Comment
Amen
I’ll keep this really short. We use Membase in production in a couple of ways:
- as a traditional memcached front end to MySQL to offload read load
- for lightweight persistent counters
- blob store for static pages
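A minimal sketch of the first two uses, assuming a cache-aside read path. The `FakeCache` class, the `articles` table and the key names are all hypothetical stand-ins: `FakeCache` mimics only the get/set/incr calls of a memcached-protocol client, and an in-memory SQLite database plays the part of MySQL so the example runs on its own.

```python
import sqlite3


class FakeCache:
    """In-process stand-in for a memcached-protocol client (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key, delta=1):
        # Simplification: a missing counter starts at zero here.
        self._data[key] = int(self._data.get(key, 0)) + delta
        return self._data[key]


def get_article(cache, db, article_id):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"article:{article_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    row = db.execute(
        "SELECT title FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    if row is None:
        return None
    cache.set(key, row[0])  # fill the cache so the next read skips the database
    return row[0]


db = sqlite3.connect(":memory:")  # stand-in for MySQL
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO articles VALUES (1, 'Hello')")
cache = FakeCache()

first = get_article(cache, db, 1)   # miss: reads the database, fills the cache
second = get_article(cache, db, 1)  # hit: served from the cache
views = cache.incr("views:1")       # lightweight counter
```

With a real Membase bucket the counter would also survive a restart, which a plain memcached node would not give you.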
We love the mature tools around the memcached interface and the easy scalability of the Membase platform in general. The fact that data safety just works is a big draw. The only thing really missing was basic query support. After becoming pros at modeling time series data efficiently across several different NoSQL technologies, we finally settled on Redis as the most promising, mainly because simple scripts could be written against a fairly robust system of data structures to support the business. Unfortunately, data safety is something the project is still struggling with, and the two-man team is just getting started on a cluster solution. Redis just wasn’t the right fit for us. Our working strategy was to port what we could into Membase as a KV store and use MySQL as an index/search server. Couchbase changes all that and brings the query capabilities we are looking for under one umbrella.
Here is a brief excerpt from the Couchbase guys that has us pretty excited:
In addition to the unrivaled performance, reliability and breadth of the Couchbase family, Couchbase will offer the most feature-rich NoSQL database available: the only document database with strict ACID transaction guarantees, multi-point triggers, user code execution across database nodes with scatter-gather support, indexing and query support, database views, real time map-reduce support, immediately consistent (CP) semantics within a datacenter or zone, and eventually consistent (AP) semantics between data centers or zones. The combination of capabilities Couchbase provides will enable the cost-effective development and deployment of applications previously unimaginable.
RGB is pretty much ready for primetime running on MySQL EBS, and it largely serves us well with Membase as a simple Memcached frontend. This works for us because there is an upper limit on the total volume of data/content for the day that is driven by our content publishers. We have other projects in the works, though, that involve the mobile space and operate at Internet scale where there is no gatekeeper. For those projects Couchbase is a great fit: best-in-class features, scalability and persistence that just work. I believe in the one true elastic datastore and I’m glad that a proven company is finally stepping up to the plate to make it a reality.
Redis Persistence: Borrow vs Build
Posted on | February 3, 2011 | No Comments
Redis and SQLite: Why wait for good enough?
Posted on | February 3, 2011 | 11 Comments
Intro
I think the number one benefit of Redis was bringing data structures to the head of the table in the battle for the one true datastore. That said, it can be more than a little painful to watch it go through the normal growing pains associated with a promising project. At RGB, we looked at Redis to solve the activity stream like many before us. After getting this all going in EC2 there was some frustration around the uncertainty of persistence in the face of failures that might occur. We are a shop that already decided that the premium for AWS RDS was enough to remove most of the pain points associated with MySQL. We have fast moving data that grows against the stream of world news.
When we learned that Redis VM wasn’t performing at scale in production environments it really spooked us. When antirez said he was looking down the barrel of implementing his own BTree (not even the best solution for modern storage backends) from scratch I started to get upset. Like angry upset. When the news started to float about the filesystem datastore (one file per key) I started to look for other solutions.
Redis is great, however we don’t have the operational resources to babysit any piece of our architecture. Redis works great until it hits the wall, that wall being over 80% of your machine’s RAM. A lot of different solutions are being looked at to remedy Redis, but based on our collective experience on all this stuff few will disagree that it’s early days. I think the best option is for Redis to adopt a pluggable persistence strategy and support an open leader. A couple of solid candidates for embedding are:
- Berkeley DB
- SQLite
- InnoDB
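To make the pluggable persistence idea concrete, here is a rough sketch. It is not how Redis works or is planned to work; the `Backend` interface, `SQLiteBackend` and `CachedStore` names are invented for illustration. RAM is treated as a bounded LRU cache, and every write goes straight through to whichever backend is plugged in.

```python
import sqlite3
from abc import ABC, abstractmethod
from collections import OrderedDict


class Backend(ABC):
    """Hypothetical persistence interface; any embedded store could implement it."""

    @abstractmethod
    def load(self, key): ...

    @abstractmethod
    def store(self, key, value): ...


class SQLiteBackend(Backend):
    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)"
        )

    def load(self, key):
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def store(self, key, value):
        with self.conn:  # commit on success, roll back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )


class CachedStore:
    """Write-through store: RAM is only a bounded LRU cache over the backend."""

    def __init__(self, backend, capacity=2):
        self.backend = backend
        self.capacity = capacity
        self.cache = OrderedDict()

    def set(self, key, value):
        self.backend.store(key, value)  # durable first
        self._remember(key, value)

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as recently used
            return self.cache[key]
        value = self.backend.load(key)   # cache miss: go to the backend
        if value is not None:
            self._remember(key, value)
        return value

    def _remember(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used


store = CachedStore(SQLiteBackend(), capacity=2)
for k, v in [("a", "1"), ("b", "2"), ("c", "3")]:
    store.set(k, v)
value_a = store.get("a")  # evicted from RAM earlier, reloaded from SQLite
```

The dataset can grow past the cache capacity because nothing lives only in memory; swapping `SQLiteBackend` for another embedded store would not touch `CachedStore`.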
What follows are ideas about how we might go the other way and apply the well-behaved, deterministic data structure operations found in Redis to the databases that are industry proven and do the most important thing really well: reading and writing data.
What else has been done in the space?
In the comments I saw many rock star engineers try to talk antirez off the ledge. Justin Sheehy of Basho pretty much offered up bitcask to solve the persistence problem, but no bite. Don’t get me wrong, I love Redis and the future of Redis Cluster is really bright, but this “not made here” / “our community can wait for our two-man team” attitude can only go so far before the aha moment of Redis is co-opted by other players in the space. This brings me to my first major point. The idea of data structures in the DB is not new:
- MySQL Stored Routines Library (AKA “Arrays, Sets, Hashes, Queues, and Stacks” implemented efficiently in stored procedures)
- Written by Giuseppe Maxia, QA Director at Continuent, back in ’05
- Not as fast as Redis, but it does what it needs to do (support large datasets with limited memory while keeping your data safe)
- With some minor improvements things could be tuned for massive vertical scale (think RDS and big box environments)
- Even simple modulo sharding on the keys could allow the system to better scale on many cpu/core servers by minimizing IO locks. With this strategy the simple tables used to support the datasets are replicated to create partitions. Native MySQL hash partitions might also be a good bet, but just having a separate table means the opportunity to shard at the client level opens up. With client side key sharding we can now host partitions on different databases.
- The library supports array_merge and could be extended easily for diffs for full set support.
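The modulo sharding idea from the list above might look roughly like this on the client side. The partition count, the `list_items` table prefix and the host names are all made up for the sketch; the only real requirement is a hash that stays stable across processes, which is why it uses `zlib.crc32` rather than Python’s built-in `hash()`.

```python
import zlib

NUM_PARTITIONS = 4  # assumed number of partition tables
DATABASES = ["db0.example.internal", "db1.example.internal"]  # hypothetical hosts


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition number with a process-stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def table_for(key, base="list_items"):
    """Each partition is a separate copy of the same simple table."""
    return f"{base}_{partition_for(key):02d}"


def database_for(key):
    """Spread partitions over databases; several partitions may share a host."""
    return DATABASES[partition_for(key) % len(DATABASES)]


key = "stream:user:42"
target = (database_for(key), table_for(key))
```

Because the partition is derived from the key alone, every client agrees on where a given list or set lives without any lookup service.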
Enter SQLite
So should everyone just move back to MySQL? Hell no, of course not. It’s just proof that antirez has options about where to spend his time. Playing around in the mud with core persistence doesn’t save time. Primary key and single integer secondary key lookups are fast on any database. These types of ranged searches would be limited in nature to just collecting elements of a list. This is fast everywhere. What is slow are the arbitrarily complex index interactions that come with the relational world. Simple data structures make things faster regardless of the technology used in the implementation. So what alternatives do we have for persistence? Given the example above I think SQLite could be a great implementation leveraging the power of stored procedures to implement atomic datasets. Redis wastes the RAM it has on core storage instead of using it effectively as a cache and working memory for arbitrarily large datasets (only limited by node resources). Redis should be only concerned with the client, network and caching layers and leave persistence to more mature and capable backends. Redis could make those backends more capable by orchestrating partitions, backups and bulk data moves once Redis Cluster comes online.
SQLite is really only one suggestion. Using Berkeley DB or Innostore could remove the slowness and pain of interacting with the data through SQL. I believe Membase is currently dealing with the pain of slow performance using SQLite once internal cache is exhausted. Indeed, something like Innostore or bitcask might be a better bet.
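As a rough illustration of a Redis-style list kept in SQLite, consider the sketch below. SQLite itself has no stored procedures, so this version keeps the logic in the client. The `SQLiteList` class, the `list_items` table and the method names (borrowed from the Redis commands RPUSH and LRANGE) are assumptions for the example, and it only handles non-negative indexes on a single connection.

```python
import sqlite3


class SQLiteList:
    """Hypothetical Redis-like list stored as (name, pos) rows in SQLite."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS list_items ("
            " name TEXT NOT NULL,"
            " pos INTEGER NOT NULL,"
            " value TEXT NOT NULL,"
            " PRIMARY KEY (name, pos))"
        )

    def rpush(self, name, value):
        """Append a value and return the new length of the list."""
        # Single-connection sketch: with several writers this read-then-insert
        # would need an explicit BEGIN IMMEDIATE to stay atomic.
        with self.conn:  # commit on success, roll back on error
            (next_pos,) = self.conn.execute(
                "SELECT COALESCE(MAX(pos) + 1, 0) FROM list_items WHERE name = ?",
                (name,),
            ).fetchone()
            self.conn.execute(
                "INSERT INTO list_items (name, pos, value) VALUES (?, ?, ?)",
                (name, next_pos, value),
            )
        return next_pos + 1

    def lrange(self, name, start, stop):
        """Return elements start..stop inclusive, via a primary-key range scan."""
        rows = self.conn.execute(
            "SELECT value FROM list_items"
            " WHERE name = ? AND pos BETWEEN ? AND ? ORDER BY pos",
            (name, start, stop),
        )
        return [row[0] for row in rows]


stream = SQLiteList(sqlite3.connect(":memory:"))
for item in ["a", "b", "c"]:
    length = stream.rpush("activity:42", item)
head = stream.lrange("activity:42", 0, 1)
```

Both operations touch only the composite primary key, which is the kind of lookup described above as fast on any database.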
Alas, the benefits of Redis are clear and co-optable. For example, Basho’s Riak has a great node/link model for representing data. They added a Solr abstraction on top of this model to court users trying to solve the scalable search pain point. What really stops them from adopting a simple data structures interface for the majority of developers who learned how to think that way in their freshman year? 6.001 (Data Structures) was the first CS class I took at MIT.
HTML5: Facebook and Google
Posted on | January 25, 2011 | No Comments
This is largely a response to Facebook’s Focus In 2011: Better Cross-Platform Unification Led By HTML5
Both Facebook and Google have an equal interest in making sure the Internet is dominated by the web browser and not closed apps. Neither search nor Like buttons work in an Apple-led, application-dominant world. How Facebook approaches the iPad should be very telling; I believe Bret Taylor is signalling an HTML5 client that works on multiple tablets. The current iPhone client is very similar to the mobile site in appearance and functionality.
Facebook can only do so much as a walled garden. The number of relationships people can effectively manage is around 150. The social graph is dominated by the photos and newsfeed of those core relationships. After the social graph is established things quickly move to the interest graph, which is dominated by content pulled in via the ‘Like’ button. Facebook needs the greater web for not only content, but also commerce.
For Google’s part, you can see quite a few examples of their plans for mobile HTML5 across their product suite. HTML5 is really the only way to future-proof and streamline development, since both Facebook and Google leverage small teams to get things done. When single developers are responsible for a whole client product, it’s tough when that knowledge is not shared by others in the organization. For example, rockstar Joe Hewitt was responsible for Facebook’s iPhone app, which stagnated for some time after he went on to Android and other projects. In fact Joe shares an interesting perspective on the fate of the mobile web and what might happen if developers don’t collectively work to support it.
A Reasonable Architecture for Your Next Big Idea
Posted on | September 28, 2010 | No Comments
Over the weekend I presented at Podcamp Boston 5. My talk was on potential SaaS (software as a service) architectures which leverage PaaS (platform as a service) hosting options. It provided a couple of alternatives for business founders who are looking to implement a SaaS solution when a technical founder is not present. The presentation covers the smart questions and options that usually are not explored on day one when technical resources are outsourced. The presentation starts out with the question, “Which Mark are You?”

