``:posts of every follower.
Note that we also maintain a timeline with all the posts. In order
to do so what is needed is just to LPUSH the post against
global:timeline. Let's face it, do you start thinking it was a bit
strange to have to sort things added in chronological order using
ORDER BY with SQL? I think so indeed.
Paginating updates
------------------
Now it should be pretty clear how we can user LRANGE in order to
get ranges of posts, and render this posts on the screen. The code
is simple:
::
function showPost($id) {
$r = redisLink();
$postdata = $r->get("post:$id");
if (!$postdata) return false;
$aux = explode("|",$postdata);
$id = $aux[0];
$time = $aux[1];
$username = $r->get("uid:$id:username");
$post = join(array_splice($aux,2,count($aux)-2),"|");
$elapsed = strElapsed($time);
$userlink = "".utf8entities($username)."";
echo(''.$userlink.' '.utf8entities($post)."
");
echo('posted '.$elapsed.' ago via web
');
return true;
}
function showUserPosts($userid,$start,$count) {
$r = redisLink();
$key = ($userid == -1) ? "global:timeline" : "uid:$userid:posts";
$posts = $r->lrange($key,$start,$start+$count);
$c = 0;
foreach($posts as $p) {
if (showPost($p)) $c++;
if ($c == $count) break;
}
return count($posts) == $count+1;
}
``showPost`` will simply convert and print a Post in HTML while
``showUserPosts`` get range of posts passing them to ``showPosts``.
Following users
---------------
If user id 1000 (antirez) wants to follow user id 1001 (pippo), we
can do this with just two SADD:
::
SADD uid:1000:following 1001
SADD uid:1001:followers 1000
Note the same pattern again and again, in theory with a relational
database the list of following and followers is a single table with
fields like ``following_id`` and ``follower_id``. With queries you
can extract the followers or following of every user. With a
key-value DB that's a bit different as we need to set both the
``1000 is following 1001`` and ``1001 is followed by 1000``
relations. This is the price to pay, but on the other side
accessing the data is simpler and ultra-fast. And having this
things as separated sets allows us to do interesting stuff, for
example using SINTER we can have the intersection of 'following' of
two different users, so we may add a feature to our Twitter clone
so that it is able to say you at warp speed, when you visit
somebody' else profile, "you and foobar have 34 followers in
common" and things like that.
You can find the code that sets or removes a following/follower
relation at follow.php. It is trivial as you can see.
Making it horizontally scalable
===============================
Gentle reader, if you reached this point you are already an hero,
thank you. Before to talk about scaling horizontally it is worth to
check the performances on a single server. Retwis is
**amazingly fast**, without any kind of cache. On a very slow and
loaded server, apache benchmark with 100 parallel clients issuing
100000 requests measured the average pageview to take 5
milliseconds. This means you can serve millions of users every day
with just a single Linux box, and this one was monkey asses slow!
Go figure with more recent hardware.
So, first of all, probably you will not need more than one server
for a lot of applications, even when you have a lot of users. But
let's assume we **are** Twitter and need to handle a huge amount of
traffic. What to do?
Hashing the key
~~~~~~~~~~~~~~~
The first thing to do is to hash the key and issue the request on
different servers based on the key hash. There are a lot of well
known algorithms to do so, for example check the Redis Ruby library
client that implements *consistent hashing*, but the general idea
is that you can turn your key into a number, and than take the
reminder of the division of this number by the number of servers
you have:
::
server_id = crc32(key) % number_of_servers
This has a lot of problems since if you add one server you need to
move too much keys and so on, but this is the general idea even if
you use a better hashing scheme like consistent hashing.
Ok, are key accesses distributed among the key space? Well, all the
user data will be partitioned among different servers. There are no
inter-keys operations used (like SINTER, otherwise you need to care
that things you want to intersect will end in the same server.
**This is why Redis unlike memcached does not force a specific hashing scheme, it's application specific**).
Btw there are keys that are accessed more frequently.
Special keys
~~~~~~~~~~~~
For example every time we post a new message, we **need** to
increment the ``global:nextPostId`` key. How to fix this problem? A
Single server will get a lot if increments. The simplest way to
handle this is to have a dedicated server just for increments. This
is probably an overkill btw unless you have really a lot of
traffic. There is another trick. The ID does not really need to be
an incremental number, but just **it needs to be unique**. So you
can get a random string long enough to be unlikely (almost
impossible, if it's md5-size) to collide, and you are done. We
successfully eliminated our main problem to make it really
horizontally scalable!
There is another one: global:timeline. There is no fix for this, if
you need to take something in order you can split among different
servers and **then merge** when you need to get the data back, or
take it ordered and use a single key. Again if you really have so
much posts per second, you can use a single server just for this.
Remember that with commodity hardware Redis is able to handle
100000 writes for second, that's enough even for Twitter, I guess.
Please feel free to use the comments below for questions and
feedbacks.
.. |Redis Documentation| image:: redis.png