About this document
Copyright (c) 2020 Sayan Nandan
Date: Aug 23, 2020
In effect since: v0.4.0
Date: 6th Aug, 2020
Terrapipe is an application layer protocol, like HTTP, that is built on top of TCP and used by Skytable for client/server communication. All clients that want to communicate with Skytable must implement this protocol. This document serves as a guide for implementing clients for Skytable.
Terrapipe also follows a request/response model like HTTP, except that we call it a query/response model. A client sends a query, and the server returns a response. The bytes sent by a client are called a query packet, while the bytes the server sends back in response are called a response packet.
Each packet (whether a query or a response) contains two parts:
- The metaframe: the first two lines
- The dataframe: the remaining lines
Terrapipe has two kinds of queries:
- Simple queries: These queries perform just one action (like GET x)
- Batch queries: These queries perform multiple actions
In a query, a datagroup is the collection of elements that performs one action; in a response, it is the collection of elements that returns the outcome of one action. Don't worry, read on and this will become clear.
The metaframe is made up of precisely two lines, each terminated by an LF (\n) byte. This is what it looks like: #<m>\n*<n>\n
- <n> is the number of datagroups in the packet. Since simple queries perform only one action, <n> is always 1 for them. Batch queries have values of <n> in the range (1, ∞), depending on the number of actions they perform.
- <m> is the number of bytes in the following line, excluding the \n byte. So, if the second line of the metaframe were *1\n, then <m> would have a value of 2.
Also, make sure that your matching on the * character is non-exhaustive, as we might add more packet types in the future that need symbols other than *.
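A minimal sketch of reading the metaframe in Python, assuming the whole packet is already in memory as bytes. The function name parse_metaframe is hypothetical, and raising NotImplementedError for an unrecognized symbol is just one way of keeping the match non-exhaustive:

```python
def parse_metaframe(packet: bytes) -> tuple[int, int]:
    """Parse the metaframe at the start of a packet.

    Returns (number_of_datagroups, offset_where_the_dataframe_starts).
    """
    # First line: the sizeline '#<m>'
    end = packet.index(b"\n")
    sizeline = packet[:end]
    if not sizeline.startswith(b"#"):
        raise ValueError("metaframe must start with a '#' sizeline")
    m = int(sizeline[1:])
    # Second line: exactly m bytes, followed by an LF
    start = end + 1
    line = packet[start:start + m]
    if packet[start + m:start + m + 1] != b"\n":
        raise ValueError("metaframe line is not terminated by LF")
    symbol, value = line[:1], line[1:]
    if symbol == b"*":
        return int(value), start + m + 1
    # Non-exhaustive match: other symbols may belong to future packet types
    raise NotImplementedError(f"unsupported packet type symbol: {symbol!r}")


print(parse_metaframe(b"#2\n*1\n#2\n&1\n+3\nbar\n"))  # (1, 6)
```

The returned offset lets the caller continue parsing the dataframe from where the metaframe ends.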
The dataframe is made up of groups called datagroups, as we mentioned earlier. For a query like GET x, GET x is the query datagroup, and the response that the server returns with the key's value is the response datagroup (excluding the metaframe, of course).
Every query datagroup contains all the arguments required to execute its action, including the action itself. Each datagroup begins with two lines of boilerplate, which look like this: #<p>\n&<q>\n
Here, <q> is the number of arguments/elements in the datagroup, and <p> is the number of bytes in the next line, excluding the \n byte (just like the metaframe).
An entire datagroup has the following structure: the two boilerplate lines #<p>\n&<q>\n, followed by <--- DATA --->.
We know the first two lines... but wait, what is <--- DATA --->? Let's take a look at what the data looks like in the case of queries, and then we'll look at what it looks like in the case of responses.
For a request like GET foo, the corresponding data looks like this: #3\nGET\n#3\nfoo\n
So, starting from line 1:
- We have a '#' followed by the number of bytes in the next line (3 in this case), excluding the \n byte, and then a \n byte.
- Looking ahead, we have 3 bytes (GET) followed by a \n byte.
- We then have a '#' followed by the number of bytes in the next line (3 again, in this case), followed by a \n byte.
- And this goes on.
The idea is that we have a # followed by the number of bytes (let it be n) to be read from the next line, and then a line feed character ('\n'). In the next line, we read n bytes and skip the line feed character. And we keep going in this way.
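The reading rule above can be sketched as a small loop. This is a minimal illustration, not an official client API: decode_query_data is a hypothetical name, and it expects only the <--- DATA ---> part of a query datagroup (no metaframe or boilerplate):

```python
def decode_query_data(data: bytes) -> list[bytes]:
    """Split the <--- DATA ---> part of a query datagroup into its elements."""
    elements = []
    pos = 0
    while pos < len(data):
        # Sizeline: '#' followed by the element's byte length, then LF
        end = data.index(b"\n", pos)
        if data[pos:pos + 1] != b"#":
            raise ValueError("expected a '#' sizeline")
        n = int(data[pos + 1:end])
        # Read exactly n bytes from the next line, then skip its LF
        start = end + 1
        elements.append(data[start:start + n])
        if data[start + n:start + n + 1] != b"\n":
            raise ValueError("element is not terminated by LF")
        pos = start + n + 1
    return elements


print(decode_query_data(b"#3\nGET\n#3\nfoo\n"))  # [b'GET', b'foo']
```

Because each element is read by its byte count rather than by scanning for the next LF, an element may itself contain \n bytes without confusing the parser.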
Not so hard, right? Now let's look at what the data looks like in the case of responses.
Data in responses has the following general structure: <tsymbol><length>\n<item>\n
Here, we can easily make out that <length> is the number of bytes in the following line (just like a query) and <item> is, well, an argument/element in the datagroup. But then, what is <tsymbol>? Simply put, it is the type of the data in the following line.
But, before you go ahead, here's the simple flow that you must keep in mind:
- Read the tsymbol and match it against the list of data types
- Parse the data in the next line into the data type
- Repeat for the remaining lines
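The flow above can be sketched as a lookup from tsymbol to a parsing function. This is a minimal sketch under stated assumptions: parse_response_data is a hypothetical name, count is the <q> value taken from the datagroup boilerplate (&<q>), only the + (string) tsymbol discussed in this guide is registered, and decoding strings as UTF-8 is an assumption; the remaining tsymbols would come from the full data type list:

```python
def parse_response_data(data: bytes, count: int) -> list:
    """Parse `count` response elements of the form <tsymbol><length> LF <item> LF."""
    # tsymbol -> parser; '+' (string) is the only one covered in this guide.
    # UTF-8 decoding is an assumption of this sketch.
    parsers = {b"+": lambda raw: raw.decode("utf-8")}
    items = []
    pos = 0
    for _ in range(count):
        end = data.index(b"\n", pos)
        # 1. Read the tsymbol and match it against the data types
        tsymbol = data[pos:pos + 1]
        parser = parsers.get(tsymbol)
        if parser is None:
            raise NotImplementedError(f"unknown tsymbol: {tsymbol!r}")
        length = int(data[pos + 1:end])
        raw = data[end + 1:end + 1 + length]
        # 2. Parse the data in the next line into that type
        items.append(parser(raw))
        # 3. Repeat for the remaining lines (skip the item and its LF)
        pos = end + 1 + length + 1
    return items


print(parse_response_data(b"+3\nbar\n", 1))  # ['bar']
```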
So, to conclude, <tsymbol> specifies the type of the item in the following line. Read on to learn what values <tsymbol> can take.
You can find a full list of data types and their <tsymbol>s here.
You can find a full list of response codes and their descriptions here.
The line that has a # followed by the number of bytes in the next line, or the line that has a <tsymbol> followed by the number of bytes in the next line (in the case of responses), is often referred to as a sizeline.
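Since every sizeline in this guide is a single symbol (# or a <tsymbol>) followed by a decimal byte count, a sizeline parser can be very small. A sketch, where parse_sizeline is a hypothetical name and the trailing LF is assumed to have been stripped already:

```python
def parse_sizeline(line: bytes) -> tuple[str, int]:
    """Split a sizeline (LF already stripped) into its symbol and byte count.

    Assumes a one-byte symbol: '#' for metaframes, datagroup boilerplate
    and query data, or a <tsymbol> such as '+' for response data.
    """
    symbol, digits = line[:1].decode("ascii"), line[1:]
    if not digits.isdigit():
        raise ValueError(f"malformed sizeline: {line!r}")
    return symbol, int(digits)


print(parse_sizeline(b"#2"))  # ('#', 2)
print(parse_sizeline(b"+3"))  # ('+', 3)
```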
Let's say we're going to run GET foo to get the value of a key called 'foo'. Since we're running just one action, this is a simple query.
Since simple queries do just one thing, they have just one datagroup. So the metaframe is fairly simple: #2\n*1\n
Now, what about the dataframe? Well, there is a single datagroup for the GET action with two arguments. Hence, the boilerplate for the dataframe looks like this: #2\n&2\n
Now, what about the data? GET and foo are the two arguments, so the data in the dataframe looks like this: #3\nGET\n#3\nfoo\n
So, GET foo produces the following query packet: #2\n*1\n#2\n&2\n#3\nGET\n#3\nfoo\n
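The steps above (metaframe, datagroup boilerplate, data) can be combined into a packet builder. This is a minimal sketch for simple queries only; sized and build_simple_query are hypothetical names, and encoding the arguments as UTF-8 is an assumption:

```python
def sized(line: bytes) -> bytes:
    """Prefix a line with its '#<length>' sizeline; both lines end in LF."""
    return b"#" + str(len(line)).encode() + b"\n" + line + b"\n"


def build_simple_query(*args: str) -> bytes:
    """Build the full query packet for a simple query such as GET foo."""
    elements = [arg.encode("utf-8") for arg in args]  # UTF-8 is an assumption
    metaframe = sized(b"*1")  # simple query: exactly one datagroup
    boilerplate = sized(b"&" + str(len(elements)).encode())  # &<q>
    data = b"".join(sized(e) for e in elements)  # '#<n>' + element, repeated
    return metaframe + boilerplate + data


print(build_simple_query("GET", "foo"))
# b'#2\n*1\n#2\n&2\n#3\nGET\n#3\nfoo\n'
```

Note that each sizeline counts bytes of the encoded element, not characters, which matters as soon as an argument contains non-ASCII text.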
Pretty simple, right?
We'll assume that a key foo exists and holds the value bar. Since we won't be building responses but rather parsing them, let's understand the parts of the response that the server returns in this case: #2\n*1\n#2\n&1\n+3\nbar\n
As you know, the metaframe is always made up of two lines (i.e., two parts separated by an LF character, or \n).
So, in our case, the response's metaframe is: #2\n*1\n
We start parsing with the first line by reading in the sizeline. For this, we ignore the LF char and look at what remains. What do we have? It's #2.
For more information on sizelines, see the note on sizelines above.
This means that our next line will have 2 chars excluding the LF. Good!
Now we read the first 2 chars of the next line, as that's the only important bit for us, and ignore the linefeed. This leaves us with *1. Since we're in the metaframe, it's easy to see that *1 is telling us that this response packet is for a simple query and NOT a batch (pipelined) query.
So, as of now, we know that this response corresponds to a simple query.
In our case, the dataframe is the remaining part: #2\n&1\n+3\nbar\n
We know that we first need to read the boilerplate of the dataframe, which is again two lines separated by LF chars. So, we read in the sizeline.
To read the sizeline, we just read up to the \n character and then ignore the LF. This leaves us with #2.
Again, just like queries, this means that the next line will have 2 chars excluding the LF. Great!
So, we read the first 2 chars of the next line and ignore the linefeed. In our case, that's &1. This tells us that there is going to be a datagroup with one element.
In our case, the data is made up of the remaining part: +3\nbar\n
So, we read the sizeline again. This time, the sizeline begins with a + and not a #. This is expected, because in response dataframes, the sizelines of the data begin with <tsymbol>s.
So, we have +3 (ignoring the LF). From the data type list, we know that + indicates a string in the following line. Good! And the 3 tells us that the next line is 3 chars long, excluding the linefeed!
So, in other words, we have a 3 character long string in the next line. Now let's read in 3 characters and ignore the linefeed. Hey, we have a 3 char string 'bar'.
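Putting the whole walkthrough together, here is a minimal sketch that parses a simple-query response. The name parse_simple_response is hypothetical, and it deliberately handles only string (+) elements, which is all this example needs; UTF-8 decoding is an assumption:

```python
def parse_simple_response(packet: bytes) -> list[str]:
    """Walk a simple-query response packet and return its string elements."""
    pos = 0

    def read_sized() -> tuple[bytes, bytes]:
        # Read '<symbol><length>' LF, then exactly <length> bytes and their LF
        nonlocal pos
        end = packet.index(b"\n", pos)
        symbol, length = packet[pos:pos + 1], int(packet[pos + 1:end])
        body = packet[end + 1:end + 1 + length]
        pos = end + 1 + length + 1  # skip the body and its trailing LF
        return symbol, body

    # Metaframe: '#2' sizeline, then '*1' for a simple query
    _, kind = read_sized()
    if kind != b"*1":
        raise ValueError("not a simple-query response")
    # Dataframe boilerplate: '#<p>' sizeline, then '&<q>'
    _, group = read_sized()
    if not group.startswith(b"&"):
        raise ValueError("expected '&<q>' datagroup boilerplate")
    count = int(group[1:])
    # Data: each element is a '<tsymbol><length>' sizeline plus the item
    items = []
    for _ in range(count):
        tsymbol, body = read_sized()
        if tsymbol != b"+":
            raise NotImplementedError(f"tsymbol {tsymbol!r} not handled here")
        items.append(body.decode("utf-8"))
    return items


print(parse_simple_response(b"#2\n*1\n#2\n&1\n+3\nbar\n"))  # ['bar']
```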
So, our response becomes (in pseudocode data-structure):
If we ran MGET x y z, where x and y existed with the value why and z didn't exist, we'd have three elements in the datagroup:
Phew, we're done!