Skip to main content
Version: 0.8

Architecture

Skytable is a modern NoSQL database that prioritises performance, scalability and reliability while providing a rich and powerful querying interface. We are generally targetting an audience that wants to build high performance, large-scale, low latency applications, such as social networking services, auth services, adtech and such. Skytable is designed to work with both structured and semi-structured data.

Our goal is to provide you with a powerful and solid foundation for your application with no gimmicks — just a solid core. That's why, every component in Skytable has been engineered from the ground up, from scratch.

And all of that, without you having to be an expert, and with the least maintenance that you can expect.

Fundamental differences from relational systems

BlueQL kind of looks and feels like using SQL with a relational database but that doesn't make Skytable's internals the same, with the most important distinction being the fact that Skytable has a NoSQL engine! But Skytable's evaluation and execution of queries is fundamentally different from SQL counterparts and even NoSQL engines. Here are some key differences:

  • All DML queries are point queries and not range queries:
    • This means that they will either return atleast one row or error
    • If you intend to do a multi-row query, then it won't work unless you add ALL. ALL by itself indicates that you're applying (or selecting) a large range and can be inefficient
  • Multi-row DML queries are slow and inefficient and are discouraged
  • You can only query on the primary index, once again because of speed (and the problem with scaling multiple indexes) with a fixed set of operators.
  • Remember, in NoSQL systems we denormalize. Hence, no JOINs or foreign keys as with many other NoSQL systems
  • A different transactional model:
    • All DDL and DCL queries are ACID transactions
    • However, DML transactions are not ACID and instead are efficiently batched and are automatically made part of a batch
      transaction. The engine decides when it will execute them, for example based on the pressure on cache. That's because our focus is to maximize throughput
    • All these differences mean that DDL and DCL transactions are ACID transactions while DML queries are ACI and eventually D (we call it a delayed durability transaction). This delay however can be adjusted to make sure that the DML queries emulate ACID transactions but that defeats the point of the eventually durable system which aims to heavily increase throughput.
    • The idea of eventually durable transactions relies on the idea that hardware failure even though prominent is still rare, thanks to the extreme hard work that cloud vendors put into reliability engineering. If you plan to run on unreliable hardware, then the delay setting (reliability service) is what you need to change.
    • For extremely unreliable hardware on the other hand, we're working on a new storage driver rtsyncblock that is expected to be released in Q1'24
  • The transactional engine powering DDL and DCL queries might often choose to demote a transaction to a virtual transaction which makes sure that the transaction is obviously durable but not necessarily actually executed but is eventually executed. If the system crashes, the engine will still be able to actually execute the transaction (even if it crashed halfway)
tip

We believe you can now hopefully see how Skytable's workings are fundamentally different from traditional engines. And, we know that you might have a lot of questions! If you do, please reach out. We're here to help.

Data model

Just like SQL has DATABASEs, Skytable has SPACEs which are collections of what we call data containers like tables. In Skytable lingo, we don't call them tables but call them MODELs which enable you to define your data model.

info

While a MODEL is the only data container for now, many more are set to come. Now is a good time to join our discord server where you can directly chat with the developers working on Skytable and all our awesome community members.

A space is like a database

A space in Skytable is a collection of models and other objects, and settings. It is different from a traditional SQL Database (that is created with SQL's CREATE DATABASE) because it is not meant for tables only, but many other things.

Spaces have "space local" settings that can be set for the space (and will be respected by all its models and other items). You'll learn more about this in the section on DDL.

A model is like a table

A model in Skytable is like a table in SQL but is vastly different because of certain concepts such as:

  • DML queries are point and not range queries by default
  • Restrictions on indexes
  • Settings (which can't be created in traditional tables)
  • Semi-structured data: with collection types in columns such as lists and dictionaries that violates some of the ideas of traditional schema enforcement

Query language

Skytable has it's own query language BlueQLTM which takes a lot of inspiration from SQL but makes several different (and sometimes vastly different) design choices, focused on clarity, speed, simplicity and most importantly, security.

For example, Skytable's BlueQLTM only allows the parameterization of queries. All the queries you ran previously with strings and numbers directly were only possible because the REPL client smartly does the paramterization behind the scenes. This is done for security. You'll learn more about BlueQL next.

Transactions

Skytable's DDL and DCL queries are all ACID transactions. However, DML queries use an AOF-style storage driver that periodically records, analyses and then intelligently syncs the changes to disk. We're working on making ACID transactions widely available across DML queries as well.

Storage

Skytable's storage engine is collectively called the Skytable Disk Storage Subsystem or SDSS for short. The storage engine uses several different storage drivers, using ones appropriate for the task. We do not use RocksDB or any other engine but we implement everything in house, engineering them piece by piece.

Features on track

At this point, Skytable is primarily in-memory which means that while it uses disk storage for durability, most data is stored in-memory. This is going to change in the near future as the team is working on building a custom log-based engine. As you might understand, this is not an everyday task and as we incorporate new ideas it will take some time. But if you're seeing this in 2023, you can expect us to ship something by Q1 2024.

DDL and DCL transactions use a log-based append-only driver while DML queries use a custom log-based append-only driver that is able to intelligently handle concurrent changes. The team will implement new and updated storage drivers from time to time but you do not have to worry about anything, due to our promise for backwards compatibility (see below).

Scalability

Skytable is heavily multithreaded enabling incredible vertical scalability and you can witness it for yourself by running benchmarks (or looking at ones that we publish). Clustering and replication are on track to be released soon.

Features on track

Clustering and replication are right on track and we're expecting to deliver them by Q1 2024. We'd also like to note that clustering is too important to ignore so you can be assured that we're hard at work on it.

Skytable will use atleast as many threads as the number of logical CPUs present on the host. At this moment it is not possible to limit multithreading because multithreading is a part of Skytable's design principles and every attempt is made to exploit available CPU cores to the fullest.

Networking

Skytable its own in-house Skyhash protocol that is built on top of TCP enabling any programming language that has a TCP client to use it without issues. There are three phases in the connection:

  • Handshake: All auth data, compatibility information and other data is exchanged at this step
  • Connection mode selection: based on the handshake parameters a connection mode is chosen and the server responds with the chosen exchange mode
  • Data exchange: This is where the real querying happens
  • Termination: there is no special step; just a TCP FIN

Backwards compatibility

We make the promise to you that no matter what changes in Skytable, you will always be able to:

  • Upgrade from one version to another without data loss or too many hoops
  • Export data from Skytable at any time

More technically:

  • For minor/patch releases: The minor/patch is just in the name but it indicates that no data migration effort is needed. No minor releases ever need data migration, and any migration is done automatically
  • For major releases: Major releases generally introduce breaking changes (just like the upgrade from 0.7.x to 0.8.0 is a largely breaking change). Major releases will either automatically upgrade the data files or require you to use a migration tool that is shipped with the bundle.
  • Definitions (closely following semantic versioning):
    • A major release is something like 1.0.0 to 2.0.0 or 0.8.0 to 0.9.0 (in development versions, 0.8.0 to 0.9.0 is a major version bump)
    • A minor release is something like 1.0.0 to 1.1.0 or 0.8.0 to 0.8.1
    • A patch release is something like 1.0.0 to 1.0.1 or 0.8.0 to 0.8.1 (note that in development versions there is no distinction between a minor and patch release)