cft

Couchbase N1QL Transaction: An Elastic and Scalable Distributed Transaction

In this article, you can find an explanation of transactions in Couchbase N1QL, a language used to manipulate JSON, as well as some examples.


user

Keshav Murthy

2 years ago | 6 min read

In this article, you can find an explanation of transactions in Couchbase N1QL, a language used to manipulate JSON, as well as some examples.

SQL is the only 22nd-century language available for developers today.

Abstract

In relational database systems, SQL is more than a declarative query language. It includes procedural language (T-SQL, PL/SQL, etc) and defines transactions and their semantics. SQL as a query language has been unreasonably effective even in NoSQL database systems.

However, few NoSQL database systems support transactions. The ones that support come with a long list of limitations and/or were unable to support SQL operations within the transaction.  

We introduce and explain transactions in Couchbase N1QL: SQL for JSON.  N1QL transactions are multi-everything: multi-document, multi-bucket, multi-scope, multi-collection, and multi-DML-statement.

N1QL Transactions is now generally available with Couchbase 7.0. You can download it here. See the documentation here.

Introduction

N1QL is a declarative language to manipulate JSON. Couchbase stores all the documents in the data service.  The query service orchestrates the query execution optimizing the query, creating an execution plan, and then executing it using data, indexing, and FTS.  The Couchbase SDK and query interaction protocol are built via REST over HTTP/S. N1QL DML statements include SELECTINSERTUPDATEUPSERTDELETE, and MERGE.  

N1QL Transactions

Here’s an example of transactions in RDBMS and Couchbase N1QL:

Transactions

MySQL Database (Statements are same/similar in Oracle, SQL Server, Informix & DB2)

Couchbase Database (7.0)

Insert data. Tuples in MySQL, JSON documents in Couchbase

INSERT INTO customer(cid, name, balance) VALUES(4872, “John Doe”, 724.23);

INSERT INTO customer(cid, name, balance) VALUES(1924, “Bob Stanton”, 2735.48);

INSERT INTO customerVALUES(“cx4872”, {“cid”: 4872, “name”:”John Doe”, “balance”:724.23});

INSERT INTO customerVALUES(“cx1924”, {“cid”: 1924, “name”:”Bob Stanton”, “balance”:2735.48});

Simple transaction, debit, and credit. Intermediate selects have to be read their own updates (RYOW)

START TRANSACTION;

UPDATE customer SET balance = balance + 100 WHERE cid = 4872;

SELECT cid, name, balance  from customer;

UPDATE customer SET balance = balance – 100 WHERE cid = 1924;

SELECT cid, name, balance from customer;

COMMIT ;

START TRANSACTION;

UPDATE customer SET balance = balance + 100 WHERE cid = 4872;

SELECT cid, name, balance  from customer;

UPDATE customer SET balance = balance – 100 WHERE cid = 1924;

SELECT cid, name, balance from the customer;

COMMIT ;

The second transaction with partial rollback.

START TRANSACTION;

UPDATE customer SET balance = balance + 100 WHERE cid = 4872;

SELECT cid, name, balance  from customer;

SAVEPOINT s1;

UPDATE customer SET balance = balance – 100 WHERE cid = 1924;

SELECT cid, name, balance from customer;

ROLLBACK WORK TO SAVEPOINT s1;

SELECT cid, name, balance from customer;

COMMIT ;

START TRANSACTION;

UPDATE customer SET balance = balance + 100 WHERE cid = 4872;

SELECT cid, name, balance  from customer;

SAVEPOINT s1;

UPDATE customer SET balance = balance – 100 WHERE cid = 1924;

SELECT cid, name, balance from customer;

ROLLBACK WORK TO SAVEPOINT s1;

SELECT cid, name, balance from customer;

COMMIT ;

If you didn’t see much difference, that’s because there isn’t. 

N1QL Transactions Statements 

N1QL transactions are a set of transactions that include any of the DML statements in all forms: no restrictions. Transactional protection is issued from new statements: BEGIN/START, COMMIT, ROLLBACK, and SAVEPOINT.

Start Transaction (Same as the Begin Work Statement)

This statement starts a new transaction, assigns a new transaction ID, and returns the transaction ID to the caller. There are two rules the SDKs, tools (e.g. CBQ shell) follow to successfully execute the rest of the transaction.

  1. Send this transaction ID as a parameter for every subsequent statement within the transaction. This is how the query service knows the statement should be run as part of a particular transaction.
  2. Couchbase can have multiple query service nodes, but a single transaction is executed on a single query node. You can start a new transaction in ANY QUERY NODE. However, the rest of the statements FOR THAT SINGLE TRANSACTION should be sent to the VERY SAME query node.

Commit Transaction or Commit Work

This commits all the changes in the transaction to the data store. This is a distributed commit of the transaction into the Couchbase key-value data store. The commit supports all of the Couchbase durability options.

This is still a distributed system — just like astronomy, rare things happen often. On any failure, the complete transaction is rolled back automatically and the application needs to retry the transaction.

Failures can occur for various reasons: network failure, node failure, node overloaded, and write-write conflict.  

Just like direct Couchbase WRITEs are optimistic and failures can occur due to concurrent writes resulting in CAS conflicts, N1QL transactions can also fail due to write conflicts. We implement a form of optimistic concurrency approach to transactions.

Rollback Transaction or Rollback Work

On an application-issued rollback, all of the modifications done within the transaction are rolled back.

As you’ve seen from the examples above, N1QL also supports savepoints and rollbacks to the savepoints within the transaction.  From the application perspective, these work the same as the RDBMS counterparts.

Transactional Features 

The transaction is more than just the statements — it’s all about the semantics and guarantees.  Hence the ACID definition. We talked about atomicity earlier wrt to COMMIT.  Let’s talk a bit more about this.

  • ATOMICITY: This is required for both the whole transaction and each statement. The DML statements will atomically roll back on any failure, but the transaction itself is open and can be continued. An example of a failure is a document key conflict on insert.
  • CONSISTENCY: This ensures the constraints are applied consistently for each statement. The only constraint in Couchbase is the unique constraint on the document key.  N1QL checks for the pre-existence of each of the keys inserted and rolls back the statement on any conflict.  Remember we use optimistic concurrency control.  That means, even after the INSERT is successful,  the commit stage can still run into a write-write conflict because some other session could have been inserted between the insert and commit. You’ll have to retry the transaction on such failures.
  • ISOLATION: We support the COMMITTED READ isolation level. All the data that’s read and evaluated is committed data in the index and data store. By default, we use the stringent request_plus consistency on the index reads.  This means, for a given predicate, we use the latest data in the index to qualify the documents to qualify select/update/delete. We then go the extra step to fetch the documents from the KV store and re-apply the predicates to ensure the latest committed version of the document is qualified and updated.

If performance wasn’t a consideration, everyone would have used serializable transactions :-). You can change the scan consistency to unbounded for improved index scan performance.

Durability

N1QL supports all of the durability options and features with the Couchbase data store to ensure durability on our distributed database.

Concurrency

Broadly speaking, database transactions use either pessimistic or optimistic concurrency control. Traditional single-node databases follow the pessimistic concurrency control to avoid conflicts.  

This approach is also applied to some of the multi-node implementations like Oracle RAC, DB2 Sysplex. Multi-node implementations are possible but require expensive Infiniband, custom hardware, etc.

Optimistic concurrency control version each base unit of the tuple (rows in RDBMS, documents in Couchbase), remember the version they read to modify, and check if the version has changed during the write.  

If there’s indeed a conflict, the whole transaction has to be retried. The advantage of this approach is, in a well-designed application, there should be few conflicts: you won’t withdraw cash and transfer money between accounts in the same nanosecond.  On the rare occasion you do, the retry is tolerated.

Concurrency in the N1QL Query Service

Couchbase N1QL uses optimistic concurrency control. Each transaction reads the documents it needs to update, updates them, and keeps the updated documents in its private per-transaction cache.

When you issue a subsequent statement, the query service is aware of the updated documents within the transactions and uses that version instead of the older version reflected in the index/data.  This is how it provides READ-YOUR-OWN-WRITE support.  

This is modeled so all the DML statements, all operations (select, join, project, aggregate, nest, unnesst, etc) will all get this RYOW benefit providing a consistent and crucial feature of the transaction.

Even while the application is doing transactions (doing both reads and writes), within the transaction we’re reading and caching the updates until the commit time.

This is the READ phase of the transaction, And because of this approach, there’s no coordination between multiple transactions or multiple query nodes within a transaction until commit (WRITE phase).

This ensures the performance and scalability of distributed transactions in Couchbase. And, before you ask,  all these work concurrently with Couchbase distributed transactions we released in 6.5.

Coordination is the bane of scalable systems — Peter Bailis

Next Steps

This is a short overview of what’s coming up with Couchbase Transactions.  In the upcoming series of articles, we’ll dive further into implementation, usage, SDK support, Lambda Transactions, Spring support, etc, and more.

Acknowledgments

N1QL transactions are the result of intense work and collaboration in Couchbase query, SDK, and QE teams to design and implement. Thank you!

Originally Published here!

Upvote


user
Created by

Keshav Murthy

Keshav Murthy is a vice president of Couchbase R&D. Previously at MapR, IBM, Informix, Illustra, Sybase with more than 20 years experience in the database server development. He lead the SQL and NoSQL R&D team at IBM Informix. He has received two President's Club awards at Couchbase, two Outstanding Technical Achievement Awards at IBM. Keshav has a bachelor's degree in Computer Science and Engineering from the University of Mysore, India, holds eleven US patents, three patent applications pending and has numerous publications.


people
Post

Upvote

Downvote

Comment

Bookmark

Share


Related Articles