Beyond SQL: 8 new languages for data querying

in #sql2 years ago (edited)

SQL has dominated data querying for decades. Newer query languages offer more elegance, simplicity, and flexibility for modern use cases.

GraphQL
The name of GraphQL is a bit confusing because it’s not really a language designed to exploit all of the possibilities from graph databases. It’s more like an elegant shorthand for querying data that’s stored in nested format similar to JSON. The query is just a quick description of what the results should look like. The back end looks at this list of fields, which can come with restrictions on the values, and tries to find the results that match. Where SQL specifies how the database should complete a request, GraphQL users simply provide a list of fields. Some call it “query by example.”

The language is a natural match for some JSON databases, but GraphQL is also growing more popular for searching relational databases with a tabular schema. Smart back ends can translate the nested requests into a pattern of JOINs that fits the schema.

The original language began as an internal project at Facebook, but after it was released as an independent open source project, others started developing GraphQL back ends. There are now versions written in all of the major languages and many modern experimental languages, too.

PRQL
If you naturally think of software as a pipeline or an assembly language, then you might like PRQL, which stands for Pipelined Relational Query Language (pronounced "Prequel"). Queries in this language are structured as a chain of small commands. Taken together, the commands produce a result with just the data you want.

Similar to many modern programming languages, the mental model of a query takes a functional approach. Simple features like variables reduce repetition and simplify the flow. The results from one line are fed into the next line in a long chain. If you want to remove one step, you can often just comment out that line and the rest of the pipeline will still work.

PRQL's code is written in Rust as a transpiler for converting PRQL into SQL. The basic structure is meant to be extensible, so you can add more abstractions to suit your use case. This ease of experimentation ensures that the language will rapidly evolve.

WebAssembly
Many developers think of WebAssembly (abbreviated Wasm) as a tool for creating fast applications that run in web browsers. When Redpanda began building a data streaming tool to supersede Kafka, they wanted to add a mechanism for not only delivering the data but occasionally transforming it along the way. WebAssembly was their choice.

Redpanda acts as a ledger by creating a data stream that’s immutable and ordered. Events are appended and programmers can tap into the stream at any point in the past. Most will start with newly created events, but some may start in the past to create historical aggregations.

WebAssembly, of course, is much more capable and low level than even the stored procedures that are part of some databases. Not all developers want to write bit-banging, byte-level code. But the option opens up the data stream to elaborate transformations that go well beyond what is possible with SQL.

GQL
Graph Query Language, or GQL, is a proposed standard that merges similar declarative languages like Cypher, PGQL, and GSQL. Developers create queries by specifying a particular model for a set of nodes, and then the database is responsible for finding matches. GQL works with more complex property graphs that allow pairs of nodes to share several different connections.

The standard is under active development. Currently, the best implementations are research tools that aren’t intended for long-term deployment.

Gremlin
One of the original languages for searching a graph, Gremlin asks for a set of steps for searching through the connections between nodes. Some call it a “path-based” or “graph traversal” language. Each query is built on steps, and each step can involve either mapping the current node, filtering a list, or somehow tabulating the result.

The language is often just a starting point. Some, for instance, are expanding Gremlin by embedding a Python interpreter inside it so that queries can include Python code. Others are embedding Gremlin inside a standard programming language like Java so that programmers can tap the power of Gremlin from inside that language.

Gremlin was first built for Apache’s TinkerPop project and it’s been adopted by major transactional graph databases like Amazon's Neptune and graph processing frameworks using Apache Spark or Hadoop.

N1QL
Over the years, Couchbase has searched for the best way to query general documents. In the beginning, the query was written as a JavaScript function that was handed to the database for execution. It was a nice, general solution that sometimes took forever to generate a result, but it required programmers to think a bit differently.

N1QL (pronounced “nickel”) is designed to make it easier for SQL natives to work with the JSON objects that might be stored in Couchbase. A basic query has several sections specified by the keywords SELECT, FROM, and WHERE, just like SQL. The details of specifying the path into the data structure from which the data will come is adjusted and adapted to the nested world of JSON objects.

To encourage experimentation, N1QL offers a querying workbench with a visual interface for testing and refining queries. Couchbase also offers a generic full-text search option that runs independently for queries seeking text words instead of structured data.

Malloy
The problem with SQL, according to the creators of Malloy, lies in the syntactic details. Expressing even the simplest query takes time because the language is verbose and full of hidden performance traps. So, they created a modern programming language with natural defaults and simpler syntax that can be compiled to SQL, so no one needs to retrofit a stock database.

The result is a syntax that resembles a more powerful GraphQL. A query is more like a model or vision for the result, including any restrictions, matches, or defaults. Malloy handles some optimization in the background. Smarter JOINs, for instance, can be generated automatically to avoid chasm and fan performance traps. Subqueries can be aggregated to save time. Indices are also added as needed. As a result, writing queries feels more like writing modern code, with punctuation serving to keep the structure succinct.

Malloy's open source core is built out in TypeScript for including in Node.js code. A VS Code plugin simplifies development.

Basis
Most query languages are tied directly to a particular database. Basis is building more of a pipeline that can draw from a variety of sources before filtering them with a mixture of SQL and Python. At the end of the pipeline are exporters that deliver the data to a variety of standard options, which range from running code to AI algorithms to charts and dashboards.

Developers are already building pipelines like this in their own code, and many projects depend on similar structures. Basis offers a prebuilt option that can be customized in more elaborate ways. Inputs range from standard database queries to API taps to custom Python code. The transformers aren’t limited to SQL's basic WHERE clauses because you can write Python code that does more than filter as the data flows down the pipeline.

Basis is just one example of newer data pipeline tools that are opening up the querying process to draw from more than one source, filter with more than one language, and deliver the data in more than one form. It’s a vision of being able to take data from almost any source and deliver it to almost any consumer.

Source:- infoworld.com

Sort:  

.