master
Sven Slootweg 3 years ago
parent 2802cd9964
commit 9bf76eb220

@ -0,0 +1,88 @@
# Architecture
__NOTE:__ This document is a work in progress, and will be changed and extended over time.
## Design philosophy
- Everything should be as stateless as possible.
- Everything should be as composable as possible, including in ways unforeseen by the Zap developers.
- Validation should be strict and informative; if the user does something wrong, they should be informed of this as quickly as possible, and in an actionable way.
- The user should not need to know anything about PostgreSQL whatsoever; as far as they are concerned, it's an internal implementation detail.
- However, a DBA should, if they so desire, be able to make sense of what Zap is doing. Likewise, interacting with the database from software in other languages should be reasonably painless. To this end, the schema and queries produced by Zap should reflect common practices in hand-written queries as much as possible.
## High-level architecture
The general lifecycle of a query is as follows:
1. User constructs AST in-place using operations. Crucially, there is no parsing step; the AST is constructed directly, but the functions used for this are designed like a DSL.
2. Optimizers normalize and improve upon this AST to generate a representation that matches SQL semantics.
3. The optimized AST is converted into an SQL query, series of parameters, optionally scheduled follow-up queries (eg. for relation handling), and metadata.
4. The query is executed and results collected.
5. Optionally, follow-up queries are executed to eg. resolve relations.
6. The combined results are returned to the user.
Importantly, steps 1-3 and 4-6 can be decoupled; this allows for "pre-compiling" queries for repeated use. Because all AST structures are fully stateless, queries can be reused indefinitely and with any kind of concurrency.
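As a rough illustration of that decoupling, the sketch below compiles a query description once and then executes it several times. The names `compile`, `execute` and `fakeRunSQL`, the shape of the compiled object, and the generated SQL are all assumptions made for this example; they are not Zap's actual API.

```js
"use strict";

// Steps 1-3 (hypothetical): turn an AST-like object into an immutable description.
function compile(ast) {
	return Object.freeze({
		sql: `SELECT * FROM "${ast.collection}" LIMIT $1`,
		parameters: Object.freeze([ ast.limit ])
	});
}

// Steps 4-6 (hypothetical): run a compiled query through some database driver.
function execute(compiled, runSQL) {
	return runSQL(compiled.sql, compiled.parameters);
}

// A fake driver that records its calls, so the sketch runs without a database.
const calls = [];
function fakeRunSQL(sql, parameters) {
	calls.push({ sql, parameters });
	return [ { id: calls.length } ];
}

// Compile once, execute as often as needed.
const compiled = compile({ collection: "threads", limit: 10 });
const firstResult = execute(compiled, fakeRunSQL);
const secondResult = execute(compiled, fakeRunSQL);
```

Because the compiled object is frozen and never written to during execution, sharing it between callers cannot leak state from one execution into the next.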
## Operations
The main mechanism for query building is "operations"; essentially just functions which take some input and return an AST node or subtree. Most of the query construction that the user does will involve passing AST nodes returned from operations to other operation calls, essentially constructing a full AST in-place. The operation modules live in `src/operations/`.
Some example code using Zap operations might look like the following:
```js
// Get a thread with its (visible) posts
select("threads", [
	first(),
	where({ id: threadID }),
	withRelations({
		posts: has("posts.thread", [
			where({ visible: true }),
			startAt(offset),
			first(10)
		])
	})
]);
```
Because functions represent declarative keywords rather than immediate actions, some operations are reused in multiple contexts, and will return different AST nodes depending on their input. Other operations validate the specific node types that they receive as input, which ensures that only valid combinations of operations can be constructed.
An example is the `index` operation, which will return a `localIndex` node when called as `index()`, but an `index` node when called as `index("fieldName")`. Schema field definitions then only accept `localIndex`, not `index` nodes; and likewise, schema index definitions only accept `index`, not `localIndex` nodes.
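A simplified sketch of this pattern follows. The `index` function here is a stand-in that only distinguishes the two call shapes described above, and `schemaField` and `schemaIndex` are hypothetical consumers invented for the example; the real operations validate their input with Validatem and produce richer nodes.

```js
"use strict";

// Stand-in for the `index` operation: the node type depends on how it is called.
function index(fieldName) {
	if (fieldName === undefined) {
		return { type: "localIndex" };
	} else {
		return { type: "index", field: fieldName };
	}
}

// Hypothetical helper that rejects nodes of the wrong type.
function expectNodeType(node, expectedType) {
	if (node.type !== expectedType) {
		throw new Error(`Expected a '${expectedType}' node, but got '${node.type}'`);
	}
	return node;
}

// Hypothetical field definition: only accepts the field-local form.
function schemaField(indexNode) {
	return { type: "schemaField", index: expectNodeType(indexNode, "localIndex") };
}

// Hypothetical index definition: only accepts the form that names a field.
function schemaIndex(indexNode) {
	return { type: "schemaIndex", index: expectNodeType(indexNode, "index") };
}
```

With these definitions, `schemaField(index())` and `schemaIndex(index("title"))` succeed, while `schemaField(index("title"))` throws as soon as the AST is constructed.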
There are two types of operations; regular operations and internal operations:
- __Regular operations:__ These are exposed to the end user, and are meant to be used primarily for query-building. They are validated very strictly; they should only allow input that is guaranteed to be representable in the resulting SQL query in some sensible way (unless this cannot be statically ensured).
- __Internal operations:__ These are operations that are only meant to be used from optimizers. They are typically validated less strictly, but still do checks on their inputs. There are two main types:
    1. Operations which represent some internal construct that will be used for SQL generation, but that the user will never specify themselves.
    2. Operations which represent a *type* of operation (with subtypes) that *is* available to the user, but that would normally involve multiple different operation methods, which would be inconvenient for optimizer authors. An example is the set of `moreThan`, `lessThan`, `equals`, etc. operations, which are all represented by a single `condition` internal operation.
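The second kind can be sketched as follows. The node shape (`conditionType`, `value`) and the list of allowed types are assumptions made for this example; only the operation names and the idea of a single internal `condition` operation come from the description above.

```js
"use strict";

// Assumed set of subtypes, for illustration only.
const conditionTypes = [ "equals", "moreThan", "lessThan" ];

// Sketch of an internal operation: one entry point for every comparison subtype.
function _condition({ conditionType, value }) {
	if (!conditionTypes.includes(conditionType)) {
		throw new Error(`Unknown condition type: ${conditionType}`);
	}
	return { type: "condition", conditionType: conditionType, value: value };
}

// The user-facing operations are then thin wrappers around the internal one.
const equals = (value) => _condition({ conditionType: "equals", value: value });
const moreThan = (value) => _condition({ conditionType: "moreThan", value: value });
const lessThan = (value) => _condition({ conditionType: "lessThan", value: value });
```

An optimizer that needs to, say, flip a comparison can then call `_condition` with a computed `conditionType`, instead of having to select between several differently-named functions.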
For validating inputs to operations, [Validatem](https://validatem.cryto.net/) is used. Most of the (partial) validation rules live in `src/validators/operations/`, as they are commonly reused across operations. However, the top-level validation rules are defined within the operation functions themselves, and therefore input is validated at AST construction time.
## Optimizers
Internally, Zap has an 'optimizer' infrastructure that allows for AST transformation, much like a tool such as Babel does for JavaScript. The purpose of optimizers is to do any and all AST transformations necessary to translate a Zap AST into something that semantically represents an equivalent SQL query. The optimizers live in `src/optimizers/`.
Optimizers are currently all bundled into the core package with no support for third-party optimizers, but this will likely change in the future. Optimizers are split into multiple categories, which can be enabled/disabled as desired to target a specific set of tradeoffs.
Currently, the following categories of optimizers are defined:
- __normalization:__ These are optimizers that are required for Zap to function correctly. They will typically deduplicate AST nodes of which only one should logically exist, convert Zap-specific semantics into their equivalent SQL semantics, and so on. Disabling these will break Zap.
- __performance:__ These are optional optimizers that aim to improve the performance of the query in some way, eg. by reducing its complexity when certain patterns are found.
- __readability:__ These are optional optimizers that aim to improve the *human* readability of the generated queries. They can typically be safely disabled until there is a desire to debug Zap's behaviour on a PostgreSQL level, in which case they would make it easier to understand what Zap is doing. If there is no specific performance concern, it's usually a good idea to leave these enabled.
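Selecting optimizers by category could look roughly like the sketch below. The `name` and `category` properties mirror the optimizer definition in this commit (`move-out-indexes`); the other two optimizer names and the `selectOptimizers` helper are hypothetical, and how Zap actually exposes this setting is not specified here.

```js
"use strict";

// Optimizer definitions; the `visitors` property is left out for brevity.
const optimizers = [
	{ name: "move-out-indexes", category: [ "normalization" ] },
	{ name: "example-performance-optimizer", category: [ "performance" ] }, // hypothetical
	{ name: "example-readability-optimizer", category: [ "readability" ] } // hypothetical
];

// Hypothetical helper: keeps every optimizer that belongs to an enabled category.
// Normalization is always enabled, because disabling it would break Zap.
function selectOptimizers(allOptimizers, enabledCategories) {
	const enabled = new Set([ "normalization", ...enabledCategories ]);

	return allOptimizers.filter((optimizer) => {
		return optimizer.category.some((category) => enabled.has(category));
	});
}

const selected = selectOptimizers(optimizers, [ "readability" ]);
const selectedNames = selected.map((optimizer) => optimizer.name);
```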
An optimizer is defined as a set of 'visitors', which specify a handler to call for each encountered node of the specified type. The handler can then decide to:
1. Return a new node. Mutating the existing node is __not__ permitted.
2. Return a `RemoveNode` marker: remove the node, all of its children, and any accumulated state (explained later). This will typically be used for extraneous nodes.
3. Return a `ConsumeNode` marker: consume the node and all of its children, but leave any accumulated state intact. This will typically be used for 'modifier' nodes which just serve to configure a parent node.
4. Return a `NoChange` marker, indicating that the node should remain unmodified.
5. Return a defer (explained below).
Every optimizer must eventually 'stabilize'; that is, it must return `NoChange` from all of its visitors once it concludes that all of its work has been done. The optimizer infrastructure enforces this; if an optimizer fails to stabilize after a configured number of iterations (currently 10), the query optimization phase will be aborted. This design allows for iterative optimization of the AST, even if there's a (bounded) bi-directional interdependency between two optimizers.
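The stabilization rule can be sketched as a bounded loop. This is a minimal sketch, assuming a toy 'optimizer' that is a single function over the whole tree rather than a set of visitors; `runUntilStable` and both example optimizers are hypothetical, and the real infrastructure may count iterations differently.

```js
"use strict";

const NoChange = Symbol("NoChange");
const MAX_ITERATIONS = 10;

// Applies `optimizer` repeatedly until it reports NoChange, or gives up.
function runUntilStable(ast, optimizer, maxIterations = MAX_ITERATIONS) {
	let current = ast;

	for (let i = 0; i < maxIterations; i++) {
		const result = optimizer(current);

		if (result === NoChange) {
			return current; // Stabilized: nothing left to do.
		}

		current = result;
	}

	throw new Error(`Optimizer failed to stabilize after ${maxIterations} iterations`);
}

// Toy optimizer that stabilizes: removes one level of redundant wrapping per pass.
function unwrapOnce(node) {
	return (node.type === "wrapper") ? node.child : NoChange;
}

// Toy optimizer that never stabilizes: it always produces a different node.
function flipFlop(node) {
	return { type: (node.type === "a") ? "b" : "a" };
}
```

Here `runUntilStable` reaches the innermost node of a doubly-wrapped tree on the third pass, whereas `flipFlop` exhausts the iteration budget and aborts with an error.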
The AST is traversed depth-first, starting at the root of the tree. This means that handlers for parent nodes are invoked before those of their child nodes.
Each handler is invoked with access to the node that is currently being processed, as well as a number of utilities:
- `setState` and `registerStateHandler`, for emitting state and capturing that state upstream respectively. Each state item is keyed by a 'type', and multiple items of the same type can be emitted. State items will propagate upwards to the nearest parent that has a handler registered for their type; but only if their originating subtree has not been removed in the meantime. This prevents stale state from being processed. State from consumed nodes *is* propagated, as is typically desirable for handling modifier nodes.
- `defer`, which allows for specifying a callback to be called later, *after* the node's child nodes have been processed. This is commonly used in combination with state handling to collect state from child nodes and, afterwards, construct and return a new node based on the collected information. The defer callback may return all of the same 'conclusions' as a handler callback, *except* for another defer.
- Some number of path utilities for determining the path of the currently-processed node and its ancestors. This part of the API is still in flux, but at the time of writing only `findNearestStep` exists, which locates the nearest ancestor of a given type (using `"$object"` to denote an object literal and `"$array"` for an array).
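How handlers, state and defers interact can be illustrated with a toy traversal. This is a sketch under several assumptions rather than Zap's implementation: nodes keep their children in a `children` array, state is delivered immediately instead of being held back until the subtree is known to survive, `RemoveNode` and the path utilities are not modelled, and the deferred callback receives the processed children as an argument purely for convenience.

```js
"use strict";

const NoChange = Symbol("NoChange");
const ConsumeNode = Symbol("ConsumeNode");

// The handler runs first, then the children, then any deferred callback.
function visit(node, visitors, inheritedHandlers = Object.create(null)) {
	// Handlers registered on this node shadow those of its ancestors.
	const ownHandlers = Object.create(inheritedHandlers);

	const utilities = {
		// State travels to the nearest ancestor with a handler for its type.
		setState: (type, value) => {
			const stateHandler = inheritedHandlers[type];
			if (stateHandler !== undefined) { stateHandler(value); }
		},
		registerStateHandler: (type, stateHandler) => { ownHandlers[type] = stateHandler; },
		defer: (callback) => ({ deferred: callback })
	};

	const handler = visitors[node.type];
	let result = (handler !== undefined) ? handler(node, utilities) : NoChange;

	const children = (node.children ?? [])
		.map((child) => visit(child, visitors, ownHandlers))
		.filter((child) => child !== ConsumeNode);

	if (result !== null && typeof result === "object" && typeof result.deferred === "function") {
		result = result.deferred(children);
	}

	// Anything other than NoChange is either a new node or ConsumeNode.
	return (result === NoChange) ? { ...node, children: children } : result;
}

const visitors = {
	// A modifier node: emits state for its parent field, then disappears.
	localIndex: (node, { setState }) => {
		setState("hasIndex", true);
		return ConsumeNode;
	},
	field: (node, { setState, registerStateHandler, defer }) => {
		let hasIndex = false;
		registerStateHandler("hasIndex", () => { hasIndex = true; });

		return defer(() => {
			if (hasIndex) { setState("indexedField", node.name); }
			return NoChange;
		});
	},
	collection: (node, { registerStateHandler, defer }) => {
		const indexes = [];
		registerStateHandler("indexedField", (name) => { indexes.push(name); });

		// A new node is returned; the original node is never mutated.
		return defer((children) => ({ type: "collection", name: node.name, children, indexes }));
	}
};

const ast = {
	type: "collection",
	name: "threads",
	children: [
		{ type: "field", name: "title", children: [ { type: "localIndex" } ] },
		{ type: "field", name: "body", children: [] }
	]
};

const optimized = visit(ast, visitors);
```

In this run the `localIndex` modifier is consumed, its state still reaches the enclosing `field`, and the `collection` handler uses its defer to build a new node listing `title` as an indexed field.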

@ -54,7 +54,7 @@ return Promise.try(() => {
return processSchemaUpdate(update);
});
// console.log(require("util").inspect(processed, { colors: true, depth: null }));
console.log(require("util").inspect(processed, { colors: true, depth: null }));
});
// TODO: Allow-gap option to permit filling in 'gapped migrations' (eg. after an earlier migration gets merged in via branch merge)

@ -116,7 +116,7 @@ select("threads", [
sortedBy(descending("last_post.created_at"))
]);
// Get a thread with all its posts
// Get a thread with its (visible) posts
select("threads", [
first(),

@ -0,0 +1,29 @@
"use strict";
const { validateOptions } = require("@validatem/core");
const required = require("@validatem/required");
const oneOf = require("@validatem/one-of");
const arrayOf = require("@validatem/array-of");
const node = require("../ast-node");
module.exports = function (operations) {
const isIndexClause = require("../validators/operations/is-index-clause")(operations);
const isField = require("../validators/operations/is-field")(operations);
return function _index(_options) {
let { indexType, field, clauses } = validateOptions(arguments, {
indexType: [ required, oneOf([ "index", "unique", "primaryKey" ]) ],
field: [ required, isField ],
clauses: [ arrayOf(isIndexClause) ]
});
return node({
type: "index",
indexType: indexType,
isComposite: false,
field: field,
clauses: clauses
});
};
};

@ -9,6 +9,7 @@ let internalOperations = Object.assign({}, operations);
evaluateCyclicalModulesOnto(internalOperations, {
_condition: require("./condition"),
_arrayOf: require("./array-of"),
_index: require("./index-operation")
});
module.exports = internalOperations;

@ -7,7 +7,8 @@ const node = require("../../ast-node");
module.exports = function makeIndexObject(fieldsResult, properties) {
if (fieldsResult.type === "local") {
return node({
type: "localIndex"
type: "localIndex",
... properties
});
} else {
let isComposite = matchValue(fieldsResult.type, {

@ -8,12 +8,12 @@ const makeIndexObject = require("./_make-index-object");
module.exports = function (operations) {
const isIndexFields = require("../../validators/operations/is-index-fields")(operations);
const isObjectType = require("../../validators/operations/is-object-type")(operations);
const isIndexClause = require("../../validators/operations/is-index-clause")(operations);
return function index(_fields, _clauses) {
let [ fields, clauses ] = validateArguments(arguments, {
fields: isIndexFields,
clauses: [ defaultTo([]), arrayOf(isObjectType("where")) ]
clauses: [ defaultTo([]), arrayOf(isIndexClause) ]
});
return makeIndexObject(fields, {

@ -8,12 +8,12 @@ const makeIndexObject = require("./_make-index-object");
module.exports = function (operations) {
const isIndexFields = require("../../validators/operations/is-index-fields")(operations);
const isObjectType = require("../../validators/operations/is-object-type")(operations);
const isIndexClause = require("../../validators/operations/is-index-clause")(operations);
return function unique(_fields, _clauses) {
let [ fields, clauses ] = validateArguments(arguments, {
fields: isIndexFields,
clauses: [ defaultTo([]), arrayOf(isObjectType("where")) ]
clauses: [ defaultTo([]), arrayOf(isIndexClause) ]
});
return makeIndexObject(fields, {

@ -1,9 +1,9 @@
"use strict";
const splitFilter = require("split-filter");
const matchValue = require("match-value");
const operations = require("../../operations");
const internalOperations = require("../../internal-operations");
const NoChange = require("../util/no-change");
const ConsumeNode = require("../util/consume-node");
@ -12,6 +12,8 @@ Translate index modifiers on a single field into a top-level (non-composite) ind
{ type: "index"|"removeIndex", indexType, isComposite: true|false, field|fields}
*/
// MARKER: Filter out ConsumeNode from result (doublecheck that their presence doesn't signify a bug!), figure out autogeneration for index names, improve error for localColumn to explicitly say "don't provide a column name" (via partial match followed by error, eg. forbidden + wrapError)
function handleCollection(node, { registerStateHandler, defer }) {
let createNode = matchValue.literal(node.type, {
createCollectionCommand: operations.createCollection,
@ -28,26 +30,22 @@ function handleCollection(node, { registerStateHandler, defer }) {
return defer(() => {
if (indexNodes.length > 0) {
let indexesObject = operations.indexes(indexNodes.map((item) => {
console.log(item); // node, property
let indexesObject = operations.indexes(indexNodes.map(({ node, key }) => {
// FIXME: Why is it allowed to call operations.index with the same arguments? That should be disallowed.
return internalOperations._index({
indexType: node.indexType,
field: operations.field(key),
clauses: node.clauses
});
}));
return NoChange;
return createNode(name, operations.concat([ indexesObject ]));
return createNode(node.name, node.operations.concat([ indexesObject ]));
} else {
return NoChange;
}
});
}
/*
[ createCollection, operations ]
[ _array, 0 ]
[ schemaFields, fields ]
[ _object, last_activity ]
*/
module.exports = {
name: "move-out-indexes",
category: [ "normalization" ],

@ -0,0 +1,7 @@
"use strict";
module.exports = function (operations) {
const isObjectType = require("./is-object-type")(operations);
return isObjectType("where");
};