Initial commit

master
Sven Slootweg 2 years ago
commit aed19b8171

@ -0,0 +1,3 @@
{
"extends": "@joepie91/eslint-config"
}

.gitignore

@ -0,0 +1,3 @@
node_modules
local_cluster
junk

@ -0,0 +1,55 @@
(values = number of distinct values to represent; rounded = next power of two; bits = storage bits)

# Date
field          values   rounded   bits
bitmask                           7 bits
year                    8192      13 bits
month          12       16        4 bits
day            31       32        5 bits
hour           24       32        5 bits
minute         60       64        6 bits
second         61       64        6 bits
millisecond    1000     1024      10 bits
-----------
56 bits = 7 bytes

# Duration
field          values   rounded   bits
bitmask                           7 bits
sign           1        1         1 bit
years                   4096      12 bits
months         12       16        4 bits
days           31       32        5 bits
hours          24       32        5 bits
minutes        60       64        6 bits
seconds        61       64        6 bits
milliseconds   1000     1024      10 bits
-----------
56 bits = 7 bytes
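A minimal sketch (hypothetical, not part of the codebase; names and field order are assumptions) of packing the Date layout above into a single 56-bit value using fixed-width slots:

// Hypothetical sketch: pack the Date fields into one BigInt using the bit widths above.
const DATE_BIT_WIDTHS = [
    [ "bitmask", 7n ],
    [ "year", 13n ],
    [ "month", 4n ],
    [ "day", 5n ],
    [ "hour", 5n ],
    [ "minute", 6n ],
    [ "second", 6n ],
    [ "millisecond", 10n ]
];

function packDate(values) {
    let packed = 0n;
    for (let [ name, bits ] of DATE_BIT_WIDTHS) {
        // Each field occupies a fixed-width slot; 7 + 13 + 4 + 5 + 5 + 6 + 6 + 10 = 56 bits
        packed = (packed << bits) | BigInt(values[name] ?? 0);
    }
    return packed; // fits in 7 bytes
}

Note that the record encoder elsewhere in this commit uses an arithmetic coder over the exact value ranges rather than power-of-two slots, which can be slightly more compact; the byte budget above is the power-of-two upper bound.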
NOTE:
- It must be possible to evaluate schema migrations statelessly; that is, with zero knowledge of what data *currently* exists in the database, a sequence of migrations up to any point must *always* result in a valid schema that:
- does not allow for data to exist in the database which violates the schema's constraints (eg. missing required fields)
- allows full rollback to any previous point in history, with data loss only being permitted in that process if it is fundamentally unavoidable due to the nature of the migration (eg. rolling back the addition of a new field)
- for *any* sequence of migrate-to and rollback operations within the same set of linear migrations, continues to uphold the above two properties
- Make sure that a column default can be specified separately for new vs. migrated rows - in some cases, the user may want to initialize existing rows with a value derived from that row (eg. to emulate application insertion logic) rather than with the usual column default.
- If both a regular and migration default is specified: use either for its relevant purpose
- If only a migration default is specified: use that for migration, and disallow NULL values in new records
- If only a regular default is specified: use that for both cases
- If neither is specified: this is an error in a changeFields, but allowed in an addFields, and just means NULL values in new records are disallowed
- For the regular default, default functions *do not* receive the previous value; if the user wants to use this, they should specify a migration default
- A migration default only applies for *that specific migration step*, not for any migrations after it, even if the same field is affected. This needs to be specifically ensured to avoid bugs.
- When applying arithmetic directly to integer-encoded decimal numbers, magnitude scaling may be needed, because multiplying two values that were each scaled by 10x yields a result scaled by 100x (see the sketch after these notes); for example:
1.1 * 1.2 = 1.32 (the true result)
11 * 12 = 132 (reading this as 13.2 in decimal representation is WRONG, as the product carries the scale factor twice)
11 * 12, then / 10 = 13.2, rounded to 13 (1.3 in decimal representation, CORRECT, even though some precision is lost to conform to the storage precision)
- For user-specified reversal operations in migrations, automatically do a test with some random values to detect errors?
- Make sure to version the DSL import; so that old migrations can continue using older versions of the DSL! At least until there is some kind of codemod mechanism for this.
- Should be some way to 'inherit' an instance from the base database connection, allowing for configuring things like type adapters - this would let the user choose whether to eg. define custom type adapters globally or only for a specific table or such. Need to figure out how this fits into the DSL design where queries are stateless by default. Maybe a custom filter hook that lets the user semi-declaratively specify what queries to apply custom adapters to, or so?
- unsafeForbidRollback must make rollbacks impossible even in hot reload mode; although in *some* cases there might be a default value that could be reset to, it is possible for fields to exist that absolutely require an application-provided value. Therefore, it is not consistently possible to rollback even in a controllably-unsafe manner, when no rollback operation is specified.
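A minimal sketch (hypothetical, not part of the codebase) of the scaling rule from the integer-encoded decimals note above:

// Multiplying two fixed-point integers that share a scale factor yields a product scaled
// by scale^2, so the raw product must be divided by the scale once.
function multiplyFixedPoint(a, b, scale) {
    // a and b are BigInts already scaled by `scale`, eg. 11n and 12n for 1.1 and 1.2 at scale 10n
    return (a * b) / scale; // BigInt division truncates to the storage precision
}

// multiplyFixedPoint(11n, 12n, 10n) === 13n, ie. 1.3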
Query planning:
- Make list of all 'queried fields', ie. fields which are used to filter or order
- If the first sorting criterion is also a filtering field *and* there is an index for that field, it should be selected as the first index to select from, because then we can implicitly use the order from the index
- Otherwise: apply filters, and if the remaining result set is more than __% of the full collection, and the sorting criterion has an index, reorder the result set according to that index; if not, do a regular sort on the retrieved-and-decoded record data instead
- Any descending sorts should come *before* any record-fetching filters/criteria, so that it doesn't have to reverse a full result set in memory
- Sorting criteria should be internally rearranged as needed, to prefer sorting by indexed fields with high cardinality (ie. many different values) first and low cardinality last
- Possible optimization: if the filtered subset appears to comprise most of the table, do a sequential filtering scan of the table instead of retrieving each matched item individually? This might be more efficient for some backends. Maybe backends should be able to configure whether this is the case for them?
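A rough sketch (hypothetical; the names and data structures are assumptions, not the actual planner) of the driving-index selection rule described above:

// If the first sort criterion is also filtered and indexed, drive the query from that
// index so its ordering can be reused; otherwise drive from any indexed filter field.
function pickDrivingIndex(filterFields, sortCriteria, indexes) {
    let firstSort = sortCriteria[0];
    if (firstSort != null && filterFields.includes(firstSort.field) && indexes.has(firstSort.field)) {
        return indexes.get(firstSort.field);
    } else {
        let indexedFilterField = filterFields.find((field) => indexes.has(field));
        return (indexedFilterField != null) ? indexes.get(indexedFilterField) : null;
    }
}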

@ -0,0 +1,48 @@
{
"name": "zapdb-kv",
"version": "1.0.0",
"main": "src/index.js",
"repository": "git@git.cryto.net:joepie91/zapdb-kv.git",
"author": "Sven Slootweg <admin@cryto.net>",
"license": "WTFPL OR CC0-1.0",
"scripts": {
"server": "concurrently --kill-others -p '[{name}]' -n 'node,PD' -c 'bgBlue.bold,bgRed.bold' './start-server.sh' './start-pd.sh'",
"test": "tape tests/**/*.js | tap-difflet -p"
},
"devDependencies": {
"@joepie91/eslint-config": "^1.1.0",
"cbor": "^8.1.0",
"concurrently": "^6.2.1",
"enzyme": "^3.11.0",
"eslint": "^8.9.0",
"filled-array": "^2.2.0",
"snapshotter": "^3.0.1",
"tap": "^15.1.5",
"tap-difflet": "^0.7.2",
"tape": "^5.3.2"
},
"dependencies": {
"@extra-bigint/log2": "^0.0.53",
"@joepie91/unreachable": "^1.0.0",
"@js-temporal/polyfill": "^0.3.0",
"as-expression": "^1.0.0",
"assure-array": "^1.0.0",
"big-varint": "^0.1.0",
"bigint-buffer": "^1.1.5",
"cartesian-product": "^2.1.2",
"default-value": "^1.0.0",
"fix-esm": "^1.0.1",
"lmdb": "^1.6.6",
"match-value": "^1.1.0",
"merge-by-template": "^0.1.3",
"seed-random": "^2.2.0",
"split-filter-n": "^1.1.3",
"syncpipe": "^1.0.0",
"time-call": "^0.1.0",
"unicode-collation-algorithm2": "^0.3.1",
"varint": "^6.0.0"
},
"snapshotter": {
"snapshotPath": "./tests/_snapshots"
}
}

@ -0,0 +1,80 @@
"use strict";
let dummyMigrations = [
{ id: 1, operations: [
{ type: "createCollection", name: "users", operations: [
{ type: "createField", name: "username", fieldType: "string", required: true, attributes: {} },
{ type: "createField", name: "passwordHash", fieldType: "string", required: true, attributes: {} },
{ type: "createField", name: "emailAddress", fieldType: "string", required: false, attributes: {} },
{ type: "createField", name: "isActive", fieldType: "boolean", required: true, attributes: {} },
{ type: "createField", name: "registrationDate", fieldType: "date", required: true, attributes: { withTimezone: false }},
{ type: "createField", name: "invitesLeft", fieldType: "integer", required: true, attributes: {} },
]}
]},
{ id: 2, operations: [
{ type: "modifyCollection", name: "users", operations: [
{ type: "setFieldAttributes", name: "emailAddress", required: false, attributes: {} },
{ type: "setFieldAttributes", name: "isActive", required: true, attributes: {} },
{ type: "setFieldAttributes", name: "registrationDate", attributes: { withTimezone: true }},
{ type: "setFieldAttributes", name: "invitesLeft", attributes: { signed: false }},
]}
]},
];
let dummyItems = [{
username: "joepie91",
passwordHash: "foo",
emailAddress: "admin@cryto.net",
isActive: true,
registrationDate: new Date(),
invitesLeft: 100
}, {
username: "test",
passwordHash: "bar",
emailAddress: "test@cryto.net",
isActive: false,
registrationDate: new Date(),
invitesLeft: 0
}];
const reduceSchema = require("../../src/schema/reducer");
const createRecordCoder = require("../../src/storage-encoder/record-coder");
let schema = reduceSchema(dummyMigrations);
let orderedTableSchema = Object.entries(schema.tables.users.fields)
.map(([ name, settings ]) => {
let { fieldType, ... rest } = settings;
return { name, type: fieldType, ... rest };
})
.sort((a, b) => {
if (b.name < a.name) {
return 1;
} else if (b.name > a.name) {
return -1;
} else {
return 0;
}
});
let tableEncoder = createRecordCoder(orderedTableSchema);
let encodedItems = dummyItems.map((item) => tableEncoder.encode(item));
let decodedItems = encodedItems.map((item) => tableEncoder.decode(item.record));
console.log(tableEncoder);
console.log("# Schema:");
console.dir(schema, { depth: null });
console.log("# Input items:");
console.dir(dummyItems, { depth: null });
console.log("# Encoded items:");
console.dir(encodedItems, { depth: null });
console.log("# Decoded items:");
console.dir(decodedItems, { depth: null });
// MARKER: Auxiliary blob handling, somehow
// MARKER: Implement support for optional fields in record-coder; including encoding the presence mask into the encoded records

@ -0,0 +1,18 @@
"use strict";
const { required, string, integer, boolean, date, optional, defaultTo, addCollection, addFields, addIndex } = require("../../../src/schema/methods/v1");
module.exports = [
addCollection("users", [
addFields({
username: [ required, string ],
passwordHash: [ required, string ],
emailAddress: [ optional, string ],
invitesLeft: [ required, integer, defaultTo(0) ],
isActive: [ required, boolean, defaultTo(false) ],
registrationDate: [ required, date, defaultTo(() => new Date()) ]
}),
addIndex("username"),
addIndex("registrationDate"),
])
];

@ -0,0 +1,21 @@
"use strict";
const { changeCollection, addFields, changeFields, addIndex, decimal, required, defaultExistingTo, precision, defaultTo, withTimezone, unsigned } = require("../../../src/schema/methods/v1");
module.exports = [
changeCollection("users", [
addFields({
karmaScore: [ required, decimal, precision(4), defaultTo(0) ]
}),
changeFields({
// A migration default is mandatory when making a field `required` in a `changeFields`
emailAddress: [ required, defaultExistingTo("INVALID@example.com") ],
// TODO: To detect bugs early, disallow no-op changes in schema modifications?
isActive: [ required ],
// TODO: Changing to *without* a timezone should require an explicit allowDestructive modifier, as it would require normalizing all dates to UTC, losing the original timezone information in the process. Or maybe an `unsafe` wrapper? Like `unsafe(withoutTimezone)` or `destructive(withoutTimezone)`, but only for *modification* cases
registrationDate: [ withTimezone ],
invitesLeft: [ unsigned ]
}),
addIndex("karmaScore")
])
];

@ -0,0 +1,80 @@
"use strict";
let dummyMigrations = [
{ id: 1, operations: [
{ type: "createCollection", name: "users", operations: [
{ type: "createField", name: "username", fieldType: "string", required: true, attributes: {} },
{ type: "createField", name: "passwordHash", fieldType: "string", required: true, attributes: {} },
{ type: "createField", name: "emailAddress", fieldType: "string", required: false, attributes: {} },
{ type: "createField", name: "isActive", fieldType: "boolean", required: true, attributes: {} },
{ type: "createField", name: "registrationDate", fieldType: "date", required: true, attributes: { withTimezone: false }},
{ type: "createField", name: "invitesLeft", fieldType: "integer", required: true, attributes: {} },
]}
]},
{ id: 2, operations: [
{ type: "modifyCollection", name: "users", operations: [
{ type: "setFieldAttributes", name: "emailAddress", required: false, attributes: {} },
{ type: "setFieldAttributes", name: "isActive", required: true, attributes: {} },
{ type: "setFieldAttributes", name: "registrationDate", attributes: { withTimezone: true }},
{ type: "setFieldAttributes", name: "invitesLeft", attributes: { signed: false }},
]}
]},
];
let dummyItems = [{
username: "joepie91",
passwordHash: "foo",
emailAddress: "admin@cryto.net",
isActive: true,
registrationDate: new Date(),
invitesLeft: 100
}, {
username: "test",
passwordHash: "bar",
emailAddress: "test@cryto.net",
isActive: false,
registrationDate: new Date(),
invitesLeft: 0
}];
const reduceSchema = require("../src/schema/reducer");
const createRecordCoder = require("../src/storage-encoder/record-coder");
let schema = reduceSchema(dummyMigrations);
let orderedTableSchema = Object.entries(schema.tables.users.fields)
.map(([ name, settings ]) => {
let { fieldType, ... rest } = settings;
return { name, type: fieldType, ... rest };
})
.sort((a, b) => {
if (b.name < a.name) {
return 1;
} else if (b.name > a.name) {
return -1;
} else {
return 0;
}
});
let tableEncoder = createRecordCoder(orderedTableSchema);
let encodedItems = dummyItems.map((item) => tableEncoder.encode(item));
let decodedItems = encodedItems.map((item) => tableEncoder.decode(item.record));
console.log(tableEncoder);
console.log("# Schema:");
console.dir(schema, { depth: null });
console.log("# Input items:");
console.dir(dummyItems, { depth: null });
console.log("# Encoded items:");
console.dir(encodedItems, { depth: null });
console.log("# Decoded items:");
console.dir(decodedItems, { depth: null });
// MARKER: Auxiliary blob handling, somehow
// MARKER: Implement support for optional fields in record-coder; including encoding the presence mask into the encoded records

@ -0,0 +1,49 @@
{ pkgs ? import <nixpkgs> {} }:
with pkgs;
let
version = "5.1.1";
os = "linux";
architecture = "amd64";
binaryPackage = meta: stdenv.mkDerivation ({
phases = "unpackPhase installPhase fixupPhase";
installPhase = ''
mkdir -p $out/bin
cp -r * $out/bin/
'';
sourceRoot = ".";
nativeBuildInputs = [ autoPatchelfHook ];
} // meta);
serverPackage = binaryPackage {
name = "tikv-server-${version}";
src = fetchurl {
url = "https://tiup-mirrors.pingcap.com/tikv-v${version}-${os}-${architecture}.tar.gz";
sha256 = "0sl6bhy7irvk48pss2bmmnl4yflxkpi8kfl8hg09bk7a8dqjqfcy";
};
};
pdPackage = binaryPackage {
name = "tikv-pd-${version}";
src = fetchurl {
url = "https://tiup-mirrors.pingcap.com/pd-v${version}-${os}-${architecture}.tar.gz";
sha256 = "1mzkbnid4kzxysnnkngvdqxfxvdcm718j248181zax1rl0x313ps";
};
};
ctlPackage = binaryPackage {
name = "tikv-ctl-${version}";
src = fetchurl {
url = "https://tiup-mirrors.pingcap.com/ctl-v${version}-${os}-${architecture}.tar.gz";
sha256 = "0g8wkqqyi8zvh3zfslyzf0c1nijw7maqlp99lrfw6vql4k3wn6b1";
};
};
in stdenv.mkDerivation rec {
name = "zapdb-kv-env";
buildInputs = [
serverPackage
pdPackage
ctlPackage
];
}

@ -0,0 +1,13 @@
"use strict";
const lmdb = require("lmdb");
// TODO: Implement prefix search over keys (signature and body still to be determined)
function prefixSearch(/* ... */) {
// ...
}
module.exports = function createLMDBBackend() {
return {
};
};

@ -0,0 +1,475 @@
"use strict";
// DO NOT reorder this list! It is used to determine the internal ID for each timezone name, and changing the order will break parsing!
module.exports = [
"Africa/Abidjan",
"Africa/Accra",
"Africa/Addis_Ababa",
"Africa/Algiers",
"Africa/Asmara",
"Africa/Bamako",
"Africa/Bangui",
"Africa/Banjul",
"Africa/Bissau",
"Africa/Blantyre",
"Africa/Brazzaville",
"Africa/Bujumbura",
"Africa/Cairo",
"Africa/Casablanca",
"Africa/Ceuta",
"Africa/Conakry",
"Africa/Dakar",
"Africa/Dar_es_Salaam",
"Africa/Djibouti",
"Africa/Douala",
"Africa/El_Aaiun",
"Africa/Freetown",
"Africa/Gaborone",
"Africa/Harare",
"Africa/Johannesburg",
"Africa/Juba",
"Africa/Kampala",
"Africa/Khartoum",
"Africa/Kigali",
"Africa/Kinshasa",
"Africa/Lagos",
"Africa/Libreville",
"Africa/Lome",
"Africa/Luanda",
"Africa/Lubumbashi",
"Africa/Lusaka",
"Africa/Malabo",
"Africa/Maputo",
"Africa/Maseru",
"Africa/Mbabane",
"Africa/Mogadishu",
"Africa/Monrovia",
"Africa/Nairobi",
"Africa/Ndjamena",
"Africa/Niamey",
"Africa/Nouakchott",
"Africa/Ouagadougou",
"Africa/Porto-Novo",
"Africa/Sao_Tome",
"Africa/Timbuktu",
"Africa/Tripoli",
"Africa/Tunis",
"Africa/Windhoek",
"America/Adak",
"America/Anchorage",
"America/Anguilla",
"America/Antigua",
"America/Araguaina",
"America/Argentina/Buenos_Aires",
"America/Argentina/Catamarca",
"America/Argentina/ComodRivadavia",
"America/Argentina/Cordoba",
"America/Argentina/Jujuy",
"America/Argentina/La_Rioja",
"America/Argentina/Mendoza",
"America/Argentina/Rio_Gallegos",
"America/Argentina/Salta",
"America/Argentina/San_Juan",
"America/Argentina/San_Luis",
"America/Argentina/Tucuman",
"America/Argentina/Ushuaia",
"America/Aruba",
"America/Asuncion",
"America/Atikokan",
"America/Bahia",
"America/Bahia_Banderas",
"America/Barbados",
"America/Belem",
"America/Belize",
"America/Blanc-Sablon",
"America/Boa_Vista",
"America/Bogota",
"America/Boise",
"America/Cambridge_Bay",
"America/Campo_Grande",
"America/Cancun",
"America/Caracas",
"America/Cayenne",
"America/Cayman",
"America/Chicago",
"America/Chihuahua",
"America/Coral_Harbour",
"America/Costa_Rica",
"America/Creston",
"America/Cuiaba",
"America/Curacao",
"America/Danmarkshavn",
"America/Dawson",
"America/Dawson_Creek",
"America/Denver",
"America/Detroit",
"America/Dominica",
"America/Edmonton",
"America/Eirunepe",
"America/El_Salvador",
"America/Ensenada",
"America/Fortaleza",
"America/Fort_Nelson",
"America/Glace_Bay",
"America/Goose_Bay",
"America/Grand_Turk",
"America/Grenada",
"America/Guadeloupe",
"America/Guatemala",
"America/Guayaquil",
"America/Guyana",
"America/Halifax",
"America/Havana",
"America/Hermosillo",
"America/Indiana/Indianapolis",
"America/Indiana/Knox",
"America/Indiana/Marengo",
"America/Indiana/Petersburg",
"America/Indiana/Tell_City",
"America/Indiana/Vevay",
"America/Indiana/Vincennes",
"America/Indiana/Winamac",
"America/Inuvik",
"America/Iqaluit",
"America/Jamaica",
"America/Juneau",
"America/Kentucky/Louisville",
"America/Kentucky/Monticello",
"America/La_Paz",
"America/Lima",
"America/Los_Angeles",
"America/Maceio",
"America/Managua",
"America/Manaus",
"America/Martinique",
"America/Matamoros",
"America/Mazatlan",
"America/Menominee",
"America/Merida",
"America/Metlakatla",
"America/Mexico_City",
"America/Miquelon",
"America/Moncton",
"America/Monterrey",
"America/Montevideo",
"America/Montreal",
"America/Montserrat",
"America/Nassau",
"America/New_York",
"America/Nipigon",
"America/Nome",
"America/Noronha",
"America/North_Dakota/Beulah",
"America/North_Dakota/Center",
"America/North_Dakota/New_Salem",
"America/Nuuk",
"America/Ojinaga",
"America/Panama",
"America/Pangnirtung",
"America/Paramaribo",
"America/Phoenix",
"America/Port-au-Prince",
"America/Port_of_Spain",
"America/Porto_Velho",
"America/Puerto_Rico",
"America/Punta_Arenas",
"America/Rainy_River",
"America/Rankin_Inlet",
"America/Recife",
"America/Regina",
"America/Resolute",
"America/Rio_Branco",
"America/Rosario",
"America/Santarem",
"America/Santiago",
"America/Santo_Domingo",
"America/Sao_Paulo",
"America/Scoresbysund",
"America/Sitka",
"America/St_Johns",
"America/St_Kitts",
"America/St_Lucia",
"America/St_Thomas",
"America/St_Vincent",
"America/Swift_Current",
"America/Tegucigalpa",
"America/Thule",
"America/Thunder_Bay",
"America/Tijuana",
"America/Toronto",
"America/Tortola",
"America/Vancouver",
"America/Whitehorse",
"America/Winnipeg",
"America/Yakutat",
"America/Yellowknife",
"Antarctica/Casey",
"Antarctica/Davis",
"Antarctica/DumontDUrville",
"Antarctica/Macquarie",
"Antarctica/Mawson",
"Antarctica/McMurdo",
"Antarctica/Palmer",
"Antarctica/Rothera",
"Antarctica/Syowa",
"Antarctica/Troll",
"Antarctica/Vostok",
"Asia/Aden",
"Asia/Almaty",
"Asia/Amman",
"Asia/Anadyr",
"Asia/Aqtau",
"Asia/Aqtobe",
"Asia/Ashgabat",
"Asia/Atyrau",
"Asia/Baghdad",
"Asia/Bahrain",
"Asia/Baku",
"Asia/Bangkok",
"Asia/Barnaul",
"Asia/Beirut",
"Asia/Bishkek",
"Asia/Brunei",
"Asia/Chita",
"Asia/Choibalsan",
"Asia/Chongqing",
"Asia/Colombo",
"Asia/Damascus",
"Asia/Dhaka",
"Asia/Dili",
"Asia/Dubai",
"Asia/Dushanbe",
"Asia/Famagusta",
"Asia/Gaza",
"Asia/Hanoi",
"Asia/Harbin",
"Asia/Hebron",
"Asia/Ho_Chi_Minh",
"Asia/Hong_Kong",
"Asia/Hovd",
"Asia/Irkutsk",
"Asia/Jakarta",
"Asia/Jayapura",
"Asia/Jerusalem",
"Asia/Kabul",
"Asia/Kamchatka",
"Asia/Karachi",
"Asia/Kashgar",
"Asia/Kathmandu",
"Asia/Khandyga",
"Asia/Kolkata",
"Asia/Krasnoyarsk",
"Asia/Kuala_Lumpur",
"Asia/Kuching",
"Asia/Kuwait",
"Asia/Macau",
"Asia/Magadan",
"Asia/Makassar",
"Asia/Manila",
"Asia/Muscat",
"Asia/Nicosia",
"Asia/Novokuznetsk",
"Asia/Novosibirsk",
"Asia/Omsk",
"Asia/Oral",
"Asia/Phnom_Penh",
"Asia/Pontianak",
"Asia/Pyongyang",
"Asia/Qatar",
"Asia/Qostanay",
"Asia/Qyzylorda",
"Asia/Riyadh",
"Asia/Sakhalin",
"Asia/Samarkand",
"Asia/Seoul",
"Asia/Shanghai",
"Asia/Singapore",
"Asia/Srednekolymsk",
"Asia/Taipei",
"Asia/Tashkent",
"Asia/Tbilisi",
"Asia/Tehran",
"Asia/Tel_Aviv",
"Asia/Thimphu",
"Asia/Tokyo",
"Asia/Tomsk",
"Asia/Ulaanbaatar",
"Asia/Urumqi",
"Asia/Ust-Nera",
"Asia/Vientiane",
"Asia/Vladivostok",
"Asia/Yakutsk",
"Asia/Yangon",
"Asia/Yekaterinburg",
"Asia/Yerevan",
"Atlantic/Azores",
"Atlantic/Bermuda",
"Atlantic/Canary",
"Atlantic/Cape_Verde",
"Atlantic/Faroe",
"Atlantic/Jan_Mayen",
"Atlantic/Madeira",
"Atlantic/Reykjavik",
"Atlantic/South_Georgia",
"Atlantic/Stanley",
"Atlantic/St_Helena",
"Australia/Adelaide",
"Australia/Brisbane",
"Australia/Broken_Hill",
"Australia/Currie",
"Australia/Darwin",
"Australia/Eucla",
"Australia/Hobart",
"Australia/Lindeman",
"Australia/Lord_Howe",
"Australia/Melbourne",
"Australia/Perth",
"Australia/Sydney",
"CET",
"CST6CDT",
"EET",
"EST",
"EST5EDT",
"Etc/GMT",
"Etc/GMT+1",
"Etc/GMT-1",
"Etc/GMT+10",
"Etc/GMT-10",
"Etc/GMT+11",
"Etc/GMT-11",
"Etc/GMT+12",
"Etc/GMT-12",
"Etc/GMT-13",
"Etc/GMT-14",
"Etc/GMT+2",
"Etc/GMT-2",
"Etc/GMT+3",
"Etc/GMT-3",
"Etc/GMT+4",
"Etc/GMT-4",
"Etc/GMT+5",
"Etc/GMT-5",
"Etc/GMT+6",
"Etc/GMT-6",
"Etc/GMT+7",
"Etc/GMT-7",
"Etc/GMT+8",
"Etc/GMT-8",
"Etc/GMT+9",
"Etc/GMT-9",
"Etc/UTC",
"Europe/Amsterdam",
"Europe/Andorra",
"Europe/Astrakhan",
"Europe/Athens",
"Europe/Belfast",
"Europe/Belgrade",
"Europe/Berlin",
"Europe/Brussels",
"Europe/Bucharest",
"Europe/Budapest",
"Europe/Chisinau",
"Europe/Copenhagen",
"Europe/Dublin",
"Europe/Gibraltar",
"Europe/Guernsey",
"Europe/Helsinki",
"Europe/Isle_of_Man",
"Europe/Istanbul",
"Europe/Jersey",
"Europe/Kaliningrad",
"Europe/Kiev",
"Europe/Kirov",
"Europe/Lisbon",
"Europe/Ljubljana",
"Europe/London",
"Europe/Luxembourg",
"Europe/Madrid",
"Europe/Malta",
"Europe/Minsk",
"Europe/Monaco",
"Europe/Moscow",
"Europe/Oslo",
"Europe/Paris",
"Europe/Prague",
"Europe/Riga",
"Europe/Rome",
"Europe/Samara",
"Europe/Sarajevo",
"Europe/Saratov",
"Europe/Simferopol",
"Europe/Skopje",
"Europe/Sofia",
"Europe/Stockholm",
"Europe/Tallinn",
"Europe/Tirane",
"Europe/Tiraspol",
"Europe/Ulyanovsk",
"Europe/Uzhgorod",
"Europe/Vaduz",
"Europe/Vienna",
"Europe/Vilnius",
"Europe/Volgograd",
"Europe/Warsaw",
"Europe/Zagreb",
"Europe/Zaporozhye",
"Europe/Zurich",
"Factory",
"HST",
"Indian/Antananarivo",
"Indian/Chagos",
"Indian/Christmas",
"Indian/Cocos",
"Indian/Comoro",
"Indian/Kerguelen",
"Indian/Mahe",
"Indian/Maldives",
"Indian/Mauritius",
"Indian/Mayotte",
"Indian/Reunion",
"MET",
"MST",
"MST7MDT",
"Pacific/Apia",
"Pacific/Auckland",
"Pacific/Bougainville",
"Pacific/Chatham",
"Pacific/Chuuk",
"Pacific/Easter",
"Pacific/Efate",
"Pacific/Enderbury",
"Pacific/Fakaofo",
"Pacific/Fiji",
"Pacific/Funafuti",
"Pacific/Galapagos",
"Pacific/Gambier",
"Pacific/Guadalcanal",
"Pacific/Guam",
"Pacific/Honolulu",
"Pacific/Johnston",
"Pacific/Kiritimati",
"Pacific/Kosrae",
"Pacific/Kwajalein",
"Pacific/Majuro",
"Pacific/Marquesas",
"Pacific/Midway",
"Pacific/Nauru",
"Pacific/Niue",
"Pacific/Norfolk",
"Pacific/Noumea",
"Pacific/Pago_Pago",
"Pacific/Palau",
"Pacific/Pitcairn",
"Pacific/Pohnpei",
"Pacific/Port_Moresby",
"Pacific/Rarotonga",
"Pacific/Saipan",
"Pacific/Tahiti",
"Pacific/Tarawa",
"Pacific/Tongatapu",
"Pacific/Wake",
"Pacific/Wallis",
"PST8PDT",
"WET",
];

@ -0,0 +1,8 @@
"use strict";
module.exports = function printBits(value) {
// FIXME: Verify that this also works for unsigned values!
let bits = BigInt(value).toString(2);
let padSize = Math.ceil(bits.length / 8) * 8;
return bits.padStart(padSize, "0");
}

@ -0,0 +1,7 @@
"use strict";
const crypto = require("crypto");
module.exports = function generateID() {
return crypto.randomBytes(12);
};

@ -0,0 +1,56 @@
"use strict";
const unreachable = require("@joepie91/unreachable")("zapdb");
const matchValue = require("match-value");
// const createLMDBBackend = require("./backend/lmdb");
const queryBuilder = require("./query-builder");
// TODO: Type decoding hook for eg. turning decimal strings into bigints or numbers
Object.assign(module.exports, queryBuilder);
function findNodeType(nodes, type) {
let nodeIndex = nodes.findIndex((node) => node.type === type);
if (nodeIndex !== -1) {
return nodes[nodeIndex];
} else {
throw new Error(`Failed to locate expected '${type}' node`);
}
}
let temporaryHardcodedSchema = [
{ name: "_id", type: "bytes", required: true },
{ name: "username", type: "string", required: true },
{ name: "email", type: "string", required: true },
{ name: "activated", type: "boolean", required: true },
{ name: "notes", type: "string", required: false }
];
module.exports.createClient = function(options) {
let { schema, backend } = options;
function insertQuery(query, parameters) {
let items = findNodeType(query.clauses, "items").items;
console.log(items);
}
function selectQuery(query, parameters) {}
function updateQuery(query, parameters) {}
function deleteQuery(query, parameters) {}
return {
query: function (query, parameters) {
matchValue(query.type, {
insert: () => insertQuery(query, parameters),
update: () => updateQuery(query, parameters),
delete: () => deleteQuery(query, parameters),
select: () => selectQuery(query, parameters),
});
},
transaction: function (callback) {
}
};
};

@ -0,0 +1,112 @@
"use strict";
const timeCall = require("time-call");
const syncpipe = require("syncpipe");
// let list1 = [ 2, 9, 10, 12, 13, 16, 19, 21, 23, 24, 33, 43, 46, 48, 49, 58, 60, 61, 69, 71, 74, 75, 78, 79, 80, 82, 85, 86, 88, 90, 91, 92, 95, 98, 99 ];
// let list2 = [ 3, 4, 5, 9, 10, 12, 13, 14, 16, 17, 19, 23, 24, 28, 29, 31, 32, 34, 37, 41, 42, 44, 45, 48, 50, 51, 52, 55, 56, 64, 69, 75, 77, 79, 85, 87, 91, 92, 93, 94, 98 ];
// let list3 = [ 2, 5, 8, 9, 15, 23, 27, 31, 32, 33, 34, 36, 37, 40, 43, 45, 53, 54, 56, 58, 60, 63, 64, 66, 71, 72, 74, 75, 78, 84, 89, 91, 94, 96, 97, 98, 99 ];
function randomIntegers(count, limit) {
return syncpipe(new Array(count), [
_ => _.fill(0),
_ => _.map(() => Math.ceil(Math.random() * limit)),
_ => new Set(_),
_ => Array.from(_),
_ => _.sort((a, b) => a - b)
]);
}
let list1 = randomIntegers(1000, 2000);
let list2 = randomIntegers(1000, 2000);
let list3 = randomIntegers(1000, 2000);
console.log(list1);
function intersectThree(list1, list2, list3) {
let pointer1 = 0;
let pointer2 = 0;
let pointer3 = 0;
let results = [];
while (pointer1 < list1.length && pointer2 < list2.length && pointer3 < list3.length) {
let value1 = list1[pointer1];
let value2 = list2[pointer2];
let value3 = list3[pointer3];
if (value1 === value2 && value1 === value3) {
results.push(value1);
pointer1++;
pointer2++;
pointer3++;
} else {
let lowest = Math.min(value1, value2, value3);
if (value1 === lowest) { pointer1++; }
if (value2 === lowest) { pointer2++; }
if (value3 === lowest) { pointer3++; }
}
}
return results;
}
function intersectSets(list1, list2, list3) {
let set2 = new Set(list2);
let set3 = new Set(list3);
return list1.filter((value) => set2.has(value) && set3.has(value));
}
function intersectSets2(list1, list2, list3) {
let set2 = new Set(list2);
let set3 = new Set(list3);
let results = [];
for (let value of list1) {
if (set2.has(value) && set3.has(value)) {
results.push(value);
}
}
return results;
}
function tryOut(ITERATIONS) {
console.log(`# ${ITERATIONS} iterations, time is per iteration`);
let result1 = timeCall(() => {
for (let i = 0; i < ITERATIONS; i++) {
intersectThree(list1, list2, list3);
}
});
let result2 = timeCall(() => {
for (let i = 0; i < ITERATIONS; i++) {
intersectSets(list1, list2, list3);
}
});
let result3 = timeCall(() => {
for (let i = 0; i < ITERATIONS; i++) {
intersectSets2(list1, list2, list3);
}
});
console.log({
pointer: result1.time / ITERATIONS / 1e3 + "us",
setsFilter: result2.time / ITERATIONS / 1e3 + "us",
setsFor: result3.time / ITERATIONS / 1e3 + "us"
});
}
tryOut(100);
tryOut(1000);
tryOut(10000);
tryOut(100000);
// console.log(intersectThree(list1, list2, list3));

@ -0,0 +1,19 @@
"use strict";
const syncpipe = require("syncpipe");
const timezoneNames = require("./data/timezone-names");
let inverseMapping = syncpipe(timezoneNames, [
_ => _.map((name, i) => [ name, i ]),
_ => new Map(_)
]);
module.exports = function lookupTimezoneName(name) {
if (inverseMapping.has(name)) {
return inverseMapping.get(name);
} else {
// FIXME: Error type, clearer instructions for end users since this may also happen when our timezone list is outdated
throw new Error(`Unknown timezone name`);
}
};

@ -0,0 +1,67 @@
"use strict";
const createArithmeticCoder = require("../arithmetic-coder");
function encodeBooleans(booleans) {
let n = 1n;
let bitmask = 0n;
for (let boolean of booleans) {
if (boolean === true) {
bitmask |= n;
}
n *= 2n;
}
return bitmask;
}
function decodeBooleans(bitmask, count) {
let n = 1n;
let booleans = [];
for (let i = 0; i < count; i++) {
booleans.push((bitmask & n) !== 0n);
n *= 2n;
}
return booleans;
}
module.exports = function createBitmaskArithmeticCoder(fields) {
// NOTE: We *always* store the bitmask as the very first field, to ensure that it doesn't interfere with binary sorting order
let fieldCount = BigInt(fields.length);
let maximumBitmaskValue = 2n ** fieldCount; // NOTE: Exclusive
let coder = createArithmeticCoder([
{ name: "__bitmask", minimum: 0, maximum: maximumBitmaskValue },
... fields
]);
return {
bits: coder.bits,
encode: function (data) {
let fieldPresence = fields.map((field) => data[field.name] != null);
return coder.encode({
... data,
__bitmask: encodeBooleans(fieldPresence)
});
},
decode: function (data) {
let decoded = coder.decode(data);
let fieldPresence = decodeBooleans(decoded.__bitmask, fields.length);
fields.forEach((field, i) => {
if (fieldPresence[i] === false) {
decoded[field.name] = undefined;
}
});
delete decoded.__bitmask;
return decoded;
}
};
};
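// Usage illustration (hypothetical): `age` is absent, so its presence bit stays unset
// and the decoder restores it as `undefined`.
//
//   let coder = createBitmaskArithmeticCoder([
//     { name: "active", minimum: 0, maximum: 2 },
//     { name: "age", minimum: 0, maximum: 200 }
//   ]);
//   let encoded = coder.encode({ active: 1, age: null });
//   coder.decode(encoded); // { active: 1n, age: undefined }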

@ -0,0 +1,83 @@
"use strict";
const assert = require("assert");
const bigintLog2 = require("@extra-bigint/log2");
function bitsNeeded(value) {
if (value === 0n) {
return 1n;
} else {
return bigintLog2(value) + 1n;
}
}
function remainderDivide(number, divideBy) {
let remainder = number % divideBy;
let wholes = (number - remainder) / divideBy;
return [ wholes, remainder ];
}
module.exports = function createArithmeticCoder(fields) {
// NOTE: The fields are order-sensitive! You can *only* add a field to the definition afterwards without breaking decoding of existing values, if you put that new field at the *end*. Ranges of existing fields should never be changed, as this will break decoding.
// NOTE: Minimum is inclusive, maximum is exclusive
// NOTE: For binary sortability, the fields should be ordered from least to most significant
// second, ..., day, ... year, mask, timezone
let nextMultiplier = 1n;
let processedFields = fields.map((field) => {
let minimum = BigInt(field.minimum);
let maximum = BigInt(field.maximum);
let range = maximum - minimum;
let processed = {
offset: minimum,
range: range,
minimum: minimum,
maximum: maximum,
multiplier: nextMultiplier,
name: field.name
};
nextMultiplier = nextMultiplier * range;
return processed;
});
let maximumValue = nextMultiplier;
let reverseFields = processedFields.slice().reverse();
return {
bits: bitsNeeded(maximumValue - 1n),
encode: function (data) {
let number = processedFields.reduce((total, field) => {
let value = data[field.name];
if (value != null) {
let valueN = BigInt(value);
assert(valueN >= field.minimum && valueN < field.maximum);
let normalized = valueN - field.offset;
return total + (normalized * field.multiplier);
} else {
// Effectively store a 0, and assume that the calling code deals with any requiredness constraints and understands how to handle this case
return total;
}
}, 0n);
return number;
},
decode: function (number) {
let result = {};
for (let field of reverseFields) {
let [ wholes, remainder ] = remainderDivide(number, field.multiplier);
number = remainder;
result[field.name] = wholes + field.offset;
}
return result;
}
};
};
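// Usage illustration (hypothetical): a time-of-day coder, least significant field first.
//
//   let coder = createArithmeticCoder([
//     { name: "second", minimum: 0, maximum: 60 },
//     { name: "minute", minimum: 0, maximum: 60 },
//     { name: "hour", minimum: 0, maximum: 24 }
//   ]);
//   coder.encode({ hour: 13, minute: 37, second: 5 }); // 5n + 37n*60n + 13n*3600n = 49025n
//   coder.decode(49025n); // { hour: 13n, minute: 37n, second: 5n }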

@ -0,0 +1,50 @@
"use strict";
module.exports = function immutableDeepMerge(object1, object2) {
let hasChanges = false;
let changedProperties = { };
let deletedProperties = [];
if (object2 != null) {
for (let key of Object.keys(object2)) {
let value = object2[key];
let originalValue = object1[key];
if (value === Delete) {
deletedProperties.push(key);
hasChanges = true;
} else {
let transformedValue;
let normalizedValue = (typeof value === "function")
? value(originalValue)
: value;
if (typeof normalizedValue === "object" && normalizedValue !== null) {
// NOTE: We default to an empty object for the original value because from the perspective of a deep-merge, any nested paths required by the new input that don't exist in the original input should be imagined into existence.
transformedValue = immutableDeepMerge(originalValue ?? {}, normalizedValue);
} else {
transformedValue = normalizedValue;
}
changedProperties[key] = transformedValue;
if (transformedValue !== originalValue) {
hasChanges = true;
}
}
}
}
if (hasChanges) {
let merged = { ... object1, ... changedProperties };
for (let property of deletedProperties) {
delete merged[property];
}
return merged;
} else {
return object1;
}
};
module.exports.Delete = Delete;
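// Usage illustration (hypothetical, not part of the module): property values that are
// functions act as transforms of the original value, and `Delete` removes a property.
//
//   immutableDeepMerge(
//     { name: "alice", meta: { visits: 1, temporary: true } },
//     { meta: { visits: (n) => n + 1, temporary: Delete } }
//   )
//   // -> { name: "alice", meta: { visits: 2 } }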

@ -0,0 +1,15 @@
"use strict";
const cartesianProduct = require("cartesian-product");
module.exports = function namedCartesianProduct(object) {
let keys = Object.keys(object);
let products = cartesianProduct(keys.map((key) => object[key]));
return products.map((values) => {
return Object.fromEntries(keys.map((key, i) => {
return [ key, values[i] ];
}));
});
};

@ -0,0 +1,150 @@
"use strict";
const log2 = require("@extra-bigint/log2"); // TODO: Do we not need this anymore?
const assert = require("assert");
const countBits = require("../../storage-encoder/bitwise/count-bits");
const generateMask = require("../../storage-encoder/bitwise/generate-mask");
const invertBits = require("../../storage-encoder/bitwise/invert");
const invertBits1Byte = require("../../storage-encoder/bitwise/invert-1byte");
const truncateLeftBits = require("../../storage-encoder/bitwise/truncate-left-bits");
const bigintBuffer = require("../../storage-encoder/bigint/buffer");
const absBigInt = require("../../storage-encoder/bigint/abs");
function isNegativeHeaderByte(value) {
return (value & 128) === 0;
}
function calculateEncodedSize(value) {
let valueBitsNeeded = countBits(value);
let valueBytesNeeded = Math.ceil(valueBitsNeeded / 8);
let sizeBitsNeeded = valueBytesNeeded + 1;
// We loop here because the addition of a header can actually bump up the needed amount of bits in some cases. It should never be needed more than 3 times, though.
// FIXME: Add a limit and an error when it's exceeded
while (true) {
let totalBitsNeeded = valueBitsNeeded + sizeBitsNeeded;
let totalBytesNeeded = Math.ceil(totalBitsNeeded / 8);
if (sizeBitsNeeded === totalBytesNeeded + 1) {
return {
totalBytes: BigInt(totalBytesNeeded),
valueBits: BigInt(valueBitsNeeded)
};
} else {
sizeBitsNeeded = totalBytesNeeded + 1;
}
}
}
function readByteCount(value) {
if (value < 128) {
// 0xxxxxxx, this should never happen in the first byte!
return 0;
} else if (value & 128 && value < 192) {
// 10xxxxxx
return 1;
} else if (value & 192 && value < 224) {
// 110xxxxx
return 2;
} else if (value & 224 && value < 240) {
// 1110xxxx
return 3;
} else if (value & 240 && value < 248) {
// 11110xxx
return 4;
} else if (value & 248 && value < 252) {
// 111110xx
return 5;
} else if (value & 252 && value < 254) {
// 1111110x
return 6;
} else if (value === 254) {
// 11111110
return 7;
} else {
// 11111111
return 8;
}
}
function readBytes(bytes) {
assert(bytes.length > 0);
let negative = isNegativeHeaderByte(bytes[0]);
let headerRead = false;
let i = 0;
let totalByteCount = 0;
let value = 0n;
while (!headerRead || (i < totalByteCount)) {
let byte = bytes[i];
// If the first byte has a negative sign bit, invert the bits so that we can use the same byte count parsing logic for both negative and positive values
let normalizedByte = (negative)
? invertBits1Byte(byte)
: byte;
let byteValue;
if (!headerRead) {
let byteCount = readByteCount(normalizedByte);
totalByteCount += byteCount;
if (byteCount === 8) {
// This is a full header byte; advance to the next byte and keep reading the header
i++;
continue;
} else {
if (totalByteCount === 0) {
throw new Error(`Found a 0-byte value, this should never happen`);
}
headerRead = true;
byteValue = truncateLeftBits(normalizedByte, byteCount + 1); // truncate the byteCount bits and the terminator bit
}
} else {
value <<= 8n;
byteValue = normalizedByte;
}
if (negative) {
value -= BigInt(byteValue);
} else {
value += BigInt(byteValue);
}
i++;
}
if (!headerRead) {
throw new Error(`Reached end of value while reading header`);
}
return {
value: value,
bytesRead: totalByteCount
};
}
module.exports = {
encode: function encodeOrderableVarint(value) {
let valueN = BigInt(value);
let absoluteValue = absBigInt(valueN);
// NOTE: totalBytes represents both the total size in bytes of the encoded value, *and* the amount of header bits (minus the terminator)
let { totalBytes, valueBits } = calculateEncodedSize(absoluteValue);
let headerBits = generateMask(totalBytes);
// Since the terminator bit is already accounted for in the size calculation, we don't need to do anything special for it here - it'll be void space between the header and the value by definition
let header = headerBits << (totalBytes * 7n);
let encodedValue = header + absoluteValue;
if (valueN < 0) {
encodedValue = invertBits(encodedValue);
}
return bigintBuffer.toBuffer(encodedValue);
},
decode: function decodeOrderableVarint(bytes) {
return readBytes(bytes);
}
};

@ -0,0 +1,52 @@
"use strict";
const fs = require("fs");
const path = require("path");
const cbor = require("cbor");
function sanitizeTestName(name) {
return name.replace(/[\/\\\s]+/g, "-");
}
function serializeValue(value) {
return cbor.encodeOne(value, { highWaterMark: 1e8 });
}
function deserializeValue(value) {
return cbor.decodeFirstSync(value);
}
module.exports = {
setup: function (tape, snapshotsRoot) {
Object.assign(tape.Test.prototype, {
equalsSnapshot: function (value, id) {
let testName = sanitizeTestName(this.name);
let snapshotNumber = (this.__lastSnapshot ?? 0) + 1;
this.__lastSnapshot = snapshotNumber;
let fullTestName = (id != null)
? `${testName}-${id}`
: `${testName}-snapshot-${snapshotNumber}`
let snapshotPath = path.resolve(snapshotsRoot, `${fullTestName}.cbor`);
let serializedValue = serializeValue(value);
if (process.env.UPDATE_SNAPSHOT === "ALL" || process.env.UPDATE_SNAPSHOT === fullTestName) {
fs.mkdirSync(snapshotsRoot, { recursive: true });
fs.writeFileSync(snapshotPath, serializedValue);
console.warn(`[!] Snapshot for '${fullTestName}' was updated`);
} else if (fs.existsSync(snapshotPath)) {
// NOTE: To ensure that the replacer transforms are applied to *both* values, we *always* serialize the current value even when we're just comparing it against a known one; we then just deserialize it again below.
// TODO: Investigate whether this can be optimized with a recursive object transform instead
let knownValue = deserializeValue(fs.readFileSync(snapshotPath));
let deserializedValue = deserializeValue(serializedValue);
this.deepEquals(deserializedValue, knownValue, `Snapshot for '${fullTestName}' does not match; re-run with UPDATE_SNAPSHOT=${fullTestName} to update the snapshot and mark the current result as valid`);
} else {
throw new Error(`No known snapshot for '${fullTestName}'; re-run with UPDATE_SNAPSHOT=${fullTestName} to create it automatically`);
}
}
});
}
};
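// Usage illustration (hypothetical paths and helpers): wire this up once in a test entry
// point, then call `t.equalsSnapshot(value)` inside any tape test.
//
//   const path = require("path");
//   const tape = require("tape");
//   require("./snapshot-setup").setup(tape, path.join(__dirname, "_snapshots"));
//
//   tape("encodes a record", (t) => {
//     t.equalsSnapshot(encodeRecord(someRecord));
//     t.end();
//   });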

@ -0,0 +1,161 @@
"use strict";
const assureArray = require("assure-array");
const assert = require("assert");
// moreThan, lessThan, equals, not, where, select, insert, update (set), delete, collapse, collapseBy, count, parameter
module.exports = {
insertInto: function (collection, clauses) {
return {
type: "insert",
collection: collection,
clauses: assureArray(clauses)
};
},
update: function (collection, clauses) {
return {
type: "insert",
collection: collection,
clauses: assureArray(clauses)
};
},
selectFrom: function (collection, clauses) {
return {
type: "select",
collection: collection,
clauses: assureArray(clauses)
};
},
deleteFrom: function (collection, clauses) {
return {
type: "delete",
collection: collection,
clauses: assureArray(clauses)
};
},
moreThan: function (value) {
return {
type: "moreThan",
value: value
};
},
lessThan: function (value) {
return {
type: "lessThan",
value: value
};
},
equals: function (value) {
return {
type: "equals",
value: value
};
},
not: function (value) {
return {
type: "not",
value: value
};
},
where: function (conditions) {
return {
type: "where",
conditions: conditions
};
},
collapse: function (reducers) {
return {
type: "collapse",
fields: null,
reducers: reducers
};
},
collapseBy: function (fields, reducers) {
return {
type: "collapse",
fields: assureArray(fields),
reducers: reducers
};
},
item: function (item) {
return {
type: "items",
items: [ item ]
};
},
items: function (items) {
return {
type: "items",
items: items
};
},
set: function (properties) {
return {
type: "set",
properties: properties
};
},
anyOf: function (options) {
return {
type: "anyOf",
options: options
};
},
allOf: function (options) {
return {
type: "allOf",
options: options
};
},
parameter: function (name) {
return {
type: "parameter",
name: name
};
},
average: average,
count: count,
sum: sum
};
function generateGetter(propertyPath) {
assert(propertyPath != null);
let segments = (Array.isArray(propertyPath))
? propertyPath
: propertyPath.split(".");
let propertyGetters = segments
.map((segment) => {
// FIXME: Escape!
// TODO: Can this be further optimized by using regular dot properties where safely possible?
return `["${segment}"]`;
})
.join("");
return new Function("item", `
return item${propertyGetters};
`);
}
function average(propertyPath) {
let getter = generateGetter(propertyPath);
return {
onValue: (total, item, _i) => total + getter(item),
onEnd: (total, count) => total / count
};
}
function count() {
return (total) => total + 1;
}
function sum(propertyPath) {
let getter = generateGetter(propertyPath);
return (total, item, _i) => total + getter(item);
}
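// Usage illustration (hypothetical query; the require path and the exact `where`
// condition shape are assumptions):
//
//   const { selectFrom, where, equals, moreThan, parameter } = require("./query-builder");
//
//   let query = selectFrom("users", [
//     where({
//       isActive: equals(true),
//       invitesLeft: moreThan(parameter("minimumInvites"))
//     })
//   ]);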

@ -0,0 +1,21 @@
"use strict";
const rules = require("./rules");
module.exports = {
type: function (oldType, newType) {
return {
forward: rules.types[oldType].losslessConversionTo[newType],
backward: rules.types[newType].losslessConversionTo[oldType],
};
},
attribute: function (attribute, oldValue, newValue) {
let canTransformForward = rules.attributes[attribute].isLossless(oldValue, newValue);
let canTransformBackward = rules.attributes[attribute].isLossless(newValue, oldValue);
return {
forward: (canTransformForward) ? rules.attributes[attribute].losslessTransformer : undefined,
backward: (canTransformBackward) ? rules.attributes[attribute].losslessTransformer : undefined,
};
}
};

@ -0,0 +1,97 @@
"use strict";
function createSimpleNode(type) {
return { type: type };
}
function createSetAttributeNode(attribute, value) {
return {
type: "setAttribute",
attribute: attribute,
value: value
};
}
function createSetTypeNode(type) {
return {
type: "setFieldType",
fieldType: type
};
}
module.exports = {
// Field types
boolean: createSetTypeNode("boolean"),
integer: createSetTypeNode("integer"),
decimal: createSetTypeNode("decimal"),
string: createSetTypeNode("string"),
bytes: createSetTypeNode("bytes"),
date: createSetTypeNode("date"),
duration: createSetTypeNode("duration"),
// Optionality
required: createSetAttributeNode("required", true),
optional: createSetAttributeNode("required", false),
// Field attributes (numeric)
signed: createSetAttributeNode("signed", true),
unsigned: createSetAttributeNode("signed", false),
precision: function (digits) {
// For decimals only
return { type: "setAttribute", attribute: "precision", value: digits };
},
// Field attributes (dates)
withTimezone: createSetAttributeNode("withTimezone", true),
withoutTimezone: createSetAttributeNode("withTimezone", false),
// Defaults
defaultTo: function (valueOrFunction) {
return { type: "defaultTo", valueOrFunction: valueOrFunction };
},
defaultExistingTo: function (valueOrFunction) {
// This is only used for migrating existing records, eg. when a new field is added to the collection
return { type: "defaultExistingTo", valueOrFunction: valueOrFunction };
},
// Migration value transforms
transformTo: function (transformerFunction) {
// Mandatory when a field's schema changes in one or more ways that cannot be losslessly applied
return { type: "transformTo", transformer: transformerFunction };
},
rollbackTo: function (rollbackTransformerFunction) {
// Mandatory when a field's schema changes in one or more ways that make a lossless *rollback* impossible - alternatively, unsafeForbidRollback may also be used at the cost of making a rollback impossible (even in hot reloading mode!)
return { type: "rollbackTo", transformer: rollbackTransformerFunction };
},
unsafeForbidRollback: createSimpleNode("forbidRollback"),
// Schema operations (collections)
addCollection: function (name, operations) {
return { type: "addCollection", name: name, operations: operations };
},
modifyCollection: function (name, operations) {
return { type: "modifyCollection", name: name, operations: operations };
},
deleteCollection: function (name) {
return { type: "deleteCollection", name: name };
},
renameCollection: function (oldName, newName) {
// TODO: Figure out how to approach this. Maybe use a unique ID for every collection instead, and have a separate mapping from collection names to collection IDs?
return { type: "renameCollection", oldName: oldName, newName: newName };
},
// Schema operations (fields)
addFields: function (fields) {
return { type: "addFields", fields: fields };
},
modifyFields: function (fields) {
return { type: "modifyFields", fields: fields };
},
deleteField: function (name) {
return { type: "deleteField", name: name };
},
renameField: function (oldName, newName) {
return { type: "renameField", oldName: oldName, newName: newName };
},
// Schema operations (indexes)
addIndex: function (fieldName) {
// TODO: Determine an API for composite indexes
return { type: "addIndex", fieldName: fieldName };
},
removeIndex: function (fieldName) {
return { type: "removeIndex", fieldName: fieldName };
},
};

@ -0,0 +1,367 @@
/* eslint-disable no-loop-func */
"use strict";
const assert = require("assert");
const matchValue = require("match-value");
const splitFilterN = require("split-filter-n");
const unreachable = require("@joepie91/unreachable")("zapdb");
const immutableDeepMerge = require("../packages/immutable-deep-merge");
const rules = require("./rules");
const computeTransform = require("./compute-transform");
const compose = require("../util/compose");
// FIXME: table/row terminology etc.
// FIXME: replace asserts with proper checks and error messages
const Delete = Symbol("DeleteProperty");
// TODO: Find a way to roll this into merge-by-template somehow? The main difference is specifying dynamic transforms at rule definition time (and needing to use meta-objects in the mergeable) vs. specifying dynamic transforms at merge time directly
// TODO: Add API for "set this object literally, no merge"
// FIXME: Find a way to support arrays? Particularly objects *within* arrays, which would also need to be merged recursively...
function checkTransforms(operations) {
let byType = splitFilterN(operations, null, (operation) => operation.type);
if (byType.transformTo != null && byType.transformTo.length > 1) {
// FIXME: Error code
throw new Error(`Only one transformTo can be specified per modified field`);
}
if (byType.rollbackTo != null && byType.rollbackTo.length > 1) {
// FIXME: Error code
throw new Error(`Only one rollbackTo can be specified per modified field`);
}
if (byType.rollbackTo != null && byType.forbidRollback != null) {
// FIXME: Error code
throw new Error(`Cannot specify both a rollbackTo and an unsafeForbidRollback`);
}
let hasRollbackTransform = (byType.rollbackTo != null);
let hasRollbackProhibition = (byType.forbidRollback != null);
return {
hasTransform: (byType.transformTo != null),
hasRollback: (hasRollbackTransform || hasRollbackProhibition),
hasRollbackTransform: hasRollbackTransform,
hasRollbackProhibition: hasRollbackProhibition
};
}
// FIXME: Throw an error if a non-required transformTo is specified without a corresponding rollbackTo
function changeType(schema, newType) {
if (schema.type != newType) {
let newSchema = { type: newType };
for (let attribute of Object.keys(schema)) {
if (attribute === "type" || !rules.attributes[attribute].validForTypes.has(newType)) {
continue;
} else {
newSchema[attribute] = schema[attribute];
}
}
return newSchema;
} else {
throw new Error(`Tried to set field type to '${newType}', but that is already the type`);
}
}
function applyFieldOperations(currentField = {}, operations) {
// Things that are specific to this migration
let state = {
schema: { ... currentField }, // Clone for local mutation
forwardTransform: null,
backwardTransform: null,
transformsRequired: false,
rollbackForbidden: false,
changedAttributes: []
};
for (let operation of operations) {
matchValue(operation.type, {
setFieldType: () => {
// NOTE: This is separated out into a function because a bunch of complexity is needed for determining which attributes can be kept
state.schema = changeType(state.schema, operation.fieldType);
state.transformsRequired = true;
},
setAttribute: () => {
if (state.schema[operation.attribute] !== operation.value) {
state.changedAttributes.push(operation.attribute);
state.schema[operation.attribute] = operation.value;
state.transformsRequired = true;
} else {
// FIXME: Error quality
throw new Error(`Tried to change '${operation.attribute}' attribute to '${operation.value}', but it's already set to that`);
}
},
transformTo: () => {
if (state.forwardTransform == null) {
state.forwardTransform = operation.transformer;
} else {
// FIXME: Error quality
throw new Error(`You can only specify one transformTo per field per migration`);
}
},
rollbackTo: () => {
if (state.backwardTransform == null) {
state.backwardTransform = operation.transformer;
} else {
// FIXME: Error quality
throw new Error(`You can only specify one rollbackTo per field per migration`);
}
},
forbidRollback: () => {
state.rollbackForbidden = true;
},
// TODO: rest of operations
});
}
function createTransformComputer() {
let automaticTransformers = [];
let requiredTransformers = [];
return {
changeType: function (oldType, newType) {
let automatic = rules.types[oldType].losslessConversionTo[newType];
if (automatic != null) {
automaticTransformers.push(automatic);
} else {
requiredTransformers.push({ type: "type", oldType, newType });
}
},
changeAttribute: function (attribute, oldValue, newValue) {
let canBeAutomatic = rules.attributes[attribute].isLossless(oldValue, newValue);
if (canBeAutomatic) {
automaticTransformers.push(rules.attributes[attribute].losslessTransformer);
} else {
requiredTransformers.push({ type: "attribute", attribute, oldValue, newValue });
}
},
getResults: function (manualTransformer, operationName) {
// NOTE: There are deliberately duplicate conditional clauses in here to improve readability!
if (requiredTransformers.length === 0 && automaticTransformers.length === 0) {
if (manualTransformer == null) {
// Identity function; no changes were made that affect the value itself
return (value) => value;
} else {
// FIXME: Better error message
throw new Error(`A ${operationName} operation was specified, but no other schema changes require one. Maybe you meant to use updateRecords instead?`);
}
} else if (requiredTransformers.length === 0 && automaticTransformers.length > 0) {
return compose(automaticTransformers);
} else if (requiredTransformers.length > 0) {
// FIXME: Better error message
throw new Error(`One or more schema changes can't be automatically applied, because a lossless automatic conversion of existing values is not possible; you need to specify a ${operationName} operation manually`);
} else {
throw unreachable("Impossible condition");
}
}
};
}
// NOTE: We disallow transformTo/rollbackTo when they are not required; if the user wishes to bulk-transform values, they should specify a changeRecords operation instead. Otherwise, we cannot implement "maybe you forgot a rollbackTo" errors, because that's only actually an error when a transform isn't *required*, and so if a user 'overreaches' in their type transform to also do a value transform we can't detect missing corresponding rollbackTo logic.
if (state.transformsRequired) {
let forwardTransformers = { automatic: [], required: [] };
let backwardTransformers = { automatic: [], required: [] };
function addTransformer(collection, automaticTransformer, marker) {
if (automaticTransformer != null) {
collection.automatic.push(automaticTransformer);
} else {
collection.required.push(marker);
}
}
let oldType = currentField.type;
let newType = state.schema.type;
let transformers = computeTransform.type(oldType, newType);
addTransformer(forwardTransformers, transformers.forward, { type: "type" });
addTransformer(backwardTransformers, transformers.backward, { type: "type" });
// FIXME: Currently this implementation assumes that *all* possible attributes are required, and it doesn't deal with cases where the attribute is currently unset. That needs to be changed, especially because new attributes can be changed in later versions of the schema builder, which older migrations won't be using.
// TODO/QUESTION: Maybe all attributes should just be given a default instead of being required? Otherwise over time there'll be a mix of required and optional attributes, the requiredness being determined solely by when the attribute was added to the query builder...
for (let attribute of state.changedAttributes) {
let oldValue = currentField[attribute];
let newValue = state.schema[attribute];
let transformers = computeTransform.attribute(attribute, oldValue, newValue);
addTransformer(forwardTransformers, transformers.forward, { type: "attribute", attribute: attribute });
addTransformer(backwardTransformers, transformers.backward, { type: "attribute", attribute: attribute });
}
if (forwardTransformers.required.length > 0 && state.forwardTransform == null) {
// FIXME: Error quality, list the specific reasons
throw new Error(`One or more schema changes require you to specify a transformTo operation`);
} else {
state.forwardTransform = compose(forwardTransformers.automatic);
}
if (backwardTransformers.required.length > 0 && state.backwardTransform == null) {
// FIXME: Error quality, list the specific reasons
throw new Error(`One or more schema changes require you to specify a rollbackTo operation`);
} else {
state.backwardTransform = compose(backwardTransformers.automatic);
}
} else {
if (state.forwardTransform != null || state.backwardTransform != null) {
// FIXME: Error quality and in-depth explanation
throw new Error(`You cannot specify a transformTo or rollbackTo operation unless a field type change requires it. Maybe you meant to use changeRecords instead?`);
// FIXME: modifyRecords instead of changeRecords? For consistency with other APIs
}
}
return state;
}
function tableOperationReducer(table, operation) {
return matchValue(operation.type, {
createField: () => immutableDeepMerge(table, {
fields: {
[operation.name]: (field) => {
assert(field === undefined);
let { type, name, ... props } = operation;
return props;
}
}
}),
setFieldAttributes: () => immutableDeepMerge(table, {
fields: {
[operation.name]: (field) => {
assert(field !== undefined);
let { type, name, ... props } = operation;
// TODO: Improve readability here
return {
... field,
... props,
attributes: {
... field.attributes,
... props.attributes
}
};
}
}
}),
addIndex: () => immutableDeepMerge(table, {
indexes: {
[operation.name]: operation.definition
}
})
});
}
function schemaOperationReducer(schema, operation) {
return matchValue(operation.type, {
createCollection: () => immutableDeepMerge(schema, {
tables: {
[operation.name]: (table) => {
assert(table === undefined);
return operation.operations.reduce(tableOperationReducer, {});
}
}
}),
modifyCollection: () => immutableDeepMerge(schema, {
tables: {
[operation.name]: (table) => {
assert(table !== undefined);
return operation.operations.reduce(tableOperationReducer, table);
}
}
}),
deleteCollection: () => {
throw new Error(`Not implemented yet`);
}
});
}
module.exports = function reduceMigrations(migrationList, initial = {}) {
return migrationList.reduce((lastSchema, migration) => {
return migration.operations.reduce(schemaOperationReducer, lastSchema);
}, initial);
};
// let dummyMigrations = [
// { id: 1, operations: [
// { type: "createCollection", name: "users", operations: [
// { type: "createField", name: "username", fieldType: "string", required: true },
// { type: "createField", name: "passwordHash", fieldType: "string", required: true },
// { type: "createField", name: "emailAddress", fieldType: "string", required: false },
// { type: "createField", name: "isActive", fieldType: "boolean", required: true },
// { type: "createField", name: "registrationDate", fieldType: "date", required: true, withTimezone: false },
// { type: "createField", name: "invitesLeft", fieldType: "integer", required: true },
// ]}
// ]},
// { id: 2, operations: [
// { type: "modifyCollection", name: "users", operations: [
// { type: "setFieldAttributes", name: "emailAddress", required: false },
// { type: "setFieldAttributes", name: "isActive", required: true },
// { type: "setFieldAttributes", name: "registrationDate", withTimezone: true },
// { type: "setFieldAttributes", name: "invitesLeft", signed: false },
// ]}
// ]},
// ];
let dummyMigrations = [
{ id: 1, operations: [
{ type: "createCollection", name: "users", operations: [
{ type: "createField", name: "username", operations: [
{ type: "changeType", fieldType: "string" },
{ type: "setAttribute", attribute: "required", value: true }
]},
{ type: "createField", name: "passwordHash", operations: [
{ type: "changeType", fieldType: "string" },
{ type: "setAttribute", attribute: "required", value: true }
]},
{ type: "createField", name: "emailAddress", operations: [
{ type: "changeType", fieldType: "string" },
{ type: "setAttribute", attribute: "required", value: false }
]},
{ type: "createField", name: "isActive", operations: [
{ type: "changeType", fieldType: "boolean" },
{ type: "setAttribute", attribute: "required", value: true }
]},
{ type: "createField", name: "registrationDate", operations: [
{ type: "changeType", fieldType: "date" },
{ type: "setAttribute", attribute: "required", value: true },
{ type: "setAttribute", attribute: "withTimezone", value: false },
]},
{ type: "createField", name: "invitesLeft", operations: [
{ type: "changeType", fieldType: "integer" },
{ type: "setAttribute", attribute: "required", value: true },
]},
]}
]},
{ id: 2, operations: [
{ type: "modifyCollection", name: "users", operations: [
{ type: "modifyField", name: "emailAddress", operations: [
{ type: "setAttribute", attribute: "required", value: false },
]},
// FIXME: Disallow no-ops for attribute changes?
{ type: "modifyField", name: "isActive", operations: [
{ type: "setAttribute", attribute: "required", value: true },
]},
{ type: "modifyField", name: "registrationDate", operations: [
{ type: "setAttribute", attribute: "withTimezone", value: true },
]},
{ type: "modifyField", name: "invitesLeft", operations: [
{ type: "setAttribute", attribute: "signed", value: false },
]},
{ type: "createField", name: "sendNewsletter", operations: [
{ type: "changeType", fieldType: "boolean" },
{ type: "setAttribute", attribute: "required", value: true }, // FIXME: Enforce a default in this case! Otherwise existing columns would be invalid
{ type: "setDefault", value: () => false }, // FIXME: Always specified as a value-producing function, or also allow literals?
]},
]}
]},
];
// console.dir(module.exports(dummyMigrations), { depth: null });

@ -0,0 +1,77 @@
"use strict";
const lookupTimezoneName = require("../lookup-timezone-name");
module.exports = {
// Note that this mapping can be used to determine the losslessness of both forward and backward migrations!
types: {
bytes: {
requiredAttributes: new Set([ "required" ]),
losslessConversionTo: {}
},
string: {
requiredAttributes: new Set([ "required" ]),
losslessConversionTo: {
bytes: (string) => Buffer.from(string, "utf8")
}
},
decimal: {
requiredAttributes: new Set([ "required", "signed", "precision" ]),
losslessConversionTo: {
// Decimals are already internally represented as strings
string: (decimal) => decimal
}
},
integer: {
requiredAttributes: new Set([ "required", "signed" ]),
losslessConversionTo: {
string: (integer) => String(integer),
decimal: (integer) => String(integer)
}
},
boolean: {
requiredAttributes: new Set([ "required" ]),
losslessConversionTo: {}
},
date: {
requiredAttributes: new Set([ "required", "withTimezone" ]),
losslessConversionTo: {}
},
duration: {
requiredAttributes: new Set([ "required" ]),
losslessConversionTo: {}
},
},
attributes: {
precision: {
validForTypes: new Set([ "decimal" ]),
isLossless: (oldSetting, newSetting) => (newSetting > oldSetting),
losslessTransformer: (value) => value, // No change to value
requiresMigrationDefault: false
},
signed: {
validForTypes: new Set([ "decimal", "integer" ]),
isLossless: (oldSetting, newSetting) => (oldSetting === false && newSetting === true),
losslessTransformer: (value, _oldAttribute, _newAttribute) => value, // No change
requiresMigrationDefault: false,
},
withTimezone: {
validForTypes: new Set([ "date" ]),
isLossless: (oldSetting, newSetting) => (oldSetting === false && newSetting === true),
losslessTransformer: (value) => ({
... value,
timezone: lookupTimezoneName("Etc/UTC")
}),
requiresMigrationDefault: false
},
required: {
// Valid for all types
validForTypes: true,
isLossless: true,
requiresMigrationDefault: true
}
},
operations: {
//
}
};

@ -0,0 +1,9 @@
"use strict";
module.exports = function absBigInt(number) {
if (number < 0n) {
return 0n - number;
} else {
return number;
}
};

@ -0,0 +1,22 @@
"use strict";
// TODO: Replace with native `bigint-buffer`, after figuring out a way to auto-determine the buffer size to pass to it
const assert = require("assert");
function padHex(hex) {
// NOTE: This is necessary because BigInt#toString will omit leading zeroes even if that leads to an uneven number of digits, and that will cause values to roundtrip incorrectly through Buffer (which seems to parse it as missing a *trailing* zero).
return hex.padStart(2 * Math.ceil(hex.length / 2), "0");
}
module.exports = {
toBuffer: function bigintToBuffer(number) {
assert(number >= 0n);
let hex = padHex(number.toString(16));
return Buffer.from(hex, "hex");
},
toBigint: function bufferToBigint(buffer) {
let hex = buffer.toString("hex");
return BigInt("0x" + hex);
}
};
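// NOTE: Hypothetical usage sketch (not part of the module); values should roundtrip losslessly:
// let buffer = module.exports.toBuffer(123456789n);
// console.log(buffer); // <Buffer 07 5b cd 15>
// console.log(module.exports.toBigint(buffer)); // 123456789n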

@ -0,0 +1,12 @@
"use strict";
const assert = require("assert");
const log2 = require("@extra-bigint/log2");
module.exports = function countBits(number) {
assert(number >= 0n);
return (number === 0n)
? 1 // Special case, log2 would return null here
: Number(log2(number)) + 1;
};

@ -0,0 +1,9 @@
"use strict";
const countBits = require("./count-bits");
module.exports = function countBytes(number) {
let bits = countBits(number);
return Math.ceil(bits / 8);
};

@ -0,0 +1,5 @@
"use strict";
module.exports = function generateMask(bits) {
return (2n ** BigInt(bits)) - 1n;
};

@ -0,0 +1,7 @@
"use strict";
// FIXME: Replace with standardized version? May have performance impact since the value size is no longer hardcoded to 1 byte
module.exports = function invertBits1Byte(value) {
// Note that this is bit inversion for *unsigned* integers
return 255 - value;
};

@ -0,0 +1,11 @@
"use strict";
const countBytes = require("./count-bytes");
const generateMask = require("./generate-mask");
module.exports = function bitwiseInvert(number) {
let bytes = countBytes(number);
let mask = generateMask(bytes * 8);
return number ^ mask;
};
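// NOTE: Hypothetical usage sketch (not part of the module); the mask is sized to the number's own byte length:
// console.log(module.exports(240n)); // 15n (240 = 0b11110000, inverted within 1 byte)
// console.log(module.exports(258n)); // 65277n (258 = 0x0102, inverted within 2 bytes)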

@ -0,0 +1,8 @@
"use strict";
module.exports = function truncateLeftBits(value, shiftBits) {
	// Drops the left-most `shiftBits` bits of a byte value; the `& 255` only retains the right-most 8 bits of the intermediate result
	return ((value << shiftBits) & 255) >>> shiftBits;
};

@ -0,0 +1,25 @@
# coders
Every coder has two different encoding modes, though they may share the same implementation:
1. Index key mode, which produces a binary-sortable representation for usage in index building
2. Value mode, which produces a reversible representation for actual data storage
Both modes should result in a Buffer and optionally an auxiliary blob (for out-of-band blob storage).
## Index key mode
Technical priorities and requirements:
- Must be binary-sortable; that is, upon sorting a list of encoded representations, its order *must* match that of the original corresponding inputs if those were to be lexicographically sorted (according to the sorting rules for their data type).
- Must be deterministic; the same input value must result in the same encoded representation every time. If there is a controllable form of non-determinism (eg. a versioned set of sorting rules such as DUCET), it must be possible to regenerate the index keys for all existing values with the new version of the encoding.
- Must be space-efficient.
- Prioritize encoding speed over other (non-required) characteristics such as reversibility.
## Value mode
Technical priorities and requirements:
- Must be reversible; ie. it must be possible to *losslessly* decode the encoded representation back into its original value.
- Must be space-efficient.
- Prioritize *decoding* speed over encoding speed (within reasonable bounds) as well as other (non-required) characteristics such as binary-sortability.
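As a rough sketch of the interface that the existing coders in this directory follow (this mirrors eg. the boolean coder; the exact shape may still change):

```js
// Minimal sketch of the coder contract, based on the coders currently in this directory
module.exports = {
	// `asIndexKey` selects between index key mode and value mode
	encode: function (value, asIndexKey) {
		return {
			value: Buffer.from([ /* encoded representation */ ]),
			auxiliaryBlob: undefined // or { key, value } for out-of-band blob storage
		};
	},
	// Reads the encoded representation back out of a larger record buffer (value mode only)
	decode: function (buffer, offset) {
		return {
			bytesRead: 1, // how many bytes were consumed, so that the record decoder can advance its offset
			value: true, // the decoded value
			auxiliaryBlob: undefined // or { key, transform } when the value lives in an external blob
		};
	}
};
```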

@ -0,0 +1,33 @@
"use strict";
const BOOLEAN_FALSE = Buffer.from([ 0 ]);
const BOOLEAN_TRUE = Buffer.from([ 1 ]);
module.exports = {
encode: function (value, _asIndexKey) {
return {
value: (value === true)
? BOOLEAN_TRUE
: BOOLEAN_FALSE,
auxiliaryBlob: undefined
};
},
decode: function (buffer, offset) {
let value = buffer.readUInt8(offset);
let booleanValue;
if (value === 0) {
booleanValue = false;
} else if (value === 1) {
booleanValue = true;
} else {
throw new Error(`Invalid byte value while decoding boolean: ${value}`);
}
return {
bytesRead: 1,
value: booleanValue,
auxiliaryBlob: undefined
};
}
};

@ -0,0 +1,64 @@
"use strict";
const generateID = require("../../generate-id");
const BLOB_INTERNAL = Buffer.from([ 0 ]);
const BLOB_EXTERNAL = Buffer.from([ 1 ]);
module.exports = {
encode: function (value, _asIndexKey) {
if (value.length < 256) {
return {
value: [
BLOB_INTERNAL,
Buffer.from([ value.length ]),
value
],
auxiliaryBlob: undefined
};
} else {
let blobID = generateID();
return {
value: [
BLOB_EXTERNAL,
blobID
],
auxiliaryBlob: {
key: blobID,
value: value
}
};
}
},
decode: function (buffer, offset) {
let blobType = buffer.readUInt8(offset);
if (blobType === 0) {
// Internal blob
let blobLength = buffer.readUInt8(offset + 1);
let blobPosition = offset + 2;
let bytes = buffer.slice(blobPosition, blobPosition + blobLength);
return {
bytesRead: 2 + blobLength,
value: bytes,
auxiliaryBlob: undefined
};
} else if (blobType === 1) {
// External blob
let idPosition = offset + 1;
return {
bytesRead: 13, // a blob ID is always 12 bytes
value: undefined,
auxiliaryBlob: {
key: buffer.slice(idPosition, idPosition + 12),
transform: (blob) => blob
}
};
} else {
throw new Error(`Invalid blob type: ${blobType}`);
}
}
};
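// NOTE: Hypothetical usage sketch (not part of the module); small values are stored inline, larger ones become an external blob:
// let inline = module.exports.encode(Buffer.from("hello"), false);
// console.log(inline.value); // [ <Buffer 00>, <Buffer 05>, <Buffer 68 65 6c 6c 6f> ]
// console.log(inline.auxiliaryBlob); // undefined
// let external = module.exports.encode(Buffer.alloc(300), false);
// console.log(external.value); // [ <Buffer 01>, <generated blob ID> ]
// console.log(external.auxiliaryBlob); // { key: <blob ID>, value: <Buffer ...> }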

@ -0,0 +1,95 @@
"use strict";
const createBitmaskArithmeticCoder = require("../../packages/arithmetic-coder-bitmask");
const bigintBuffer = require("../bigint/buffer");
module.exports = function createDateEncoder(withTimezone) {
// 8.25 bytes with timezone, 7 bytes without
// NOTE: Maximums are exclusive!
let baseFields = [
{ name: "millisecond", minimum: 0, maximum: 1000 },
{ name: "second", minimum: 0, maximum: 64 }, // No, this is not a typo; leap seconds are a thing
{ name: "minute", minimum: 0, maximum: 60 },
{ name: "hour", minimum: 0, maximum: 25 },
{ name: "day", minimum: 1, maximum: 32 },
{ name: "month", minimum: 1, maximum: 13 },
{ name: "year", minimum: -5000, maximum: 10000 },
];
let extraFields = (withTimezone === true)
? [{ name: "timezone", minimum: 0, maximum: 1000 }]
: [];
// FIXME: Pre-pad with zeroes for correct sortability
// FIXME: Add back the buffer conversion here
let coder = createBitmaskArithmeticCoder([
... extraFields,
... baseFields
]);
// TODO: Assert that bits is not higher than max safe integer?
let coderBytes = Math.ceil(Number(coder.bits) / 8);
return {
encode: function (data) {
// console.log({ coderBytes, coderBits: coder.bits });
// console.log(bigintBuffer.toBuffer(coder.encode(data)));
return {
value: bigintBuffer.toBuffer(coder.encode(data)),
auxiliaryBlob: undefined
};
},
decode: function (buffer, offset) {
let encoded = buffer.slice(offset, offset + coderBytes);
return {
bytesRead: coderBytes,
value: coder.decode(bigintBuffer.toBigint(encoded)),
auxiliaryBlob: undefined
};
}
};
};
// console.log(Number(module.exports(true).bits) / 8);
// console.log(Number(module.exports(false).bits) / 8);
// let coder = module.exports(true);
// let coder_ = module.exports(false);
// console.log(coder.encode({
// year: 9999,
// month: 11,
// day: 31,
// hour: 24,
// minute: 59,
// second: 60,
// millisecond: 999,
// timezone: 499
// }));
// let encoded = coder.encode({
// year: 9999,
// month: 11,
// day: 23
// });
// let buffer = bigintToBuffer(encoded);
// let bigintAgain = bufferToBigint(buffer);
// console.log(coder.decode(bigintAgain));
// let buffer = Buffer.from(encoded.toString(16), "hex");
// console.log(encoded);
// console.log(buffer);
// console.log(BigInt("0x" + buffer.toString("hex")));
// let decoded = coder.decode(encoded);
// let decodedFromBuffer = coder.decode(BigInt("0x" + buffer.toString("hex")));
// console.log(decoded);
// console.log(decodedFromBuffer);

@ -0,0 +1,57 @@
"use strict";
const assert = require("assert");
const unreachable = require("@joepie91/unreachable")("zapdb-kv");
const integerCoder = require("./integer");
function decimalToInteger(value, precision = 0) {
let multiplier = 10n ** BigInt(precision);
let match = /^(-?)([0-9]+)(?:\.([0-9]+))?$/.exec(value);
if (match != null) {
let isNegative = (match[1] !== "");
let whole = BigInt(match[2]) * multiplier;
let fraction = (match[3] != null)
// Truncate to, and right-pad up to, the storage precision; eg. "1.5" at precision 2 becomes 150, not 15
? BigInt(match[3].slice(0, precision).padEnd(precision, "0"))
: BigInt(0);
let signMultiplier = (isNegative)
? -1n
: 1n;
return (whole + fraction) * signMultiplier;
} else {
throw unreachable("Decimal regex did not match");
}
}
function integerToDecimal(integer, precision = 0) {
let multiplier = 10n ** BigInt(precision);
let absolute = (integer < 0n) ? 0n - integer : integer;
let sign = (integer < 0n) ? "-" : "";
let wholes = absolute / multiplier; // this is integer division!
let fraction = absolute - (wholes * multiplier); // modulo
// Left-pad the fraction to the storage precision, so that eg. 105 at precision 2 becomes "1.05" rather than "1.5"
let paddedFraction = fraction.toString().padStart(precision, "0");
// TODO: Support float mode? Or maybe custom transforms on a database level, eg. to feed this through a user-specified decimal library?
return (precision > 0)
? `${sign}${wholes}.${paddedFraction}`
: `${sign}${wholes}`;
}
module.exports = {
encode: function (value, asIndexKey, { precision }) {
let integer = decimalToInteger(value, precision);
return integerCoder.encode(integer, asIndexKey);
},
decode: function (buffer, offset, { precision }) {
let result = integerCoder.decode(buffer, offset);
assert(result.auxiliaryBlob === undefined);
let decimal = integerToDecimal(result.value, precision);
return {
bytesRead: result.bytesRead,
value: decimal,
auxiliaryBlob: undefined
};
}
};
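// NOTE: Hypothetical usage sketch (not part of the module); decimals roundtrip as strings at the configured precision:
// let encoded = module.exports.encode("1.5", false, { precision: 2 });
// let decoded = module.exports.decode(encoded.value, 0, { precision: 2 });
// console.log(decoded.value); // expected: "1.50"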

@ -0,0 +1,95 @@
"use strict";
const assert = require("assert");
const bigintBuffer = require("../bigint-buffer");
const createArithmeticCoder = require("./bitmask-arithmetic");
module.exports = function createDurationEncoder() {
// 55 bits for value + 1 bit for sign = 56 bits = 7 bytes
// NOTE: Maximums are exclusive!
let coder = createArithmeticCoder([
{ name: "milliseconds", minimum: 0, maximum: 1000 },
{ name: "seconds", minimum: 0, maximum: 64 }, // No, this is not a typo; leap seconds are a thing
{ name: "minutes", minimum: 0, maximum: 60 },
{ name: "hours", minimum: 0, maximum: 25 },
{ name: "days", minimum: 0, maximum: 31 },
{ name: "months", minimum: 0, maximum: 12 },
{ name: "years", minimum: 0, maximum: 8000 },
]);
let zeroPoint = 2n ** coder.bits; // first bit is a 1
return {
bits: coder.bits + 1n,
encode: function (value) {
let { negative, ... rest } = value;
let number = coder.encode(rest);
// NOTE: This approach ensures that values are correctly sorted even in their buffer representation!
let signedNumber = (negative)
? zeroPoint - number
: zeroPoint + number;
return {
value: bigintBuffer.toBuffer(signedNumber),
auxiliaryBlob: undefined
};
},
decode: function (bytes) {
let signedNumber = bigintBuffer.toBigint(bytes);
let negative = (signedNumber < zeroPoint);
let number = (negative)
? zeroPoint - signedNumber
: signedNumber - zeroPoint;
// FIXME: Update to new decode API
let decoded = coder.decode(number);
return {
... decoded,
negative: negative
};
}
};
};
// let coder = module.exports();
// let data1 = {
// negative: true,
// seconds: 1n
// };
// let data2 = {
// negative: false,
// seconds: 1n
// };
// let encoded1 = coder.encode(data1);
// let encoded2 = coder.encode(data2);
// console.log({ encoded1, encoded2 });
// let decoded1 = coder.decode(encoded1);
// let decoded2 = coder.decode(encoded2);
// console.log({ decoded1, decoded2 });
// assert(Buffer.compare(encoded1, encoded2) === -1);
// assert.deepStrictEqual(data1, stripUndefined(decoded1));
// assert.deepStrictEqual(data2, stripUndefined(decoded2));
// function stripUndefined(object) {
// let newObject = {};
// for (let [ key, value ] of Object.entries(object)) {
// if (value !== undefined) {
// newObject[key] = value;
// }
// }
// return newObject;
// }

@ -0,0 +1,58 @@
"use strict";
const matchValue = require("match-value");
const unreachable = require("@joepie91/unreachable")("zapdb-kv");
const stringCoder = require("./string");
const bytesCoder = require("./bytes");
const booleanCoder = require("./boolean");
const integerCoder = require("./integer");
const decimalCoder = require("./decimal");
const createDateCoder = require("./date");
const dateCoder = createDateCoder(false);
const tzDateCoder = createDateCoder(true);
function $unimplemented() {
throw unreachable("Not implemented yet");
}
function getCoderForType(fieldType, withTimezone) {
return matchValue(fieldType, {
string: stringCoder,
bytes: bytesCoder,
boolean: booleanCoder,
integer: integerCoder,
decimal: decimalCoder,
date: (withTimezone)
? tzDateCoder
: dateCoder,
array: $unimplemented,
object: $unimplemented,
float: $unimplemented,
duration: $unimplemented,
_: () => unreachable(`Unrecognized field type '${fieldType}'`)
});
}
module.exports = {
encode: function (fieldType, value, options = {}) {
let { withTimezone, asIndexKey, ... coderOptions } = options;
// TODO: Handle key length constraints; this needs to be handled at a high level since *any* type should be assumed to potentially create a too-long key
// TODO: Figure out how to expose the auxiliaryBlob store to the decoders; maybe we shouldn't, and instead the decoder should just return a key which is handled higher-up?
if (value == null) {
// TODO: Test that this is handled correctly upstream
return null;
} else {
let coder = getCoderForType(fieldType, withTimezone);
return coder.encode(value, asIndexKey, coderOptions);
}
},
decode: function (fieldType, buffer, offset, options = {}) {
let { withTimezone, ... coderOptions } = options;
let coder = getCoderForType(fieldType, withTimezone);
return coder.decode(buffer, offset, coderOptions);
}
};
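// NOTE: Hypothetical usage sketch (not part of the module); the options object carries type attributes such as `withTimezone` and `precision`:
// let encoded = module.exports.encode("boolean", true, { asIndexKey: false });
// console.log(encoded); // { value: <Buffer 01>, auxiliaryBlob: undefined }
// console.log(module.exports.decode("boolean", encoded.value, 0)); // { bytesRead: 1, value: true, auxiliaryBlob: undefined }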

@ -0,0 +1,22 @@
"use strict";
const orderableVarint = require("../../packages/orderable-varint");
module.exports = {
encode: function (number, _asIndexKey) {
// TODO: Is there any reason to use a different encoding here for the actual value storage? Or is the key encoding already space-efficient enough?
return {
value: orderableVarint.encode(number),
auxiliaryBlob: undefined
};
},
decode: function (buffer, offset) {
let { value, bytesRead } = orderableVarint.decode(buffer.slice(offset));
return {
bytesRead: bytesRead,
value: value,
auxiliaryBlob: undefined
};
}
};

@ -0,0 +1,37 @@
"use strict";
const bytesCoder = require("./bytes");
module.exports = {
encode: function (value, asIndexKey) {
if (asIndexKey) {
throw new Error(`Not implemented yet`);
} else {
let binaryValue = Buffer.from(value, "utf8");
return bytesCoder.encode(binaryValue);
}
},
decode: function (buffer, offset) {
let result = bytesCoder.decode(buffer, offset);
if (result.auxiliaryBlob === undefined) {
// Internal blob
return {
bytesRead: result.bytesRead,
value: result.value.toString("utf8"),
auxiliaryBlob: undefined
};
} else {
// External blob
return {
bytesRead: result.bytesRead,
value: undefined,
auxiliaryBlob: {
key: result.auxiliaryBlob.key,
transform: (blob) => result.auxiliaryBlob.transform(blob).toString("utf8")
}
};
}
}
};

@ -0,0 +1,79 @@
"use strict";
const isDecimalStringNumber = require("./value-checks/string-number/is-decimal");
const isNegative = require("./value-checks/is-negative");
const fromDate = require("./type-adapters/from-date");
const fromTemporal = require("./type-adapters/from-temporal");
const getValueType = require("./get-value-type");
const coders = require("./coders");
// const forbiddenTypes = new Set([
// "function", "symbol"
// ]);
const allowedValueTypes = {
bytes: { bytes: true },
string: { string: true },
boolean: { boolean: true },
decimal: { string: true },
date: {
date: (value) => fromDate(value),
temporalTime: (value) => fromTemporal(value)
},
duration: {
temporalDuration: true // TODO: Convert to internal data structure
},
integer: {
integer: (value) => BigInt(value),
bigint: true,
// TODO: Allow string inputs here, eventually?
},
float: { float: true },
json: {
array: true,
object: true,
string: true,
integer: true,
float: true
}
};
// FIXME: Deal with index key length restrictions, as well as value blob length restrictions?
module.exports = function encodeField(options) {
let { name, value, fieldType, required, signed, ... coderOptions } = options;
let typeConverters = allowedValueTypes[fieldType];
let valueType = getValueType(value);
let isNumericType = (fieldType === "integer" || fieldType === "float" || fieldType === "decimal");
// FIXME: Error types
if (fieldType === "json" || fieldType === "float") {
throw new Error(`Property '${name}' is of the '${fieldType}' type, but support for this is not yet implemented`);
} else if (required === true && valueType === "null") {
throw new Error(`Property '${name}' is required`);
} else if (typeConverters[valueType] == null) {
let typeString = Object.keys(typeConverters).map((type) => `'${type}'`).join(", ");
throw new Error(`Value for property '${name}' should be one of types [${typeString}], but encountered '${valueType}' instead`);
} else {
let valueHandler = typeConverters[valueType];
let normalizedValue = (valueHandler === true)
? value
: valueHandler(value);
if (fieldType === "decimal" && !isDecimalStringNumber(normalizedValue)) {
throw new Error(`Value for property '${name}' must be a string decimal`);
} else if (typeof normalizedValue === "number" && Number.isNaN(normalizedValue)) {
throw new Error(`NaN is not an allowed value for property '${name}'`);
} else if (isNumericType && !signed && isNegative(normalizedValue)) {
// FIXME: Check number range as well?
throw new Error(`Value for property '${name}' is negative, but this is not allowed`);
} else {
return coders.encode(fieldType, normalizedValue, coderOptions);
}
}
};
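// NOTE: Hypothetical usage sketch (not part of the module); attributes like `signed`, `precision` and `withTimezone` are passed through to the coder:
// let encoded = module.exports({ name: "isActive", value: true, fieldType: "boolean", required: true, asIndexKey: false });
// console.log(encoded); // { value: <Buffer 01>, auxiliaryBlob: undefined }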

@ -0,0 +1,45 @@
"use strict";
const isTemporalTime = require("./value-checks/temporal/is-compatible-time");
const isTemporalDuration = require("./value-checks/temporal/is-duration");
// NOTE: We use custom type-checking implementations here (instead of existing libraries) because we want the type names and checks to *exactly* match up with our internal storage rules, and third-party validation libraries typically don't provide such strong guarantees
module.exports = function getValueType(value) {
let primitiveType = typeof value;
switch (primitiveType) {
case "string":
case "boolean":
case "symbol":
case "function":
return primitiveType;
case "undefined":
return "null";
case "number":
if (Number.isInteger(value)) {
return "integer";
} else {
return "float";
}
case "bigint":
return "bigint";
default:
if (value == null) {
return "null";
} else if (Buffer.isBuffer(value)) {
// FIXME: Utf8Array and friends?
return "bytes";
} else if (Array.isArray(value)) {
return "array";
} else if (value instanceof Date) {
return "date";
} else if (isTemporalTime(value)) {
return "temporalTime";
} else if (isTemporalDuration(value)) {
// FIXME: Balancing is needed for these, prior to storage!
return "temporalDuration";
} else {
return "object";
}
}
};
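// NOTE: Hypothetical usage sketch (not part of the module):
// console.log(module.exports("hello")); // "string"
// console.log(module.exports(10)); // "integer"
// console.log(module.exports(10.5)); // "float"
// console.log(module.exports(10n)); // "bigint"
// console.log(module.exports(null)); // "null"
// console.log(module.exports(new Date())); // "date"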

@ -0,0 +1,100 @@
"use strict";
const assert = require("assert");
const timeCall = require("time-call");
const encodeField = require("./encode-field");
const coders = require("./coders");
// TODO: Test if codegen can speed this up, though note that dynamically-created functions execute in global scope *only*
// TODO: Enum
// FIXME: Default field values
// FIXME: regex type
module.exports = function createRecordObjectCoder(schema) {
return {
encode: function (object) {
let auxiliaryBlobs = new Map();
let buffers = schema.flatMap(({ name, type: fieldType, required, attributes }) => {
// options = { precision, withTimezone, signed, ... }
let value = object[name];
let result = encodeField({ asIndexKey: false, name, value, fieldType, required, ... attributes });
if (result.auxiliaryBlob !== undefined) {
auxiliaryBlobs.set(result.auxiliaryBlob.key, result.auxiliaryBlob.value);
}
return result.value;
});
return {
record: Buffer.concat(buffers),
auxiliaryBlobs: auxiliaryBlobs
};
},
decode: function (buffer) {
let offset = 0;
let resultObject = {};
schema.forEach(({ name, type: fieldType, attributes }) => {
let result = coders.decode(fieldType, buffer, offset, attributes);
if (result.auxiliaryBlob === undefined) {
resultObject[name] = result.value;
} else {
// FIXME
resultObject[name] = { _blobKey: result.auxiliaryBlob.key };
}
offset += result.bytesRead;
});
// Ensure that we've read exactly enough bytes
assert(offset === buffer.length);
return resultObject;
}
};
};
// let dummySchema = [
// { name: "username", type: "string", required: true },
// { name: "passwordHash", type: "string", required: true },
// { name: "emailAddress", type: "string" },
// { name: "isActive", type: "boolean", required: true },
// { name: "registrationDate", type: "date", required: true, attributes: { withTimezone: true } },
// { name: "invitesLeft", type: "integer", required: true },
// ];
// let recordCoder = module.exports(dummySchema);
// let input = {
// username: "joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91joepie91",
// passwordHash: "foobar",
// emailAddress: "admin@cryto.net",
// isActive: true,
// registrationDate: new Date,
// invitesLeft: 10
// };
// let encoded = recordCoder.encode(input);
// let decoded = recordCoder.decode(encoded.record);
// // 44 bytes encoded
// // 29 characters string
// // 15 bytes extra
// console.log({ input, encoded, decoded });
// // console.log(timeCall(() => {
// // for (let i = 0; i < 1000; i++) {
// // recordCoder.encode({
// // username: "joepie91",
// // passwordHash: "foobar",
// // emailAddress: "admin@cryto.net",
// // isActive: true,
// // registrationDate: new Date,
// // invitesLeft: 10
// // });
// // }
// // }));

@ -0,0 +1,9 @@
"use strict";
const { toTemporalInstant } = require("@js-temporal/polyfill");
const fromTemporal = require("./from-temporal");
module.exports = function fromDate(dateObject, withTimezone) {
return fromTemporal(toTemporalInstant.call(dateObject), withTimezone);
};

@ -0,0 +1,34 @@
"use strict";
const { Temporal } = require("@js-temporal/polyfill");
const asExpression = require("as-expression");
const unreachable = require("@joepie91/unreachable")("zapdb");
module.exports = function fromTemporal(object, withTimezone) {
let zoned = asExpression(() => {
if (object.constructor.name === "Instant") {
return object.toZonedDateTimeISO("UTC");
} else if (object.constructor.name === "ZonedDateTime") {
return object;
} else {
unreachable("Invalid Temporal type");
}
});
let normalized = (withTimezone)
? zoned
: zoned.withTimeZone("UTC");
return {
millisecond: normalized.millisecond,
second: normalized.second,
minute: normalized.minute,
hour: normalized.hour,
day: normalized.day,
month: normalized.month,
year: normalized.year,
timezone: (withTimezone)
? normalized.timeZone.toString()
: undefined
};
};

@ -0,0 +1,10 @@
"use strict";
const isNegativeStringNumber = require("./string-number/is-negative");
const isNegativeNumber = require("./number/is-negative");
module.exports = function isNegative(value) {
return (typeof value === "string")
? isNegativeStringNumber(value)
: isNegativeNumber(value);
};

@ -0,0 +1,9 @@
"use strict";
module.exports = function isNegativeNumber(value) {
if (typeof value === "bigint") {
return (value < 0n);
} else {
return (value < 0);
}
};

@ -0,0 +1,5 @@
"use strict";
module.exports = function isDecimalStringNumber(value) {
return /^-?[0-9]+(?:\.[0-9]+)?$/.test(value);
};

@ -0,0 +1,5 @@
"use strict";
module.exports = function isIntegerStringNumber(value) {
return /^-?[0-9]+$/.test(value);
};

@ -0,0 +1,5 @@
"use strict";
module.exports = function isNegativeStringNumber(value) {
return (value[0] === "-");
};

@ -0,0 +1,12 @@
"use strict";
module.exports = function isCompatibleTemporalTime(value) {
// FIXME: Improve identity check
return (
value?.constructor?.name != null
&& (
value.constructor.name === "Instant"
|| value.constructor.name === "ZonedDateTime"
)
);
};

@ -0,0 +1,13 @@
"use strict";
module.exports = function isTemporalDuration(value) {
// FIXME: Improve identity check
return (
value?.constructor?.name != null
&& value.constructor.name === "Duration"
&& value.blank != null
&& value.microseconds != null
&& typeof value.total === "function"
&& typeof value.round === "function"
);
};

@ -0,0 +1,7 @@
"use strict";
module.exports = function compose(funcs) {
return function (value) {
return funcs.reduce((last, func) => func(last), value);
};
};
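// NOTE: Hypothetical usage sketch (not part of the module); functions are applied left-to-right:
// let addOneThenDouble = module.exports([ (x) => x + 1, (x) => x * 2 ]);
// console.log(addOneThenDouble(3)); // 8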

@ -0,0 +1 @@
Ś˘evalueIzćťîćohţmauxiliaryBlob÷˘evalueIzćťîćohţmauxiliaryBlob÷˘evalueGr ů·˙mauxiliaryBlob÷˘evalueGr ů·˙mauxiliaryBlob÷˘evalueIzćťîćţmauxiliaryBlob÷˘evalueIzćťîćţmauxiliaryBlob÷˘evalueGr ů·˙mauxiliaryBlob÷˘evalueGr ů·˙mauxiliaryBlob÷˘evalueIzćťîćţmauxiliaryBlob÷˘evalueIzćťîćţmauxiliaryBlob÷˘evalueGr ů·˙mauxiliaryBlob÷˘evalueGr ů·˙mauxiliaryBlob÷

@ -0,0 +1,83 @@
"use strict";
const tape = require("tape");
const { Temporal } = require("@js-temporal/polyfill");
const path = require("path");
const seedRandom = require("seed-random");
const filledArray = require("fix-esm").require("filled-array").default;
require("../../src/packages/tape-snapshot").setup(tape, path.join(__dirname, "_snapshots"));
const coders = require("../../src/storage-encoder/coders");
const encodeField = require("../../src/storage-encoder/encode-field");
const namedCartesianProduct = require("../../src/packages/named-cartesian-product");
const zonedTemporal = Temporal.ZonedDateTime.from("2021-11-27T23:43:32.81701281+01:00[Europe/Amsterdam]");
const unzonedTemporal = Temporal.Instant.from("2021-11-27T22:43:32.821012817Z");
const jsDate = new Date("2021-11-27T22:43:32.821012817Z");
tape("date encoding", (test) => {
let encodingConfigurations = namedCartesianProduct({
value: [ zonedTemporal, unzonedTemporal, jsDate ],
withTimezone: [ true, false ],
asIndexKey: [ true, false ]
});
let encoded = encodingConfigurations.map(({ value, withTimezone, asIndexKey }) => {
// NOTE: This uses `encodeField` to ensure that the input gets normalized to the internal data representation first. This needs to eventually be changed to use `coders.encode` directly, with separate tests for normalization!
return encodeField({ value, withTimezone, asIndexKey, fieldType: "date", name: "_", required: true });
});
test.equalsSnapshot(encoded);
// FIXME: Decode values again
test.end();
});
tape("integer encoding", (test) => {
let generateNumber = seedRandom("zapdb-number-test");
let numberLists = [];
// TODO: Add tests for higher numbers using BigInts
for (let magnitude = 1; magnitude < 15; magnitude++) {
let maxNumber = Math.pow(10, magnitude);
numberLists.push(filledArray(() => Math.floor(generateNumber() * maxNumber), 30));
}
let positiveNumbers = numberLists.flat();
let negativeNumbers = positiveNumbers.map((number) => 0 - number);
let allNumbers = [
... positiveNumbers,
... negativeNumbers
];
let encodedNumbers = allNumbers.map((number) => {
return [
coders.encode("integer", number, { asIndexKey: true }),
coders.encode("integer", number, { asIndexKey: false }),
];
});
test.equalsSnapshot(encodedNumbers);
// FIXME: Decode values again
test.end();
});
tape("boolean encoding", (test) => {
test.equalsSnapshot([
coders.encode("boolean", true, { asIndexKey: true }),
coders.encode("boolean", true, { asIndexKey: false }),
coders.encode("boolean", false, { asIndexKey: true }),
coders.encode("boolean", false, { asIndexKey: false }),
]);
// FIXME: Decode values again
test.end();
});

@ -0,0 +1,69 @@
"use strict";
const { Temporal } = require("@js-temporal/polyfill");
const test = require("tape");
const fromTemporal = require("../../../src/storage-encoder/type-adapters/from-temporal");
const zonedTemporal = Temporal.ZonedDateTime.from("2021-11-27T23:43:32.81701281+01:00[Europe/Amsterdam]");
const unzonedTemporal = Temporal.Instant.from("2021-11-27T22:43:32.821012817Z");
test("unzoned Temporal -> unzoned", (test) => {
test.deepEqual(fromTemporal(unzonedTemporal, false), {
millisecond: 821,
second: 32,
minute: 43,
hour: 22,
day: 27,
month: 11,
year: 2021,
timezone: undefined
});
test.end();
});
test("unzoned Temporal -> zoned", (test) => {
test.deepEqual(fromTemporal(unzonedTemporal, true), {
millisecond: 821,
second: 32,
minute: 43,
hour: 22,
day: 27,
month: 11,
year: 2021,
timezone: 'UTC'
});
test.end();
});
test("zoned Temporal -> unzoned", (test) => {
test.deepEqual(fromTemporal(zonedTemporal, false), {
millisecond: 817,
second: 32,
minute: 43,
hour: 22,
day: 27,
month: 11,
year: 2021,
timezone: undefined
});
test.end();
});
test("zoned Temporal -> zoned", (test) => {
test.deepEqual(fromTemporal(zonedTemporal, true), {
millisecond: 817,
second: 32,
minute: 43,
hour: 23,
day: 27,
month: 11,
year: 2021,
timezone: 'Europe/Amsterdam'
});
test.end();
});

@ -0,0 +1,2 @@
[storage]
reserve-space = 0

File diff suppressed because it is too large

@ -0,0 +1,127 @@
Roadmap:
- Write encoding and roundtrip tests for value encodings
- Write a schema for a testing table
- Figure out an initial version of the schema DSL
- Encode a record for that table
- Decode the encoded representation, and verify that it matches
- Write encoding and roundtrip tests for sample records
- Build a todo list app to design and test out the APIs in a real-world application
- Build zapforum to test out more complex real-world requirements
======
# Value encodings
Timestamp (UTC)
Value: fixed-length arithmetic
Index: fixed-length arithmetic
Timestamp (with timezone)
Value: fixed-length arithmetic (local time + timezone ID)
Index: fixed-length arithmetic (in UTC)
Integer
Value: orderable varint
Index: orderable varint
Decimal
Value: orderable varint (with fixed decimal point shift)
Index: orderable varint (with fixed decimal point shift)
Boolean
Value: 0 or 1
Index: 0 or 1
Binary
Value: as-is; inline (with length prefix), or blob ID reference
Index: as-is; inline (truncated)
String:
Value: UTF-8; inline (with length prefix), or blob ID reference
Index: Unicode Collation Algorithm (truncated)
JSON:
Value: CBOR
Index: (not indexable)
Record reference:
Value: TBD
Index: TBD
======
# Migrations as a first-class feature
Database migrations are a first-class and *required* feature in zapdb. Every table is defined as the sum of (ordered) database migrations that affect it. It is not possible to modify schemas out-of-band; if it's not defined in a migration, it doesn't exist. Making the database directly aware of migrations, and declaring them the exclusive way of defining schemas, has a number of important benefits; for example, it means that whole-table data conversions are not necessary because the database has 'historical' insight into old schema versions, which also means that the database can automatically infer rollback operations from migration operations.
To achieve this, every database internally stores a copy of all the migrations it knows about, with an index and an *exact* structural representation of that migration. If a conflicting migration definition is ever provided, the database will refuse to initialize, ensuring that the database definition in code *always* matches that which produced the existing data in the database.
It is possible for a database administrator to make the database 'forget' a certain migration; but doing so will involve a (potentially destructive) rollback of all data which was stored using the schema components defined in that migration, and it is only ever possible for the most recent remaining migration. This means that if the schema consists of 10 migrations, and you want to roll back migration 8, you must roll back migrations 10 and 9 first.
It is generally more practical to 'undo' schema changes by defining them as a *new* migration instead, and so this 'forget' functionality is mainly expected to be used when eg. resolving merges in version control that specify conflicting database migrations, where different developers have a different local state in their database.
To make this first-class migration model possible, *all* migrations are required to be 'exhaustive'; that is, they must not only specify the full set of changes for that migration, but also the full set of operations for a rollback. Most of these rollback operations can be automatically inferred by the database by simply inverting the operation; but there are some cases where this is not possible, eg. when deleting a required field.
In that case, the migration author must specify how to reconstruct the field's contents if that field deletion were ever reverted; after all, it would be required again and so cannot be empty, but the original data was deleted when the migration was first applied.
A migration which is not exhaustive is invalid, and will produce an error, preventing the database from being initialized.
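As a purely hypothetical sketch (the migration DSL is still being designed, and these operation names are made up for illustration), removing a required field might look something like this, with the rollback reconstruction specified explicitly:

	{ type: "modifyCollection", name: "users", operations: [
		{ type: "deleteField", name: "invitesLeft",
			// The field would be required again after a rollback, so the migration author must specify how to repopulate it
			rollbackTo: (record) => 0 }
	]}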
======
# Per-record schema versioning, and rolling migrations
One of the neat features that first-class migrations allow, is that of 'rolling migrations'. Whereas in traditional RDBMSes, a schema change necessitates immediately reconstructing the entire table, this is not necessary in zapdb. Instead, every record is prefixed with a schema index; an ID that specifies which 'revision' of the schema (ie. which migration) the record was encoded against. This allows records encoded against different versions of the schema to co-exist in one table.
When a record is retrieved that was encoded against a non-latest version of the schema, it is converted to the most recent schema on-the-fly after retrieval; this ensures that the application is always working with consistently-shaped data. The database may be configured to do one of two things:
1. Immediately write back the converted record in the new schema version, or
2. Only write it back once it is modified or otherwise stored, and so a write operation would occur anyway
Even though the database permanently has access to all revisions of the schema, due to migrations being a first-class feature, it is generally advisable not to keep 'old' records around for too long, as the on-retrieval conversion step requires some computation work, and this can compound as the record's schema revision becomes further removed from the latest (ie. target) schema revision. Therefore, the database offers a number of options:
1. After a migration occurs, start continuously converting records in the background whether they are accessed or not (this is a non-blocking process)
2. Do not automatically convert records, instead relying entirely on regular database access to surface old records (as described above)
The first option is likely to be the right one in a production environment, as it improves query times. The second option, however, may be useful in a development environment; as the developer might be doing schema rollbacks for one reason or another, and those *do* involve a blocking conversion of data to an older schema revision. It may then be preferable to keep values represented in an old schema revision as long as possible, to reduce the potential rollback time - records which were never converted to the newer schema revision, also don't need to be rolled back.
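As a rough sketch of what the on-retrieval conversion could look like internally (all names here are hypothetical, this is not the actual implementation):

	function readRecord(buffer, schemaRevisions, options) {
		let revision = buffer.readUInt8(0); // per-record schema index prefix
		let record = schemaRevisions[revision].decode(buffer, 1);
		if (revision < schemaRevisions.length - 1) {
			record = upgradeToLatestRevision(record, revision, schemaRevisions);
			if (options.writeBackImmediately) {
				storeRecord(record); // option 1: write back right away
			}
			// option 2: otherwise, only written back on the next regular write
		}
		return record;
	}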
======
# Index formatting and querying
Keys should be generated based on the index key encoding for the value type. This encoding *must* be binary-sortable; that is, in binary (encoded) representation, the ordering must *exactly* match the desired ordering in structural (decoded) form. This necessitates that all value types can be represented in such a way; if not, they cannot be used to create an index.
Index keys *may* be truncated to a certain maximum length, to satisfy database limitations. In such cases, zapdb needs to recognize that the key length limit is hit, and do an *additional* iterative check on the results; as it is possible to get a false positive when the query and a record have a matching prefix the size of the length limit, but the full value mismatches.
This also needs to be taken into account for range queries; when the queried value meets or exceeds the key length limit, every match must separately be compared against the query value to ensure that it really *does* fit within the range.
The need for this additional check can (probably?) be statically determined from the query value during planning, so the additional iteration only needs to occur for known-ambiguous queries. This means that those queries will incur a small performance penalty (*especially* when there are many false positives, this needs to be documented!), but queries which fall *below* the key length limit are not affected and are still a straight index lookup.
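A rough sketch of the extra check this implies for exact-match lookups (hypothetical names, not the actual implementation):

	const MAX_KEY_LENGTH = 64; // hypothetical key length limit imposed by the backing store
	function lookupExact(index, encodedQueryKey, recordMatchesQuery) {
		let truncated = (encodedQueryKey.length >= MAX_KEY_LENGTH);
		let candidates = index.get(encodedQueryKey.slice(0, MAX_KEY_LENGTH));
		// Only a truncated key can produce false positives through a shared prefix, so only then is the iterative check needed
		return (truncated)
			? candidates.filter(recordMatchesQuery)
			: candidates;
	}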
The actual index lookups are done in one of two ways, depending on the type of clause:
- Exact match: do an index lookup for a specific key
- Comparison: do a range lookup for the first and/or last query value
======
# Handling unstable encodings
There are a number of value type encodings which are 'unstable', ie. the encoded representation can be different depending on *when* it is created, eg. due to the use of an external data source that receives updates. Currently known cases of this are:
- Timestamps with timezones; dependent on the tz database
- Strings; dependent on the DUCET dataset, used with the UCA (Unicode Collation Algorithm)
Unfortunately, this creates a complication for sorting in particular; if the sortable representation can change over time, that means that encoded keys generated with different versions of the dataset can *not* be compared against each other, as the dataset could have changed the sorting rules in some way. For example, DUCET may have swapped around the order of certain characters, or one timezone may now sort *before* another instead of *after* it.
Therefore, the database must internally maintain a record of which version of these external datasets was previously used to generate sortable representations; and whenever this external dataset changes through eg. a software update, the database is in an unstable state. It will need to regenerate all of the existing sorting keys before queries can safely be executed again. Two options can be provided to the database administrator for when this occurs:
1. Block the database; all sortable representations will be regenerated at once, and all queries are held back until this process has completed. No queries against the database can be carried out during this time, but correct query results are guaranteed at all times. This is also the fastest option, as the entire index can be thrown out at once and regenerated from scratch.
2. Regenerate sortable representations as a background task; queries to the database are permitted as normal, but any kind of range query on the affected fields *may* produce false positives or false negatives while the regeneration process is in progress. This maximizes database availability, at the cost of potentially wrong query results. It is also the slower option; instead of throwing out the entire index, the database will now need to selectively modify and/or delete index entries as individual records have been converted, which requires additional lookups.
In the future, it may be worth exploring whether it is possible to make these options configurable on a per-field level; such that eg. wrong results are allowed for *most* fields, but not for security-sensitive fields, to which queries are held back until the process completes. In such a setup, the database could prioritize converting the blocking fields, allowing some time between each such index conversion to allow newly-possible queries to complete.
Note that for the second option to work, a separate 'reverse index' needs to be maintained (which costs additional storage space), which maps records to their index key(s). As the dataset which was originally used to generate index keys may no longer be available, and so the old sortable representations cannot be reproduced anymore, this mapping is needed to determine which outdated index entries need to be invalidated when the sortable representation is regenerated (or whether it needs to be regenerated at all!). The first option does not require this, as it wipes out the entire index at once.
======
To spec out:
- Application-defined index types