Nicholas Tietz-Sokolsky: 1 Create design document for initial work 1 files changed, 191 insertions(+), 0 deletions(-)
Copy & paste the following snippet into your terminal to import this patchset into git:
curl -s https://lists.sr.ht/~ntietz/isabella-db-devel/patches/36367/mbox | git am -3Learn more about email & git
This design doc covers the basic design I'm trying to implement with Isabella and is an intended plan for the initial work. I have a short "roadmap" of things to do so that it's broken into smaller pieces. --- docs/0001-initial-design-and-plan.md | 191 +++++++++++++++++++++++++++ 1 file changed, 191 insertions(+) create mode 100644 docs/0001-initial-design-and-plan.md diff --git a/docs/0001-initial-design-and-plan.md b/docs/0001-initial-design-and-plan.md new file mode 100644 index 0000000..d0038c5 --- /dev/null +++ b/docs/0001-initial-design-and-plan.md @@ -0,0 +1,191 @@ + +This document is an initial design document. It's sparse in details in some +places and is meant only to serve as exploration for initial development. It +contains omissions and is light on details for things that are in a more +exploratory phase. + +The main focus is ensuring that the overall direction is correct, as well as +discovering things that will need to be learned or implemented. + + +# Context + +IsabellaDB is a chess database. It's meant to be used for analyzing past games +and patterns across them to uncover interesting facts or improve as a chess +player. + +Some things which IsabellaDB will support: +- Search for games + - Filter games on metadata (player, event, year, rating, etc.) + - Filter games based on position matches + - Filter games based on sequences of position matches +- Display a game (show move sequence and allow going forward/back through it) +- Opening tree + - Explore from the beginning of a game + - Find games which match a given position + - See win/lose/draw percents for a given position + - Apply a game search filter to the above +- (TODO) Opening repertoires + - Discover a player's opening repertoire + - Build an opening repertoire +- (TODO) Ingest new game data + - Update as new master game databases are published + - Pull in data for us amateurs (for openings especially) + + +# Game Storage and Retrieval + +The fundamental functionality of a chess database is to be able to store and +retrieve games with various filters. Additional functionality, like analysis, +is layered on top of this. As such, game storage and retrieval is the critical +core of IsabellaDB. + +We'll focus first on storage of games, then retrieval, then indexing. + +## Storage + +Each game has a set of metadata associated with it, as well as a list of moves. + +The metadata contains: +- Player names +- Player ratings +- Game result +- Game date +- Event information (event name, site name, round) + +The game information contains: +- Sequence of materialized positions representing the game +- The algebraic notation representation of the game (for display) + +There will be a lot of repetition in metadata, so it will be converted to store +only once in memory in a separate names store, and then the game records will +point into that store. + +Each game will be assigned a unique id when it is ingested into the system, +and these will be used as the lookup keys from the indexes. + +The initial in-memory storage will be pretty basic and will look like: +```rust +type GameID = u64; + +struct GameStore { + games: HashMap<GameID, Game>, + indexes: ... +} +``` + + +## Retrieval + +There are two types of filters that you can do: filtering based on metadata +about a game, and filtering based on what happened within the game. + +Retrieval will be pretty straightforward. Given a list of filters, in order of +use (this order will be determined by the query planner), we can retrieve them +with this pseudocode: + +```python +# get the games that match the first filter +games = db.retrieve(filters[0]) + +for filter in filters: + games = games.filter(filter) +``` +
The first filter is done twice. This should say `for filter in filters[1:`.
+To do the retrieval, the `db.retrieve` call will utilize the index that +corresponds to that filter and retrieve the records that match it, either as +a range or an exact result list. + + +## Indexing + +To start, we'll have some fairly simple indexes. Over time, we'll build up more +complicated indexes to handle more complicated search patterns. + +The initial indexes will be: +- **Player name index**: How to implement this is TBD and may start as a prefix + tree, but ideally would be able to match on any substring within the + metadata, not just based on prefixes. Inverted indexes also seem relevant. +- **Event name index**: Same as player name index. +- **Rating index**: a B-tree index over ratings. +- **Date index**: a B-tree index over dates. +- **Position index**: We'll have a reverse lookup from position to the games which + contained that position. Lookups into this index can use hashes of the + positions for quick initial lookups, but will have to do an exact comparison + comparison of the positions to avoid collisions. + +Indexing on game results would not add any significant benefit, since there are +only three classes (win, lose, draw). + +Eventually, we'll also have indexes based on features of positions, instead of +exact position matches. This includes things like the presence (or absence) of +pins, certain pieces being on the board or not, etc. Further development of +indexes will be driven by need to support certain types of queries. + +Out of scope for the initial design: +- Persistence. We'll reconstruct these at runtime for now to keep things simple. +- Updates. We'll want to insert new games into the index at runtime eventually, + but for now we'll assume the dataset is static. + + +# Querying + +A user should be able to submit a query for a set of games which match certain +criteria. Here is an example of doing that with a few different filters on +metadata: + +``` +# Find all the games played by Magnus and Fabi after 2015 +(search-games + (match-metadata + (either-name "Fabiano Caruana") # matches either player + (either-name "Magnus Carlsen") # matches either player + (year > 2015) + ) +) +``` + +And here's another example based on positional matching: + +``` +# Find games which match the provided position (5 ply of the Italian) with an +# extra filter on metadata +(search-games + (match-position + (fen "r1bqkbnr/pppp1ppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3") + ) + (match-metadata + (white-rating > 2300) + (black-rating > 2300) + ) +) +``` + +The query language will be defined in more detail over time. This is just a +sample to guide what capabilities it'll need to provide. This will be one of the +later pieces developed. + +When a query is received, it will be parsed (as an S-expression), converted to +a sequence of operations that can be run to retrieve the data from the store, +and optimized (mostly around ordering of application of filters, and use of +indexes). + + +# Development Plan and Roadmap + +The initial development for the first phase will focus on getting data ingested +and basic indexes constructed. Querying will be in the second phase. + +v0.1.0: +- PGN parser to be able to read in the dataset I have available to me +- Index on metadata (player names, event names, ratings, dates) +- Index on positions +- Search page with basic metadata filters + +v0.2.0: +- Query parser +- Opening tree page +- Search page to also include position matches + + + -- 2.37.3
I got some informal feedback from a friend on this and intend to merge it today.