This design doc covers the basic design I'm trying to implement with
Isabella and is an intended plan for the initial work. I have a short
"roadmap" of things to do so that it's broken into smaller pieces.
---
docs/0001-initial-design-and-plan.md | 191 +++++++++++++++++++++++++++
1 file changed, 191 insertions(+)
create mode 100644 docs/0001-initial-design-and-plan.md
diff --git a/docs/0001-initial-design-and-plan.md b/docs/0001-initial-design-and-plan.md
new file mode 100644
index 0000000..d0038c5
--- /dev/null
+++ b/docs/0001-initial-design-and-plan.md
@@ -0,0 +1,191 @@
+
+This document is an initial design document. It's sparse in details in some
+places and is meant only to serve as exploration for initial development. It
+contains omissions and is light on details for things that are in a more
+exploratory phase.
+
+The main focus is ensuring that the overall direction is correct, as well as
+discovering things that will need to be learned or implemented.
+
+
+# Context
+
+IsabellaDB is a chess database. It's meant to be used for analyzing past games
+and patterns across them to uncover interesting facts or improve as a chess
+player.
+
+Some things which IsabellaDB will support:
+- Search for games
+ - Filter games on metadata (player, event, year, rating, etc.)
+ - Filter games based on position matches
+ - Filter games based on sequences of position matches
+- Display a game (show move sequence and allow going forward/back through it)
+- Opening tree
+ - Explore from the beginning of a game
+ - Find games which match a given position
+ - See win/lose/draw percents for a given position
+ - Apply a game search filter to the above
+- (TODO) Opening repertoires
+ - Discover a player's opening repertoire
+ - Build an opening repertoire
+- (TODO) Ingest new game data
+ - Update as new master game databases are published
+ - Pull in data for us amateurs (for openings especially)
+
+
+# Game Storage and Retrieval
+
+The fundamental functionality of a chess database is to be able to store and
+retrieve games with various filters. Additional functionality, like analysis,
+is layered on top of this. As such, game storage and retrieval is the critical
+core of IsabellaDB.
+
+We'll focus first on storage of games, then retrieval, then indexing.
+
+## Storage
+
+Each game has a set of metadata associated with it, as well as a list of moves.
+
+The metadata contains:
+- Player names
+- Player ratings
+- Game result
+- Game date
+- Event information (event name, site name, round)
+
+The game information contains:
+- Sequence of materialized positions representing the game
+- The algebraic notation representation of the game (for display)
+
+There will be a lot of repetition in metadata, so it will be converted to store
+only once in memory in a separate names store, and then the game records will
+point into that store.
+
+Each game will be assigned a unique id when it is ingested into the system,
+and these will be used as the lookup keys from the indexes.
+
+The initial in-memory storage will be pretty basic and will look like:
+```rust
+type GameID = u64;
+
+struct GameStore {
+ games: HashMap<GameID, Game>,
+ indexes: ...
+}
+```
+
+
+## Retrieval
+
+There are two types of filters that you can do: filtering based on metadata
+about a game, and filtering based on what happened within the game.
+
+Retrieval will be pretty straightforward. Given a list of filters, in order of
+use (this order will be determined by the query planner), we can retrieve them
+with this pseudocode:
+
+```python
+# get the games that match the first filter
+games = db.retrieve(filters[0])
+
+for filter in filters:
+ games = games.filter(filter)
+```
+
+To do the retrieval, the `db.retrieve` call will utilize the index that
+corresponds to that filter and retrieve the records that match it, either as
+a range or an exact result list.
+
+
+## Indexing
+
+To start, we'll have some fairly simple indexes. Over time, we'll build up more
+complicated indexes to handle more complicated search patterns.
+
+The initial indexes will be:
+- **Player name index**: How to implement this is TBD and may start as a prefix
+ tree, but ideally would be able to match on any substring within the
+ metadata, not just based on prefixes. Inverted indexes also seem relevant.
+- **Event name index**: Same as player name index.
+- **Rating index**: a B-tree index over ratings.
+- **Date index**: a B-tree index over dates.
+- **Position index**: We'll have a reverse lookup from position to the games which
+ contained that position. Lookups into this index can use hashes of the
+ positions for quick initial lookups, but will have to do an exact comparison
+ comparison of the positions to avoid collisions.
+
+Indexing on game results would not add any significant benefit, since there are
+only three classes (win, lose, draw).
+
+Eventually, we'll also have indexes based on features of positions, instead of
+exact position matches. This includes things like the presence (or absence) of
+pins, certain pieces being on the board or not, etc. Further development of
+indexes will be driven by need to support certain types of queries.
+
+Out of scope for the initial design:
+- Persistence. We'll reconstruct these at runtime for now to keep things simple.
+- Updates. We'll want to insert new games into the index at runtime eventually,
+ but for now we'll assume the dataset is static.
+
+
+# Querying
+
+A user should be able to submit a query for a set of games which match certain
+criteria. Here is an example of doing that with a few different filters on
+metadata:
+
+```
+# Find all the games played by Magnus and Fabi after 2015
+(search-games
+ (match-metadata
+ (either-name "Fabiano Caruana") # matches either player
+ (either-name "Magnus Carlsen") # matches either player
+ (year > 2015)
+ )
+)
+```
+
+And here's another example based on positional matching:
+
+```
+# Find games which match the provided position (5 ply of the Italian) with an
+# extra filter on metadata
+(search-games
+ (match-position
+ (fen "r1bqkbnr/pppp1ppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3")
+ )
+ (match-metadata
+ (white-rating > 2300)
+ (black-rating > 2300)
+ )
+)
+```
+
+The query language will be defined in more detail over time. This is just a
+sample to guide what capabilities it'll need to provide. This will be one of the
+later pieces developed.
+
+When a query is received, it will be parsed (as an S-expression), converted to
+a sequence of operations that can be run to retrieve the data from the store,
+and optimized (mostly around ordering of application of filters, and use of
+indexes).
+
+
+# Development Plan and Roadmap
+
+The initial development for the first phase will focus on getting data ingested
+and basic indexes constructed. Querying will be in the second phase.
+
+v0.1.0:
+- PGN parser to be able to read in the dataset I have available to me
+- Index on metadata (player names, event names, ratings, dates)
+- Index on positions
+- Search page with basic metadata filters
+
+v0.2.0:
+- Query parser
+- Opening tree page
+- Search page to also include position matches
+
+
+
--
2.37.3