~jonsterling/forester-devel

This thread contains a patchset. You're looking at the original emails, but you may wish to use the patch review UI. Review patch
1

[PATCH] initial draft of transplant command

Details
Message ID
<20241020201623.1413973-3-root@owenlynch.org>
DKIM signature
pass
Download raw message
Patch: +43 -4
(I apologize for the repeat email; I failed at correct email formatting before)

Hi all,

This is a small patch based on forester 4.3.1 which I imagine is hopelessly out
of date (especially with the datalog changes!), but I wanted to get some
feedback on the general idea, and I can re-implement it once the datalog version
stabilizes.

This adds a "forester transplant" command, with syntax "forester transplant
FOREST_TOML_FILE TAG OUTPUT_FILE". It outputs all of paths to the source code of
trees that either contain the tag TAG or are recursively transcluded in some
tree that contain the tag TAG.

This is already in use at Topos to create the topos public forest:
https://forest.topos.site/public (not much there right now!). In CI, I build the
forest as normal, and then I transplant the public tag, and build just that by
itself.

Right now transplant does not follow links. I think that this is perhaps the
right behavior, because one would not want to accidentally link to something and
then drag along your entire private notes. Rather, if you link to something and
you want it to come along, you should add the right tag to it.

It seems to me that this functionality (exporting only a subset of a forest) is
perhaps orthogonal to the cross-forest transclusions? Because it's probably
easier to have one big repo with all of your stuff, a portion of which you
publish, than it is to have a separate public and private forest where the
private forest can include some of the public. More generally, one might want to
expose different subsets of their forest to different collaborators. So there
might be room for this feature even after cross-forest transclusions get worked
out.

What do you think?
-Owen

---
 Dockerfile           |  2 +-
 bin/forester/main.ml | 45 +++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/Dockerfile b/Dockerfile
index c1d7917..fa0cc3c 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,4 +1,4 @@
FROM ocaml/opam:alpine-ocaml-5.1-flambda AS forester-builder
FROM ocaml/opam:alpine-ocaml-5.2-flambda AS forester-builder

RUN sudo apk update

diff --git a/bin/forester/main.ml b/bin/forester/main.ml
index 0d96043..e94359a 100644
--- a/bin/forester/main.ml
@@ -110,7 +110,7 @@ let query_tag ~env filter_tags config_filename =
  let tags = Forest.tags ~forest in
  let filtered =
    tags |> Seq.filter @@ fun (_, ts) ->
    List.for_all (fun t -> List.mem t ts) ts
    List.for_all (fun t -> List.mem t filter_tags) ts
  in
  filtered
  |> Seq.iter @@ fun (addr, tags) ->
@@ -132,6 +132,34 @@ let query_all ~env config_filename =
  |> Yojson.Basic.to_string
  |> Format.printf "%s"

module String_set = Set.Make (String)

let transplant ~env tag output config_filename =
  let config = Forester_frontend.Config.parse_forest_config_file config_filename in
  let forest =
    Forest.plant_forest @@
    Process.read_trees_in_dirs ~dev:true ~ignore_malformed:true @@
    make_dirs ~env config.trees
  in
  let query_lnvar = Query.(
      union_fam_rel
        (rel Edges Incoming Rel.tags (Addr (User_addr tag)))
        Paths
        Outgoing
        Rel.transclusion) in
  let query = Query.distill_expr query_lnvar in
  let results = forest.run_query query in
  let get_source addr = Option.bind
                          (Addr_map.find_opt addr forest.trees)
                          (fun tree -> tree.fm.source_path) in
  let sources = results
                |> Addr_set.to_list
                |> List.filter_map get_source
                |> String_set.of_list
  in
  let oc = open_out output in
  sources |> String_set.iter (fun path -> Printf.fprintf oc "%s\n" path);
  close_out oc

let init ~env () =
  let default_theme_url =
@@ -313,7 +341,7 @@ let query_taxon_cmd ~env =

let query_tag_cmd ~env =
  let arg_tags =
    Arg.(value @@ pos_all string [] @@ info [] ~docv:"TAG")
    Arg.(value @@ pos_right 0 string [] @@ info [] ~docv:"TAG")
  in
  let doc = "List all trees with tag TAG" in
  let info = Cmd.info "tag" ~version ~doc in
@@ -329,6 +357,17 @@ let query_cmd ~env =
  let info = Cmd.info "query" ~version ~doc in
  Cmd.group info [query_taxon_cmd ~env; query_tag_cmd ~env; query_all_cmd ~env]

let transplant_cmd ~env =
  let tag =
    Arg.(value @@ pos 1 string "" @@ info [] ~docv:"TAG")
  in
  let output =
    Arg.(value @@ pos 2 string "" @@ info [] ~docv:"OUTPUT")
  in
  let doc = "Get a list of trees to transplant" in
  let info = Cmd.info "transplant" ~version ~doc in
  Cmd.v info Term.(const (transplant ~env) $ tag $ output $ arg_config)

let init_cmd ~env =
  let doc = "Initialize a new forest" in
  let man = [
@@ -350,7 +389,7 @@ let cmd ~env =
  in

  let info = Cmd.info "forester" ~version ~doc ~man in
  Cmd.group info [build_cmd ~env; new_tree_cmd ~env; complete_cmd ~env; query_cmd ~env; init_cmd ~env;]
  Cmd.group info [build_cmd ~env; new_tree_cmd ~env; complete_cmd ~env; query_cmd ~env; transplant_cmd ~env; init_cmd ~env;]


let () =
-- 
2.46.1
Details
Message ID
<F8156D20-88C3-477D-8368-31076A451DF7@jonmsterling.com>
In-Reply-To
<20241020201623.1413973-3-root@owenlynch.org> (view parent)
DKIM signature
pass
Download raw message
On 20 Oct 2024, at 21:16, Owen Lynch wrote:

> (I apologize for the repeat email; I failed at correct email 
> formatting before)
>
> Hi all,
>
> This is a small patch based on forester 4.3.1 which I imagine is 
> hopelessly out
> of date (especially with the datalog changes!), but I wanted to get 
> some
> feedback on the general idea, and I can re-implement it once the 
> datalog version
> stabilizes.
>
> This adds a "forester transplant" command, with syntax "forester 
> transplant
> FOREST_TOML_FILE TAG OUTPUT_FILE". It outputs all of paths to the 
> source code of
> trees that either contain the tag TAG or are recursively transcluded 
> in some
> tree that contain the tag TAG.
>
> This is already in use at Topos to create the topos public forest:
> https://forest.topos.site/public (not much there right now!). In CI, I 
> build the
> forest as normal, and then I transplant the public tag, and build just 
> that by
> itself.
>
> Right now transplant does not follow links. I think that this is 
> perhaps the
> right behavior, because one would not want to accidentally link to 
> something and
> then drag along your entire private notes. Rather, if you link to 
> something and
> you want it to come along, you should add the right tag to it.
>
> It seems to me that this functionality (exporting only a subset of a 
> forest) is
> perhaps orthogonal to the cross-forest transclusions? Because it's 
> probably
> easier to have one big repo with all of your stuff, a portion of which 
> you
> publish, than it is to have a separate public and private forest where 
> the
> private forest can include some of the public. More generally, one 
> might want to
> expose different subsets of their forest to different collaborators. 
> So there
> might be room for this feature even after cross-forest transclusions 
> get worked
> out.

Hi Owen,

This looks cool. I am glad that you have found a solution that works 
"for now" until we have figured out how we want to handle this use-case 
long-term. I will have a look at this and see if we can (1) apply the 
patch to Forester 4.X, and (2) incorporate something like this 
functionality in Forester 5.X. (If we do apply to Forester 4.X, I might 
leave this undocumented as a feature that is meant to be used by you and 
Topos temporarily until we get our act together — see below.)

Regarding the 5.X version (within the "middle-brow" approach that we 
discussed in person), here is how I was envisioning this feature would 
interact with forest federation. I think that ultimately one will indeed 
want to have one big personal forest, some of which you want to expose 
to the outer world. The latter subforest should be distinguished in some 
way (perhaps by a tag, as you do here) and then the commands that 
"publish" a forest IR blob should be instrumentable by a datalog query. 
Then the blob is emitted and can be served from whatever location you 
wish, and mounted in (e.g.) an organisation's forest. Besides the blob 
format, the main difference between that vision and what you've done 
here is that the emitted trees will still be located under the "host" of 
the original forest they came from — a distinction that doesn't exist 
yet in Forester 4.X.

As it happens, one reason I had resisted for so long the idea of 
building a public forest from the same repository as the enclosing 
private forest is that I wanted the source code for my public forest to 
be available openly so that it can be examined by users as an example of 
forestry. But with Forester 5.X, I intend to create a fully public 
"Forester" forest that will contain the (yet unwritten) reference manual 
and release notes, which will fulfil the same goals — and so the 
source code for my public forest will likely become fully private in the 
coming months.

Best,
Jon
Reply to thread Export thread (mbox)