~mieum/booksin.space

The brick and mortar

Details
Message ID
<C6OBZU83WH8M.3D642UW63SKRT@maclibre>
DKIM signature
pass
I'm continuing the conversation from the previous thread here, for
"organization's" sake.

>I think we can set up a sr.ht CI service to scan changed text
>frontmatter (including frontmatter for new texts) and then to send that
>to the main booksin.space server database (which can then update
>relevant pages).  That would combine both worlds pretty neatly, but I
>don't know if there are any security implications to consider.  Also,
>the CI would have to be disabled/changed for mailing list submissions,
>so that no one can affect the server by posting patches on the mailing
>list.

This is an interesting idea, I didn't consider this. What sorts of
security implications do you anticipate other than unsolicited mailing
list patches? 

>P.S: I don't know if it showed up, but I had attached my updates to
>Nathan's text in my previous message.

I did see that, sorry I didn't get a chance to comment. When I get home
I will add it to the repo and server...so there's something there!

I'll have a little more time to set things up this evening. Thanks for
being patient everyone. And please, don't hesitate to make suggestions!
I may have brought the beer, but that doesn't make me in charge of the
party :)

~mieum
Details
Message ID
<C6OCE0MWNBQQ.1DSSB26KROSXP@gux>
In-Reply-To
<C6OBZU83WH8M.3D642UW63SKRT@maclibre> (view parent)
DKIM signature
missing
On Wed Oct 28, 2020 at 9:04 AM UTC, mieum wrote:
> > I think we can set up a sr.ht CI service to scan changed text
> > frontmatter (including frontmatter for new texts) and then to send
> > that to the main booksin.space server database (which can then
> > update relevant pages).  That would combine both worlds pretty
> > neatly, but I don't know if there are any security implications to
> > consider.  Also, the CI would have to be disabled/changed for
> > mailing list submissions, so that no one can affect the server by
> > posting patches on the mailing list.
>
> This is an interesting idea, I didn't consider this. What sorts of
> security implications do you anticipate other than unsolicited mailing
> list patches?

Well, the CI service manifest would probably be included in the Git repo
itself.  I assume that the CI service would have to connect to the
booksin.space server over SSH (which works because sr.ht allows us to
store an SSH secret key).  This would mean that anyone can read the
manifest to know where and what port SSH is plugged in to, so I hope you
have a good firewall set up or otherwise you risk getting spammed.

Also, I'll have to read up on the secrets system used by the CI - it may
be that we could allow the CI to run on mailing list patches because
those don't have access to the secret SSH key (so they can't affect the
server).
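
As a rough sketch of the scanning step only (nothing here is settled in the
thread): assuming each text starts with a YAML frontmatter block delimited
by lines containing only `---`, a CI job could separate the frontmatter from
the body like this. The delimiter convention and the function name
`split_frontmatter` are assumptions for illustration.

```python
from typing import Tuple


def split_frontmatter(text: str) -> Tuple[str, str]:
    """Split a document into (frontmatter, body).

    Assumption: frontmatter, if present, is delimited by lines that
    contain only '---'. Returns ('', text) when there is none.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return frontmatter, body
    # No closing delimiter found: treat the whole file as body.
    return "", text
```

The returned frontmatter string would still need to be parsed as YAML and
sent to the server; that part is left out here.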

Also, someone may have to write the code to do all of this.  I probably
don't have the time to do it, but I can definitely provide an overview
of how everything should work and may occasionally be able to contribute
some code.

> > P.S: I don't know if it showed up, but I had attached my updates to
> > Nathan's text in my previous message.
>
> I did see that, sorry I didn't get a chance to comment. When I get
> home I will add it to the repo and server...so there's something
> there!

No worries.

~aravk
Details
Message ID
<C6OMKTWO6BAJ.34O8TCHOJHOU@taiga>
In-Reply-To
<C6OCE0MWNBQQ.1DSSB26KROSXP@gux> (view parent)
DKIM signature
fail
If you use build secrets to provide, for example, an SSH key, it will
not be included in any builds initiated from the mailing list.
Details
Message ID
<C6PEFZL3N91O.JBNHROMFRJIQ@maclibre>
In-Reply-To
<C6OMKTWO6BAJ.34O8TCHOJHOU@taiga> (view parent)
DKIM signature
pass
Sorry for so much delay in my replies, everyone. Our son is getting some
new teeth.....which has made for a tiring week >_<

>Also, someone may have to write the code to do all of this.  I probably
>don't have the time to do it, but I can definitely provide an overview
>of how everything should work and may occasionally be able to contribute
>some code.

Seems like it might be something worth putting together once the project
has a little more traction. In the meantime it shouldn't be an issue to
just pull and build changes by hand. This could be streamlined by using
a webhook to ping the server when there are updates.

I spent a little time today handling some material from Project
Gutenberg and thinking about how to approach curating a library from
material largely sourced from there.

1. It is safe to remove the license clause from Project Gutenberg books.

2. It is "easier" to gemtextify books in html form rather than plain
   text. The formatting of the plain text files at PG is ambiguous, and
   would be difficult to parse reliably. Using an existing html to gmi 
   converter, the majority of the work is done. Dashes, pre-formatted
   text, headings, etc. What is left to be done by hand may actually be
   eliminated altogether (such as the half- or full-title pages
   sometimes found at the beginning of a text).

   One benefit of parsing the html versions of these texts is there is a
   table of contents class that can be used to a) generate a table of
   contents, and b) facilitate parsing out chapters to be rendered as
   individual pages if desired (in the case of long texts, for example).

3. I'm still not sure about the best way to catalog. What metadata would
   be helpful to keep track of and how? I don't think booksin.space 
   should strive to be "archival," but some bibliographic data would be
   useful to build and (re)structure the site. I think a basic yaml block
   would suffice with just a few fields like title and author, (which can
   be extracted from the source file), genre, language,
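
To illustrate point 2, here is a minimal sketch of an HTML-to-Gemtext pass
using only the Python standard library. It is not the existing converter
mentioned above, and it handles headings and paragraphs only. The class and
function names are hypothetical, and real Project Gutenberg HTML would need
more care (links, pre-formatted text, the table of contents).

```python
from html.parser import HTMLParser


class GemtextSketch(HTMLParser):
    """Very rough HTML-to-Gemtext pass: headings and paragraphs only."""

    # Gemtext has three heading levels; deeper HTML headings are ignored here.
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.lines = []      # finished Gemtext lines
        self._prefix = None  # line prefix of the block being collected
        self._buffer = []    # text fragments of the current block

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._prefix, self._buffer = self.HEADINGS[tag], []
        elif tag == "p":
            self._prefix, self._buffer = "", []

    def handle_data(self, data):
        if self._prefix is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._prefix is not None and (tag in self.HEADINGS or tag == "p"):
            # Collapse whitespace so each block becomes one Gemtext line.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.lines.append(self._prefix + text)
            self._prefix = None


def html_to_gemtext(html: str) -> str:
    parser = GemtextSketch()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```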

Anyway, I am very sleepy, so I need to get some rest, but I wanted to
post this here for everyone. Tomorrow evening I will actually have some
time to sit down and put a little more into this. But for now, I sleep!

~mieum
Details
Message ID
<C6PFWOLA86PR.369ZY2I5TWX05@taiga>
In-Reply-To
<C6PEFZL3N91O.JBNHROMFRJIQ@maclibre> (view parent)
DKIM signature
fail
On Thu Oct 29, 2020 at 9:12 AM EDT, mieum wrote:
> 3. I'm still not sure about the best way to catalog. What metadata would
> be helpful to keep track of and how? I don't think booksin.space
> should strive to be "archival," but some bibliographic data would be
> useful to build and (re)structure the site. I think a basic yaml block
> would suffice with just a few fields like title and author, (which can
> be extracted from the source file), genre, language,

- Title
- Author
- Genre
- Language

Also:

- Date of publication
- Date of addition to catalogue
- Publisher
- Book series, if a member of one
- ISBN
- Dewey Decimal Number
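
One possible way to model these fields in code, as a sketch only: the
class name `TextMetadata` and the field names are hypothetical, and the
fields from the "Also" list are optional so that a record can exist before
anyone has sourced that information.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class TextMetadata:
    # Core fields from the first list.
    title: str
    authors: List[str]
    language: str
    genres: List[str] = field(default_factory=list)
    # Fields from the "Also" list; None means "not yet sourced".
    published: Optional[str] = None
    added: Optional[str] = None
    publisher: Optional[str] = None
    series: Optional[str] = None
    isbn: Optional[str] = None
    dewey: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Names of optional fields that have not been filled in."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```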
Details
Message ID
<C6PFZA1CGZ4Z.T7F27R1JB2G1@maclibre>
In-Reply-To
<C6PFWOLA86PR.369ZY2I5TWX05@taiga> (view parent)
DKIM signature
pass
>- Date of publication
>- Date of addition to catalogue
>- Publisher
>- Book series, if a member of one
>- ISBN
>- Dewey Decimal Number

The only problem here is that, in the case of PG, this information is
not provided. The metadata they do provide is specific to their
representations of the original works. Because their contents are all
public domain works, in some cases their versions are not clearly
representations of any particular edition or source of the original.
In this case, their versions are technically derivative works (and the
ISBN, for example, is irrelevant or never existed).

Of course, not all content will come from Project Gutenberg, but finding
a source that includes all that metadata about the version of the
original from which it derives (and which is easily gemtextified)  may 
be an unnecessary hurdle. It is definitely not impossible, but it would
add a certain degree of complexity to the project, for sure! I guess it
depends on how far down the rabbit hole we want to go haha

~mieum
Details
Message ID
<C6PGI8BDJBA9.13HJW6Z0WED1J@taiga>
In-Reply-To
<C6PFZA1CGZ4Z.T7F27R1JB2G1@maclibre> (view parent)
DKIM signature
fail
It would be nice to at least support these fields and leave it at the
librarian's discretion to put the work in to source the information.
Details
Message ID
<C6PGJ59IKP9E.35WDNM188VHZH@maclibre>
In-Reply-To
<C6PGI8BDJBA9.13HJW6Z0WED1J@taiga> (view parent)
DKIM signature
pass
That's a good point. I have a bona fide librarian friend I've asked to
join the project...I think his wisdom and resources would be valuable
here.
Details
Message ID
<C6PGRDVDCIYH.22XCM7CNMUV1B@callum-computer>
In-Reply-To
<C6PFZA1CGZ4Z.T7F27R1JB2G1@maclibre> (view parent)
DKIM signature
pass
Hi. This project sounds really cool! I'd like to contribute a few
thoughts.

Should there be a standard format the books should be in (e.g. contents
at the start, ## for chapter headings), or would anything be alright as
long as it's Gemtext? I think it would be nice to have some consistency,
but books can be very varied in terms of layout so it might not be
possible. Especially with non-fiction which might have references,
images etc.

I like the idea of the books having pages, but for simplicity's sake
that might be something best left to the user/client.

I think translations of the same book should all be linked, so that
you could look at a list of books by a certain author and see all the
translated versions. On a related note there's probably going to have to
be some way of managing different language versions of the booksin.space
interface itself.

With respect to metadata, how should multiple values be dealt with?
For example if there are multiple authors or genres.


I'm looking forward to a Gemini library!

Callum
Details
Message ID
<C6PHXXPYLMI1.1Z91Y81JTHKFS@gux>
In-Reply-To
<C6PGRDVDCIYH.22XCM7CNMUV1B@callum-computer> (view parent)
DKIM signature
missing
On Thu Oct 29, 2020 at 5:01 PM UTC, Callum Brown wrote:
> Should there be a standard format the books should be in (e.g. contents
> at the start, ## for chapter headings), or would anything be alright as
> long as it's Gemtext? I think it would be nice to have some consistency,
> but books can be very varied in terms of layout so it might not be
> possible. Especially with non-fiction which might have references,
> images etc.

I think it's best to leave space for some variation.  I suppose images
and stuff will have to be packaged with the books, and the books will
contain (relative) links to those images at the appropriate places, with
captions.

I'm imagining that a book lives at the URL
booksin.space/<category>/<book short name>/<lang>.gmi, with images (and
any other assets, such as original versions like PDFs) in the same
directory, and with booksin.space/<category>/<book short name>.tar for
an archive of the entire book.  Then metadata could be stored separately
at booksin.space/<category>/<book short name>/metadata.yml.

> I like the idea of the books having pages, but for simplicity's sake
> that might be something best left to the user/client.

I agree, although I think some sort of marker for pages and lines
(useful for plays) may be needed.  Also, books often have multiple
editions, so that will also have to be taken care of somehow (probably
by making <lang>.gmi into <edition>.<lang>.gmi, and namespacing other
assets similarly).
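
A small sketch of that layout, assuming the path shapes described above.
The helper names are made up for illustration, and the optional `edition`
argument implements the <edition>.<lang>.gmi idea.

```python
from typing import Optional


def text_path(category: str, short_name: str, lang: str,
              edition: Optional[str] = None) -> str:
    """Path of one text file: /<category>/<short name>/[<edition>.]<lang>.gmi"""
    filename = f"{edition}.{lang}.gmi" if edition else f"{lang}.gmi"
    return f"/{category}/{short_name}/{filename}"


def metadata_path(category: str, short_name: str) -> str:
    """Path of the separate metadata file for a book."""
    return f"/{category}/{short_name}/metadata.yml"


def archive_path(category: str, short_name: str) -> str:
    """Path of the tar archive of the entire book."""
    return f"/{category}/{short_name}.tar"
```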

> I think translations of the same book should all be linked, so that
> you could look at a list of books by a certain author and see all the
> translated versions. On a related note there's probably going to have
> to be some way of managing different language versions of the
> booksin.space interface itself.

I think my above system would provide that ability.  I assume that the
page at /<category>/<book short name> would list different translations
and assets and give information from the metadata file.

> With respect to metadata, how should multiple values be dealt with?
> For example if there are multiple authors or genres.

I assume if we're using YAML that we can just use lists for referring to
some values for authors and genres and other fields.  The parser will
have to take that into account.
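
For example, the code that consumes the parsed YAML could normalize every
multi-valued field to a list, so that a single author and several authors
are handled the same way. A sketch, with a hypothetical helper name:

```python
from typing import Any, List


def as_list(value: Any) -> List[str]:
    """Normalize a parsed YAML value to a list of strings.

    Accepts a missing value (None), a single scalar, or a list.
    """
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    return [str(value)]
```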

~aravk | ~nothien
Details
Message ID
<C6Q15NBLOOC9.1YGJLIY62ZXX5@maclibre>
In-Reply-To
<C6PHXXPYLMI1.1Z91Y81JTHKFS@gux> (view parent)
DKIM signature
pass
>I think it's best to leave space for some variation.

I agree. Rather than trying to conform all books to an arbitrary format,
I think a reasonable consistency is enough. Some books will have heading
levels beyond the three available in Gemtext. One way to provide context
would be to number headings somehow (1., 1.1., 1.2., etc.), but that may
only be necessary where that hierarchy is crucial.
I think whatever makes sense for the particular text is fine.

>I'm imagining that a book lives at the URL
>booksin.space/<category>/<book short name>/<lang>.gmi, with images (and
>any other assets, such as original versions like PDFs) in the same
>directory, and with booksin.space/<category>/<book short name>.tar for
>an archive of the entire book.  Then metadata could be stored separately
>at booksin.space/<category>/<book short name>/metadata.yml.

I'm wondering what the best structure would be for the capsule itself,
which I suppose depends on how we want to source and build it. Drew had
mentioned using subdomains for different languages, which seems like a
straightforward way to not only organize the different texts (and their
translations), but also potentially provide multilingual interfaces. 

I do like the idea of URLs that are descriptive of their content, like
the ones Arav described in the previous quote, but I'm wondering if the
files themselves should actually live in these locations. Some texts
fall into multiple categories, and some have multiple authors. It seems
like indexes should be abstracted from where the actual files are stored
in order to allow for browsing the library in different ways (by author,
genre, language, title, etc.). If we build the site based on the
metadata of the books, it wouldn't be too difficult to just build out a
static representation of what exists in the library organized around
methods of accessing it.

When a text is added, the metadata will be read, and the relevant 
indexes across the capsule will be rebuilt. The files that result from 
this process would be, for example:

1. The whole text in Gemtext format including a table of contents made 
   of links to:
2. Separate .gmi files of the individual chapters, which include links
   to facilitate navigating between chapters.
3. An index page for that particular text which includes these files,
   links pointing to indexes for the author, genre, etc. in question,
   a table displaying the bibliographic data, and perhaps a .bib file
   (or something similar) that readers can use to import the metadata.
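
To make the "rebuild the relevant indexes" step a bit more concrete, here
is a minimal sketch. It assumes each text's metadata has already been
parsed into a dict with a "title" key; the key names and the function
`build_index` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_index(texts: Iterable[dict], key: str) -> Dict[str, List[str]]:
    """Group titles by one metadata field (e.g. 'author' or 'genre').

    A text with several values for the field is listed under each of
    them, so the index is independent of where the file is stored.
    """
    index = defaultdict(list)
    for meta in texts:
        values = meta.get(key, [])
        if isinstance(values, str):
            values = [values]  # allow a single value as well as a list
        for value in values:
            index[value].append(meta["title"])
    return {value: sorted(titles) for value, titles in sorted(index.items())}
```

Running this once per field (author, genre, language, title) would give
the different ways of browsing the library.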

One question I have is, should the metadata be "embedded" in the files,
or at least included before or after the body of the text, or should 
the metadata remain with the source files/be included separately as a
.yml or .bib file?

This evening I will try to make a little prototype of how things might
be organized, that way we have something concrete to reference and
modify.

>I'm looking forward to a Gemini library!

Me too! :)

Thanks for your patience and interest everyone.

~mieum
Details
Message ID
<C6Q38D0C6NVK.GE6OFX73DOKR@gux>
In-Reply-To
<C6Q15NBLOOC9.1YGJLIY62ZXX5@maclibre> (view parent)
DKIM signature
missing
On Fri Oct 30, 2020 at 9:00 AM UTC, mieum wrote:
> I'm wondering what the best structure would be for the capsule itself,
> which I suppose depends on how we want to source and build it. Drew had
> mentioned using subdomains for different languages, which seems like a
> straightforward way to not only organize the different texts (and their
> translations), but also potentially provide multilingual interfaces.

Agreed.

> I do like the idea of URLs that are descriptive of their content, like
> the ones Arav described in the previous quote, but I'm wondering if the
> files themselves should actually live in these locations. Some texts
> fall into multiple categories, and some have multiple authors. It seems
> like indexes should be abstracted from where the actual files are stored
> in order to allow for browsing the library in different ways (by author,
> genre, language, title, etc.). If we build the site based on the
> metadata of the books, it wouldn't be too difficult to just build out a
> static representation of what exists in the library organized around
> methods of accessing it.

These are all good ideas.  I think we can keep a centralized repository
of all the texts we collect, but then as texts are received they can be
added at different locations (for each classification by category /
author / etc., and for each language the interface exists in).  I think
the centralized repo should also be made available, perhaps at a
different subdomain (e.g. 'repo.booksin.space' or 'content.<...>' etc.),
so that others can freely keep copies of the same main content.

> When a text is added, the metadata will be read, and the relevant
> indexes across the capsule will be rebuilt. The files that result from
> this process would be, for example:
>
> 1. The whole text in Gemtext format including a table of contents made
> of links to:
> 2. Separate .gmi files of the individual chapters, which include links
> to facilitate navigating between chapters.
> 3. An index page for that particular text which includes these files,
> links pointing to indexes for the author, genre, etc. in question,
> a table displaying the bibliographic data, and perhaps a .bib file
> (or something similar) that readers can use to import the metadata.

This is a good idea.  However, I don't know if keeping individual .gmi
files for each chapter is possible or wanted.  I suppose it would depend
upon the size of the text (e.g. it may be viable for books, but not for
shorter stories).  Even if individual .gmi files for chapters are
provided, I would also like a singular .gmi file containing the entire
text, for completeness.  As such, it may be best to require a singular
complete .gmi, but also allow for splitting the text into multiple .gmi
files which would automagically (or manually) be linked to in a
table-of-contents page.
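
A sketch of the automatic variant, assuming chapters are marked with
second-level Gemtext headings ("## ") in the complete file. That marker,
the `chapters/<n>.gmi` naming, and the function names are assumptions, not
something agreed on here.

```python
from typing import List, Tuple


def split_chapters(gemtext: str) -> List[Tuple[str, str]]:
    """Split a complete text into (title, chapter_text) pairs.

    Chapters start at lines beginning with '## '. Anything before the
    first such heading (title page etc.) is skipped.
    """
    chapters = []
    title, lines = None, []
    in_preformatted = False
    for line in gemtext.splitlines():
        if line.startswith("```"):
            # Preformatted toggle line: headings inside are not chapters.
            in_preformatted = not in_preformatted
        if not in_preformatted and line.startswith("## "):
            if title is not None:
                chapters.append((title, "\n".join(lines)))
            title, lines = line[3:].strip(), [line]
        elif title is not None:
            lines.append(line)
    if title is not None:
        chapters.append((title, "\n".join(lines)))
    return chapters


def table_of_contents(chapters: List[Tuple[str, str]]) -> str:
    """Gemtext link lines ('=> URL label'), one per chapter."""
    return "\n".join(
        f"=> chapters/{number}.gmi {title}"
        for number, (title, _) in enumerate(chapters, start=1)
    )
```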

Unless too many complications arise, the main build system could simply
be a Makefile, which could refer to more complex commands (in C, Python,
whatever) to do stuff like index generation.

> One question I have is, should the metadata be "embedded" in the
> files, or at least included before or after the body of the text, or
> should the metadata remain with the source files/be included
> separately as a .yml or .bib file?

I think, because the files may contain somewhat repetitive content (due
to having translations of the main text), the metadata should be a
separate .yml / .bib file (or both .yml and .bib).  That may also be
slightly easier to parse (as you don't need to find the metadata
embedded inside a file, and there is a singular location for the
metadata).

Also, we should probably explicitly specify some terms, e.g. 'text'
refers to any content being held by the library (book, play, etc.), in
case confusion about terms ever arises.

~aravk | ~nothien
Details
Message ID
<C6Q41AD9KBSV.39QOYOWCLFX68@maclibre>
In-Reply-To
<C6Q38D0C6NVK.GE6OFX73DOKR@gux> (view parent)
DKIM signature
pass
>the centralized repo should also be made available, perhaps at a
>different subdomain (e.g. 'repo.booksin.space' or 'content.<...>' etc.),
>so that others can freely keep copies of the same main content.

I agree :) It's there, might as well make it accessible~ Also, on a
somewhat related note, it may be good to write the builder so that it is
portable to other domains. That is, it might be nice to make it so that
this repo could be cloned, built, and hosted on another domain if
desirable~ It's not "essential," but relatively easy to implement, so
why not?

>Even if individual .gmi files for chapters are
>provided, I would also like a singular .gmi file containing the entire
>text, for completeness.  As such, it may be best to require a singular
>complete .gmi, but also allow for splitting the text into multiple .gmi
>files which would automagically (or manually) be linked to in a
>table-of-contents page.

This is pretty much what I mean. There should be a complete text
available, but also it would be nice if the table of contents was a list
of links that linked to separate .gmi files for each chapter (because
some texts will be quite long and unwieldy otherwise...although, if your
client has a search function then it is kind of irrelevant :b). It may
be worth putting this off until later. It wouldn't be too hard to add
this "feature" later on if it seems like a good idea.

>Unless too many complications arise, the main build system could simply
>be a Makefile, which could refer to more complex commands (in C, Python,
>whatever) to do stuff like index generation.

For sure. I have only limited programming skills to contribute, but I
could collaborate on something in Python. I do not know C, but if that's
the direction things take then I guess I will be learning (I had wanted
to anyway. This whole project is a learning experience for me anyway,
like everything!)

>I think, because the files may contain somewhat repetitive content (due
>to having translations of the main text), the metadata should be a
>separate .yml / .bib file (or both .yml and .bib).  That may also be
>slightly easier to parse (as you don't need to find the metadata
>embedded inside a file, and there is a singular location for the
>metadata).

These are good points. I had not considered just sourcing metadata from
a .bib file. That might be a better option, actually. Speaking of .bib
files, it might be nice to have a central .bib file that includes all
the info for the whole library. That would be useful for writing scripts
to GET all the books from one author, for example.

Also I meant to respond to your comment about page numbers. Unless the
source contains them, that may be difficult to include. I'm always happy
to find ebooks that have "real" page numbers because it makes citing
them in papers easier. As far as denoting them in the text, there are
several ways. One would simply be to add something like [pg 43] in-line
with the text. But this is hard to notice. Another option would be to
break a paragraph with a single pre-formatted line that states the
"real" page number. But keep in mind, the goal of this library is not
necessarily to provide exact replications of the originals. Indeed,
whatever we get from Project Gutenberg, for example, is already an
abstraction or derivation of an original source. But like you said,
where page numbers are an integral part of the text, that is, where they
are needed to make sense of the content, then whoever prepares that text
would need to decide on an appropriate way to indicate them.

Thanks Arav! Also, I saw that you contacted me on irc, and I pinged you
back.

~mieum
Details
Message ID
<C6QE2KFBI4VK.2WXYRTCJHTIU1@callum-computer>
In-Reply-To
<C6Q41AD9KBSV.39QOYOWCLFX68@maclibre> (view parent)
DKIM signature
pass
> Speaking of .bib files, it might be nice to have a central .bib file
> that includes all the info for the whole library.

I agree. Either one file with different language entries differentiated
somehow, or each language has its own .bib file.

The citation-key could be used for the paths to the actual content:
gemini://en.booksin.space/a-citation-key/ would get you to a directory
containing all the relevant files.

> There should be a complete text available, but also it would be nice
> if the table of contents was a list of links that linked to separate
> .gmi files for each chapter

I think that the only obligatory file should be the one containing the
whole text. Then there could be subdirectories containing alternative
formats: chapters, pages, subsections, whatever. Those could probably
be made manually rather than auto-generating them from the full text,
but common formats (like chapters) should be standardised.

Some examples of what it could look like:
gemini://en.booksin.space/dickens-a-christmas-carol/full-text.gmi
gemini://en.booksin.space/dickens-a-christmas-carol/chapters/1.gmi
gemini://es.booksin.space/dickens-cuento-de-navidad/texto-completo.gmi

The index page for a book directory could be generated from the .bib
entry and by looking at any files or subdirectories.
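
A minimal sketch of such a generated index page. It assumes the
bibliographic entry has already been parsed into a dict with "title" and
optionally "author" keys, and that the file listing is passed in as plain
names. The function name is hypothetical.

```python
from typing import Dict, Iterable


def book_index_page(meta: Dict[str, str], files: Iterable[str]) -> str:
    """Render a Gemtext index page for one book directory."""
    lines = [f"# {meta['title']}"]
    if "author" in meta:
        lines.append(f"By {meta['author']}")
    lines.append("")
    # One Gemtext link line per file or subdirectory, in sorted order.
    for name in sorted(files):
        lines.append(f"=> {name}")
    return "\n".join(lines)
```

In practice the `files` argument could come from `os.listdir()` on the
book's directory.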

Callum
Details
Message ID
<C6QF7ZD8H277.3KPGUAXQ1JLUY@gux>
In-Reply-To
<C6QE2KFBI4VK.2WXYRTCJHTIU1@callum-computer> (view parent)
DKIM signature
missing
On Fri Oct 30, 2020 at 7:07 PM UTC, Callum Brown wrote:
> I agree. Either one file with different language entries differentiated
> somehow, or each language has its own .bib file.

I don't know the full details of .bib files, but if this is needed then
it sounds like a good idea.

> The citation-key could be used for the paths to the actual content:
> gemini://en.booksin.space/a-citation-key/ would get you to a directory
> containing all the relevant files.

I would rather have this at a separate level, e.g.
en.booksin.space/cite/a-citation-key.  The top-level should be toward
categories and special pages.

> I think that the only obligatory file should be the one containing the
> whole text. Then there could be subdirectories containing alternative
> formats: chapters, pages, subsections, whatever. Those could probably
> be made manually rather than auto-generating them from the full text,
> but common formats (like chapters) should be standardised.

Agreed, mostly.  The providers of the texts would have to provide
indicators for chapters and pages etc. in order for it to be parsed, and
the current consensus seems to be to leave that to be optional.  I think
we should specify an optional formatting to use so that these indexes
can be automatically generated, but let the authors decide whether or
not to use it (and they can then specify whether it is provided in the
.yaml metadata).

> Some examples of what it could look like:
> gemini://en.booksin.space/dickens-a-christmas-carol/full-text.gmi
> gemini://en.booksin.space/dickens-a-christmas-carol/chapters/1.gmi
> gemini://es.booksin.space/dickens-cuento-de-navidad/texto-completo.gmi

I think we should make the texts of other languages also available from
all language-specific interfaces, just defaulting to a certain language
in that language's interface.  So it would rather be
blah/full-text.en.gmi, with other languages also available.  Same goes
for each chapter - you could have blah/chapters/1.en.gmi etc.  I like
the concept of using localized names in the URLs for everything, but I
don't know whether it would work with more complicated languages (e.g.
Chinese).  IIRC, there is a way to encode Chinese characters in URLs,
but I don't remember whether it would look as neat.

> The index page for a book directory could be generated from the .bib
> entry and by looking at any files or subdirectories.

I think that would work.  However, I don't know what level of detail can
be idiomatically encoded in a .bib.  If it seems that we have too much
information needed, I suggest we stick with using the YAML for all
in-site needs (but we still provide the .bib files).

~aravk | ~nothien
Details
Message ID
<C6QFT3ETT965.QRB9DTIB9WY4@callum-computer>
In-Reply-To
<C6QF7ZD8H277.3KPGUAXQ1JLUY@gux> (view parent)
DKIM signature
pass
> I don't know the full details of .bib files

Neither do I :-)

> I don't know what level of detail can be idiomatically encoded in a
> .bib. If it seems that we have too much information needed, I suggest
> we stick with using the YAML for all in-site needs

I know the format used in .bib files is geared towards making citations,
cross references, etc., and that the programs that use them
(e.g. BibTeX, Biber) are tightly coupled to TeX/LaTeX.
From that point of view YAML probably makes more sense.

> I would rather have this at a separate level, e.g.
> en.booksin.space/cite/a-citation-key. The top-level should be toward
> categories and special pages.

Yep that's a good point. Maybe /books/ rather than /cite/?

> I think we should specify an optional formatting to use so that these
> indexes can be automatically generated, but let the authors decide
> whether or not to use it (and they can then specify whether it is
> provided in the .yaml metadata).

Sounds good.

> I think we should make the texts of other languages also available from
> all language-specific interfaces, just defaulting to a certain language
> in that language's interface. So it would rather be
> blah/full-text.en.gmi, with other languages also available. Same goes
> for each chapter - you could have blah/chapters/1.en.gmi etc.

Storing all translations of the same book in one place makes sense.
I guess you would have metadata files for all the languages that book
is available in too, then some kind of unifying file with a unique book
id and references to the localised metadata.

> I like the concept of using localized names in the URLs for
> everything, but I don't know whether it would work with more
> complicated languages (e.g. Chinese).

I hadn't thought of that. It would probably make page generation a
pain too.

Callum
Details
Message ID
<C6SFP8YYJ146.1V609X61E02U@maclibre>
In-Reply-To
<C6QFT3ETT965.QRB9DTIB9WY4@callum-computer> (view parent)
DKIM signature
pass
Sorry for my delay again, folks. I have been mostly AFK these past few
days~

> Either one file with different language entries differentiated
> somehow, or each language has its own .bib file.

This might become a little unwieldy. I liked the idea of using a .bib
file to store all the metadata to build out the site, but it might
become a little cumbersome. In the case of different translations of a
text, my understanding is that each individual "text," which in our case
would be the actual files containing the "books," should have its own
.bib file or entry in a master .bib file. Trying to pack all the
metadata we need into a .bib file seems possible, but it might be better
to just use yaml and source it to build .bib files where we need them :)
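
Something like the following is what I have in mind, as a rough sketch.
It assumes the yaml has already been parsed into a dict; the field names
and the function `to_bibtex` are made up for illustration, and special
characters in values are not escaped.

```python
def to_bibtex(key: str, meta: dict) -> str:
    """Render one @book entry from already-parsed metadata."""
    entries = []
    for name in ("title", "author", "year", "publisher"):
        value = meta.get(name)
        if not value:
            continue  # leave out fields we have no data for
        if isinstance(value, (list, tuple)):
            # BibTeX separates multiple names with ' and '.
            value = " and ".join(value)
        entries.append(f"  {name} = {{{value}}}")
    return "@book{" + key + ",\n" + ",\n".join(entries) + "\n}"
```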

> I know the format used in .bib files is geared towards making 
> citations, cross references, etc., and that the programs that use them
> (e.g. BibTeX, Biber) are tightly coupled to TeX/LaTeX.
> From that point of view YAML probably makes more sense.

> If it seems that we have too much information needed, I suggest we
> stick with using the YAML for all in-site needs (but we still provide
> the .bib files).

I tend to agree with this point now, after thinking about it all
weekend. The bibtex format is pretty versatile and in no way limited to
TeX/LaTeX, but it seems like using yaml would be more robust---and plus
the tooling is already available for such a purpose of building out a
site.

Likewise, I don't think it would be necessary to include a separate
/cite/ or /books/ level. The .bib file can just live with the other .gmi
files and be accessed through the index page for the book, no?

> I like the concept of using localized names in the URLs for everything,
> but I don't know whether it would work with more complicated languages
> (e.g. Chinese).  IIRC, there is a way to encode Chinese characters in
> URLs, but I don't remember whether it would look as neat.

The URLs would have to be percent-encoded, which makes it kind of messy.
If your client percent-encodes URLs for you, then it's not really an
issue I suppose. 
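
For reference, a small sketch of what building such a URL could look like
with Python's standard library. The helper name and the `zh.booksin.space`
host in the usage note are hypothetical.

```python
from urllib.parse import quote, unquote


def localized_url(host: str, *segments: str) -> str:
    """Build a gemini:// URL, percent-encoding each path segment.

    quote() encodes non-ASCII characters as UTF-8 bytes in %XX form;
    safe="" also encodes '/' inside a segment.
    """
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"gemini://{host}/{path}"
```

Calling `localized_url("zh.booksin.space", "白鲸", "全文.gmi")` gives a
pure-ASCII URL, and `unquote()` on the result recovers the readable form.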

So, I think we should try and sketch out how the entire site should be
structured. I'm having a hard time keeping it all in my head (since I
have been VERY distracted lately >_<). I'm trying to picture the entire
process of adding a book and publishing it on the library shelves. Here
are some questions and ideas I have about that:

1. Should the data files (the .gmi files of the books themselves) be
   stored statically to allow permalinks? If so, where? At first I
   thought it would be best to just keep all the data files in a
   separate subdomain, such as stacks.booksin.space, and link to it from
   indexes on language-specific subdomains: en.booksin.space,
   es.booksin.space, etc. For example, a book accessed at
   en.booksin.space/melville/mobydick/mobydick.gmi or 
   en.booksin.space/fiction/mobydick/mobydick.gmi would 
   essentially be a symlink to somewhere like
   stacks.booksin.space/melville/mobydick/en/mobydick.gmi, and the
   "permalink" could be given on the index page.

   Is that a weird way to go about it? Different translations would be
   kept in a subfolder where the text is located in the "stack" (such as
   stacks.booksin.space/melville/mobydick/es/mobydick.gmi, 
   ... /de/mobydick.gmi, etc.) and the index pages of each translation
   would be linked to each other.

   At any rate, it would be helpful to sketch out all the ways the files
   will be accessed through indexes---that is, the logical structure of
   a path, say, for browsing texts by genre. 

2. The "source files" of the library. Will the source files be separate
   from the data files, or should the "source" be the static directory
   of data files? What I mean is, when a book is added, for example, a
   .gmi file containing the text of the book (formatted in Gemtext) with
   a yaml header is added to the repo. That change gets pulled to the
   server where it is then built into the site and all the static
   indexes, links, etc. are built and adjusted based on the yaml
   metadata. Is this a desirable method, or would it be better to
   manually add a new book to the "stacks" in the form of a .gmi file of
   the text and an accompanying .yml file, and then have the capsule
   builder look for new additions in the stacks and build out all the
   indexes that way?
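
For the first option in point 2, the builder would need to separate the
yaml header from the Gemtext body. A minimal sketch of that step,
assuming the header is fenced by `---` lines (that delimiter and the
function name are assumptions, nothing is decided yet):

```python
def split_front_matter(text: str):
    """Split a document into (yaml_header, gemtext_body).

    Assumes the header sits between two lines containing only "---".
    Returns an empty header if no such block opens the file.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            header = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return header, body
    # Opening delimiter without a closing one: treat it all as body.
    return "", text

doc = "---\ntitle: Moby Dick\nlang: en\n---\n# Moby Dick\nCall me Ishmael.\n"
header, body = split_front_matter(doc)
```

The header string could then be handed to whatever yaml parser the
builder ends up using.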

Sorry if I don't make perfect sense...I am quite sleep deprived right
now haha. Again, thanks for all your interest everyone. Hopefully this
week I will have more time at the computer so I can work on this some
more.

~mieum
Details
Message ID
<C6SNUFB650FV.33Q9XOEYIFX53@gux>
In-Reply-To
<C6SFP8YYJ146.1V609X61E02U@maclibre> (view parent)
DKIM signature
missing
On Mon Nov 2, 2020 at 3:49 AM WAT, mieum wrote:
> Likewise, I don't think it would be necessary to include a separate
> /cite/ or /books/ level. The .bib file can just live with the other
> .gmi files and be accessed through the index page for the book, no?

The purpose of this was to find books based on citation data.  The .bib
files will live alongside the .gmi files, yes, but /cite/<citation>
could be used to find a book when the name is not known but some sort of
citation data is.  I've forgotten exactly what this was about.

> The URLs would have to be percent-encoded, which makes it kind of messy.
> If your client percent-encodes URLs for you, then it's not really an
> issue I suppose.

May as well support it, I guess.

> <how everything should be laid out>

Here's how I see everything in my head:

There is a "source" subdomain serving the static content of the books,
without any language-specific components.  It is served online at the
root level of `static.booksin.space`.  Its format is the following:

```
/ root
/<authors>/ Everything by a certain group of authors
/<authors>/info.yml Information about the group of authors
/<authors>/books.list A line-by-line list of all book names.
/<authors>/books.tar.gz Archive of all the books by the authors
/<authors>/<book>/ Everything about a certain book
/<authors>/<book>/meta.yml All metadata about the book
/<authors>/<book>/text.<lang>.gmi Complete language-specific texts
/<authors>/<book>/meta.<lang>.bib Language-specific citation data
/<authors>/<book>/... Other assets (like images)
/<authors>/<book>/archive.tar.gz Archive of all the book data
```

To form an `<authors>` name, take the name of each author, replace
spaces with underscores, encode any difficult characters (e.g. forward
slashes) and join each author name together separating by commas.  To
prevent duplicates, the list of authors should be alphabetically sorted
(using Unicode's sorting algorithm).  To prevent excessively long names,
only the first three authors (after sorting) should be included.

To form a `<book>` name, take the full name of the book, replace spaces
with underscores, encode any difficult characters (e.g. forward
slashes), and you are done.
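
The two naming rules above can be sketched roughly as follows. This
assumes percent-encoding as the way to "encode difficult characters" and
plain code-point order as a stand-in for Unicode's sorting algorithm;
both are assumptions, as are the function names:

```python
from urllib.parse import quote

def slug(name: str) -> str:
    # Spaces become underscores; everything outside letters, digits and
    # "_.-~" (including "/" and ",") is percent-encoded.
    return quote(name.strip().replace(" ", "_"), safe="")

def authors_dir(authors, limit=3):
    # sorted() orders by code point, which only approximates real
    # Unicode collation (an assumption made to keep this stdlib-only).
    ordered = sorted(authors)
    return ",".join(slug(a) for a in ordered[:limit])

def book_dir(title: str) -> str:
    return slug(title)

print(authors_dir(["Herman Melville"]))
print(book_dir("Moby Dick"))
```

Because commas inside a name are encoded before joining, the comma can
still be used safely as the separator between authors.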

The language-specific citation data is derived from the metadata.

There are language-specific user interfaces available at
`<lang>.booksin.space`.  Each one has the following format:

```
/ interface root
/<authors> Everything by a certain group of authors
/<authors>/info.yml Information about the group of authors
/<authors>/books.list A line-by-line list of all book names.
/<authors>/books.tar.gz Archive of all the books by the authors
/<authors>/<book> Everything about a certain book
/<authors>/<book>/meta.yml All metadata about the book
/<authors>/<book>/text.gmi Complete text in the interface language
/<authors>/<book>/text.<lang>.gmi Complete language-specific texts
/<authors>/<book>/meta.bib Citation data in the interface language
/<authors>/<book>/meta.<lang>.bib Language-specific citation data
/<authors>/<book>/... Other assets (like images)
/<authors>/<book>/archive.tar.gz Archive of all the book data
/search?<search term> Search for specific authors and/or books
```

The difference here is that the `/<authors>` page and the
`/<authors>/<book>` page are actual pages, not just directories.  Both
are language-specific indexes into the books by the authors and the
files of the books, respectively, generated from the metadata YAML file.

Note that even within a language-specific interface, the text (and
citation data) are available in other languages too.

Every author known to the system has a page available at `/<author>`,
which is an index to all the books they have (generated from the
author-specific metadata file).
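
As a rough sketch of how such per-author indexes could be generated, a
book with several authors can simply be filed under each of them. The
`title` and `authors` field names are hypothetical, since the metadata
format is not settled:

```python
from collections import defaultdict

def build_author_index(books):
    """Map each individual author to the sorted titles they contributed to."""
    index = defaultdict(list)
    for book in books:
        # A co-authored book is listed once under every one of its authors.
        for author in book["authors"]:
            index[author].append(book["title"])
    return {author: sorted(titles) for author, titles in index.items()}

# Illustrative records only; field names are assumptions.
books = [
    {"title": "Moby Dick", "authors": ["Herman Melville"]},
    {"title": "The Communist Manifesto",
     "authors": ["Karl Marx", "Friedrich Engels"]},
]
index = build_author_index(books)
```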

The exact format of the search is not yet decided.  Perhaps two fields
would be allowed, each one specifying a regex to limit results by (one
for authors and one for books).  A smarter word-oriented search would be
better, but I don't know how that would work.
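
The two-field regex idea could look something like this minimal sketch.
The catalog shape (pairs of `<authors>` and `<book>` names) and the
case-insensitive matching are assumptions:

```python
import re

def search(catalog, author_pattern=".*", book_pattern=".*"):
    """Return the (authors, book) pairs matching both regexes."""
    author_re = re.compile(author_pattern, re.IGNORECASE)
    book_re = re.compile(book_pattern, re.IGNORECASE)
    return [
        (authors, book)
        for authors, book in catalog
        if author_re.search(authors) and book_re.search(book)
    ]

# Hypothetical catalog entries built from directory names.
catalog = [
    ("Herman_Melville", "Moby_Dick"),
    ("Herman_Melville", "Bartleby"),
    ("Jane_Austen", "Emma"),
]
results = search(catalog, author_pattern="melville", book_pattern="^moby")
```

Leaving a field empty falls back to `.*`, so it does not limit results.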

The Git repository houses only the essential information.  It will be
served at git.booksin.space (the built-in git server has dumb HTTP(S)
protocol support - we may be able to bridge that to Gemini, but no other
Git client would speak it, so may as well provide it over HTTPS itself).
Its format is the following:

```
/ root
/<authors>/ Everything by a certain group of authors
/<authors>/info.yml Information about the group of authors
/<authors>/<book>/ Everything about a certain book
/<authors>/<book>/meta.yml All metadata about the book
/<authors>/<book>/text.<lang>.gmi Complete language-specific texts
/<authors>/<book>/... Other assets (like images)
```

Submitting new books involves sending in a patch to the mailing list
which adds in the relevant book information.  Perhaps in the future a
more non-technical submission method will be provided.  Submissions can
be verified by recognized contributors who can then merge them to the
repo (if something is amiss, the contributor would ask the submitter to
fix stuff up and send it in as a v2).

When the sr.ht Git repository is updated, a sr.ht build job will verify
the integrity of everything, then send the commits forward to the
git.booksin.space server.  When this server receives new patches, a
git-hook will update all the other subdomains' static data (in the
future, if running these updates is too expensive to do often, perhaps
the update can be queued when necessary to execute at fixed intervals,
say 2 hours).
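
The queued variant could be as simple as a flag file: the git-hook only
marks that a rebuild is needed, and a job run at the fixed interval does
the actual work. A minimal sketch, where the flag-file approach and all
names are assumptions:

```python
from pathlib import Path

def queue_rebuild(flag: Path) -> None:
    # Called from the git-hook: cheap, and safe to call many times.
    flag.touch()

def run_queued_rebuild(flag: Path, rebuild) -> bool:
    """Called at the fixed interval; rebuilds only if something was queued."""
    if not flag.exists():
        return False
    # Clear the flag first so a push arriving mid-build is queued again.
    flag.unlink()
    rebuild()
    return True
```

Several pushes within one interval then cost a single rebuild.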

~aravk | ~nothien
Details
Message ID
<C6SSLF9F0X0D.3JILVV5D9W7X3@maclibre>
In-Reply-To
<C6SNUFB650FV.33Q9XOEYIFX53@gux> (view parent)
DKIM signature
pass
Arav, thanks for all your effort and input :) Your sketch helps
visualize how to start building this thing. I have a few questions and
comments in response:

1. Should the source files for the capsule builder be included in the
   repo, or should that be put into its own project repo elsewhere? I'm
   not really sure how the sourcehut builds system works or how to plan
   around it :) 
   
   But as far as update intervals go, I don't know that
   something such as this would require being updated frequently. I
   wonder about how many more resources this would require (not just the
   server, but also on sourcehut's end). Alternatively, couldn't the
   server pull the changes in response to a webhook, and then have a
   rebuild triggered if changes are merged?

2. For texts with multiple authors, there should be a way to ensure that
   a multi-authored text is included in the indexes of each of those
   individual authors also. It seems like this would be trivial to
   implement when building from the metadata.

3. Will this require a database, or would it suffice to just source the
   data and metadata directly through the source directory structure at
   build time?

4. This one kind of relates to #1, but shouldn't the git repo include
   some sort of yaml template that includes all the necessary and
   accepted fields? Perhaps there should at least be a msc/ or meta/
   folder containing stuff like that...?
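
On question 3, here is a minimal sketch of what sourcing straight from
the directory structure might look like, with no database involved. It
assumes the `/<authors>/<book>/text.<lang>.gmi` layout from Arav's
sketch; the function name is made up:

```python
from pathlib import Path

def scan_library(root: Path):
    """Return {(authors, book): sorted language codes} found under root."""
    library = {}
    # Two directory levels, then files named text.<lang>.gmi.
    for text in root.glob("*/*/text.*.gmi"):
        lang = text.name[len("text."):-len(".gmi")]
        key = (text.parent.parent.name, text.parent.name)
        library.setdefault(key, []).append(lang)
    return {key: sorted(langs) for key, langs in library.items()}
```

The builder could call this once per build and generate every index
from the result.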

It seemed like there was more I wanted to ask about, but I'm too groggy
to remember. Anyway, thanks again, Arav. I'm hoping that this week I
will have a chance to start getting some code together (by the weekend
at the latest). 

~mieum
Details
Message ID
<C6STH0GAJB1G.2C0E9XCLJH74@taiga>
In-Reply-To
<C6SSLF9F0X0D.3JILVV5D9W7X3@maclibre> (view parent)
DKIM signature
fail
On Mon Nov 2, 2020 at 8:55 AM EDT, mieum wrote:
> 1. Should the source files for the capsule builder be included in the
> repo, or should that be put into its own project repo elsewhere? I'm
> not really sure how the sourcehut builds system works or how to plan
> around it :)

I would probably put them in separate repos. builds.sr.ht can manage
that just fine.

> But as far as update intervals goes, I don't know that
> something such as this would require being updated frequently. I
> wonder about how many more resources this would require (not just the
> server, but also on sourcehut's end). Alternatively, couldn't the
> server pull the changes in response to a webhook, and then have a
> rebuild triggered if changes are merged?

I would just use .build.yml in the source data repository which builds
and pushes the books up to the server over SSH. See how I deploy my blog
as an example:

https://git.sr.ht/~sircmpwn/drewdevault.com/tree/master/.build.yml

> 3. Will this require a database, or would it suffice to just source the
> data and metadata directly through the source directory structure at
> build time?

A database might not be necessary at first, but it might be helpful
later on to enable features like search.

> 4. This one kind of relates to #1, but shouldn't the git repo include
> some sort of yaml template that includes all the necessary and
> accepted fields? Perhaps there should at least be a msc/ or meta/
> folder containing stuff like that...?

Yeah, probably.

> It seemed like there was more I wanted to ask about, but I'm too groggy
> to remember. Anyway, thanks again, Arav. I'm hoping that this week I
> will have a chance to start getting some code together (by the weekend
> at the latest).

I think it's time to go less talky more codey.
Details
Message ID
<C6STJ9QUW04D.1VJYQ6MTWSIAW@maclibre>
In-Reply-To
<C6STH0GAJB1G.2C0E9XCLJH74@taiga> (view parent)
DKIM signature
pass
How does Drew respond so fast? I JUST sent that email haha 

Anyway, thanks for clarifying, Drew. 

> I think it's time to go less talky more codey.

Indeed! But first, time to catch a few Zs while I can (it seems like my
son is done with this round of teething finally >_<).

~mieum