I've been trying to get the new caching feature to work, but keep
getting errors about missing hash files.
[WARNING] Cache directory for page site/crystal/Calibrating Future
Experiences.odt does not contain a page source hash file
(.page_source_hash),cache will be discarded!
the config includes these settings
caching = true
cache_dir = ".soupault-cache"
Pages are generated in the preprocessors stage using an external pandoc
script. The site is generated as expected, and it looks like the site
structure is successfully created in the cache dir; however, each of the
page dirs is empty.
Are there other settings or changes required to cache the preprocessor
results?
This is odd. There shouldn't be a need for any other settings, so you
have likely found a bug.
Is your site source public so that I could test it myself on the same data?
On 1/25/23 10:25, nik.srht@fo.am wrote:
> I've been trying to get the new caching feature to work, but keep
> getting errors about missing hash files.
> [...]
The site isn't public yet, but I'll try making a minimal example to see
if the bug can be replicated.
On 2023-01-25 11:54, Daniil Baturin wrote:
> This is odd. There shouldn't be a need for any other settings, so you
> have likely found a bug.
> [...]
I took a closer look and managed to confuse myself by setting debug=true
which appears to avoid the problem.
maybe the cache is not being built correctly when debug is false?
e.g.
% soupault --debug
[INFO] Starting soupault 4.4.0 in website generator mode
[INFO] Loading plugins
[INFO] Loading widgets
[DEBUG] Widget processing order:
[INFO] Loading hooks
[INFO] Starting website build
[INFO] Processing page site/faustroll.odt
[DEBUG] Saving new page hash to
.soupault-cache/site/faustroll.odt/.page_source_hash
[INFO] Calling page preprocessor "pandoc --from=odt --to=html
--wrap=preserve --reference-links --reference-location=block" on page
site/faustroll.odt
[DEBUG] Saving a cached object to
.soupault-cache/site/faustroll.odt/631383cb2530d44b9cde4d4b454b38869b26fb667d4c6067548b1ffc8394efca_40053855b355ea57f47456bebc41688c5330c8a296a7bbe9df89cc04834a351d
[INFO] Using the default template for page site/faustroll.odt
[DEBUG] Not inserting index data: indexing is disabled in the configuration
[INFO] Writing generated page to build/faustroll/index.html
% soupault --debug
[INFO] Starting soupault 4.4.0 in website generator mode
[INFO] Loading plugins
[INFO] Loading widgets
[DEBUG] Widget processing order:
[INFO] Loading hooks
[INFO] Starting website build
[INFO] Processing page site/faustroll.odt
[DEBUG] Cache for page site/faustroll.odt is considered valid and will
be used
[DEBUG] Reading a cached object from
.soupault-cache/site/faustroll.odt/631383cb2530d44b9cde4d4b454b38869b26fb667d4c6067548b1ffc8394efca_40053855b355ea57f47456bebc41688c5330c8a296a7bbe9df89cc04834a351d
[INFO] Using the default template for page site/faustroll.odt
[DEBUG] Not inserting index data: indexing is disabled in the configuration
[INFO] Writing generated page to build/faustroll/index.html
% soupault
[INFO] Starting soupault 4.4.0 in website generator mode
[INFO] Loading plugins
[INFO] Loading widgets
[INFO] Loading hooks
[INFO] Starting website build
[INFO] Processing page site/faustroll.odt
[INFO] Using the default template for page site/faustroll.odt
[INFO] Writing generated page to build/faustroll/index.html
% soupault
[INFO] Starting soupault 4.4.0 in website generator mode
[INFO] Loading plugins
[INFO] Loading widgets
[INFO] Loading hooks
[INFO] Starting website build
[INFO] Processing page site/faustroll.odt
[INFO] Using the default template for page site/faustroll.odt
[INFO] Writing generated page to build/faustroll/index.html
% rm -rf .soupault-cache/*
% soupault
[INFO] Starting soupault 4.4.0 in website generator mode
[INFO] Loading plugins
[INFO] Loading widgets
[INFO] Loading hooks
[INFO] Starting website build
[INFO] Processing page site/faustroll.odt
[INFO] Calling page preprocessor "pandoc --from=odt --to=html
--wrap=preserve --reference-links --reference-location=block" on page
site/faustroll.odt
[INFO] Using the default template for page site/faustroll.odt
[INFO] Writing generated page to build/faustroll/index.html
% soupault
[INFO] Starting soupault 4.4.0 in website generator mode
[INFO] Loading plugins
[INFO] Loading widgets
[INFO] Loading hooks
[INFO] Starting website build
[INFO] Processing page site/faustroll.odt
[WARNING] Cache directory for page site/faustroll.odt does not contain a
page source hash file (.page_source_hash),cache will be discarded!
[INFO] Calling page preprocessor "pandoc --from=odt --to=html
--wrap=preserve --reference-links --reference-location=block" on page
site/faustroll.odt
[INFO] Using the default template for page site/faustroll.odt
[INFO] Writing generated page to build/faustroll/index.html
On 2023-01-25 11:54, Daniil Baturin wrote:
> This is odd. There shouldn't be a need for any other settings, so you
> have likely found a bug.
> [...]
Hi Nik,
I fixed the problem. It was a funny case of missing parentheses that
caused bits of actual logic to be interpreted
as part of a debug log function body:
https://codeberg.org/PataphysicalSociety/soupault/commit/599f0f921c32b0d5daf41e5ba4fa369f55acb15c
Could you try building again and let me know if it works for you without
debug now?
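
For readers unfamiliar with this class of bug, here is a rough Python analogue. It is only an illustration, not soupault's code: soupault is written in OCaml, where the grouping comes from parentheses rather than indentation, and every name below is hypothetical. In the buggy version the write ends up scoped under the debug branch, so it only happens when debug logging is on.

```python
import logging

logger = logging.getLogger("cache_demo")  # hypothetical logger name


def save_hash_buggy(cache, page, digest, debug):
    """Buggy: the write is scoped under the debug branch by mistake."""
    if debug:
        logger.debug("Saving new page hash for %s", page)
        cache[page] = digest  # only runs when debug is on


def save_hash_fixed(cache, page, digest, debug):
    """Fixed: logging is conditional, the write is not."""
    if debug:
        logger.debug("Saving new page hash for %s", page)
    cache[page] = digest


buggy, fixed = {}, {}
save_hash_buggy(buggy, "site/faustroll.odt", "abc123", debug=False)
save_hash_fixed(fixed, "site/faustroll.odt", "abc123", debug=False)
print(buggy)  # {}
print(fixed)  # {'site/faustroll.odt': 'abc123'}
```

This matches the symptom reported earlier in the thread: the build itself succeeds either way, but the hash is only stored when running with debug enabled.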
On 1/25/23 12:49, nik gaffney wrote:
> I took a closer look and managed to confuse myself by setting
> debug=true which appears to avoid the problem.
> [...]
Thanks! that fixed it.
On 2023-01-26 03:49, Daniil Baturin wrote:
> I fixed the problem. [...]
> Could you try building again and let me know if it works for you without
> debug now?
Thanks for testing it! I'm planning to make a release early next week then.
On 1/26/23 10:03, nik gaffney wrote:
> Thanks! that fixed it.
> [...]
On the topic of caching, do you have any plans to add caching for
asset_processors?
I'm currently using an external script for the asset_processors which
checks pre and post checksums. It would certainly simplify things if that
were part of the standard build process.
also, is there a way to ensure asset_processors are run after the
preprocessors have completed?
>do you have any plans to add caching for asset_processors?
That's complicated. With pages, it's simple since the output path is
decided by soupault itself.
With asset processors, the user specifies a template for generating a
complete command.
That is required to accommodate commands with peculiar syntax that makes
it impossible to just append the output file path,
and to allow original and processed files to have different extensions.
However, it also means that soupault doesn't actually know the output
path and cannot replicate what the user-given command would do.
I agree that it would be nice to cache asset processor outputs but it's
going to require syntax design changes.
If you have ideas how to best handle that, please share.
> is there a way to ensure asset_processors are run after the
> preprocessors have completed?
Asset files are always processed before page files:
https://codeberg.org/PataphysicalSociety/soupault/src/commit/599f0f921c32b0d5daf41e5ba4fa369f55acb15c/src/soupault.ml#L895-L908
The reason is simply that asset file processing is the same (if it's to
be done at all), while page processing workflows differ for cases when
index.index_first is enabled and when it's not.
That said, the decision to make processing pages and assets separate
steps is strategic, but the order of those steps is trivial.
If there's a compelling reason to switch them, I see no reason not
to do it.
On 1/26/23 13:28, nik gaffney wrote:
> On the topic of caching, do you have any plans to add caching for
> asset_processors?
> [...]
On 2023-01-27 05:07, Daniil Baturin wrote:
>> do you have any plans to add caching for asset_processors?
>
> That's complicated. With pages, it's simple since the output path is
> decided by soupault itself.
> With asset processors, the user specifies a template for generating a
> complete command.
> That is required to accommodate commands with peculiar syntax that makes
> it impossible to just append the output file path,
> and to allow original and processed files to have different extensions.
> However, it also means that soupault doesn't actually know the output
> path and cannot replicate what the user-given command would do.
admittedly in the general case it's not so obvious. at the moment I'm
relying on a filter which takes input & output paths. the filter just
checks if the input file has changed or output is missing to avoid
unnecessary work.
e.g.
png = "./filters/process_png '{{source_file_path}}' '{{target_dir}}/{{source_file_name}}'"
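
A minimal sketch of what such a filter might look like, in Python. The actual process_png script is not shown in this thread, so everything here is an assumption: the sidecar file name, the use of SHA-256, and the plain file copy standing in for the real image processing.

```python
import hashlib
import shutil
import sys
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def stamp_path(target: Path) -> Path:
    """Hypothetical sidecar file that remembers the source hash."""
    return target.with_name(target.name + ".source_hash")


def needs_rebuild(source: Path, target: Path) -> bool:
    """True if the target is missing or the source hash has changed."""
    stamp = stamp_path(target)
    if not target.exists() or not stamp.exists():
        return True
    return stamp.read_text().strip() != file_digest(source)


def process(source: Path, target: Path) -> bool:
    """Run the (placeholder) processing step only when needed."""
    if not needs_rebuild(source, target):
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)  # placeholder for real image processing
    stamp_path(target).write_text(file_digest(source))
    return True


if __name__ == "__main__" and len(sys.argv) == 3:
    did_work = process(Path(sys.argv[1]), Path(sys.argv[2]))
    print("processed" if did_work else "up to date")
```

One caveat of this sketch: the sidecar file lands next to the output, so it would end up in the build directory. Keeping the stamps in a separate directory would avoid that.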
> I agree that it would be nice to cache asset processor outputs but it's
> going to require syntax design changes.
Can't think of anything at the moment that could work in the general
case without relying on explicit description, but...
e.g. in the above something like
png_cache = ['{{source_file_path}}',
'{{target_dir}}/{{source_file_name}}',
'{{target_dir}}/preview_{{source_file_name}}']
which could just check for matching checksums (or some other cache
invalidation?) before running the asset_processor (which might generate
assets on paths not specified in the command)
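
A sketch of how such a check might work, assuming a png_cache-style list of path templates. Nothing here is an existing soupault feature: the `render` helper, the JSON manifest file, and the function names are all hypothetical, and the template values are illustrative.

```python
import hashlib
import json
from pathlib import Path


def render(template: str, env: dict) -> str:
    """Substitute {{name}} placeholders; a stand-in for real template rendering."""
    for key, value in env.items():
        template = template.replace("{{" + key + "}}", value)
    return template


def snapshot(paths):
    """Map each path to its SHA-256 digest, or None if the file is missing."""
    result = {}
    for p in paths:
        path = Path(p)
        # read_bytes loads the whole file; fine for a sketch, not for huge assets
        result[p] = hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
    return result


def cache_is_valid(templates, env, manifest_file: Path) -> bool:
    """True if every listed file exists and matches the stored manifest."""
    current = snapshot([render(t, env) for t in templates])
    if None in current.values() or not manifest_file.is_file():
        return False
    return json.loads(manifest_file.read_text()) == current


def record(templates, env, manifest_file: Path) -> None:
    """Store checksums after the asset processor has run."""
    current = snapshot([render(t, env) for t in templates])
    manifest_file.write_text(json.dumps(current))
```

The asset processor would run only when `cache_is_valid` returns False, followed by a call to `record`. Because the list names every file the processor touches, side outputs such as the preview image are covered too.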
> If you have ideas how to best handle that, please share.
I'll think further about how it could work more generally.
>> is there a way to ensure asset_processors are run after the
>> preprocessors have completed?
>
> That said, the decision to make processing pages and assets separate
> steps is strategic, but the order of those steps is trivial.
> If there's a compelling reason to switch them, I see no reason why
> not to do it.
I would agree that keeping them separate is useful. The only motive I
have is based on a use case where the preprocessor may produce assets
(as side effect) as well as html (output). In particular, converting a
pdf or odt for example might produce image files so would be useful to
run the asset_processor after (rather than just run soupault twice or
relying on another explicit build stage)
> png = "./filters/process_png '{{source_file_path}}' '{{target_dir}}/{{source_file_name}}'"
Yes, the problem is that you know that the output path is
'{{target_dir}}/{{source_file_name}}' but to soupault that command is
opaque.
To make automatic caching possible, the output path needs to be made
explicit.
One compatible syntax I can think of is like this:
[asset_processors]
  png = { target_path_template = "{{target_dir}}/{{source_file_name}}.css", command_template = "sass {{source_file_path}} {{target_file_path}}" }
where {{target_file_path}} is generated using target_path_template and
injected in the command_template environment.
It's much more complicated than the current one, but I can see how I
could add it: check if the value is a string or an inline table,
then use different ways of constructing the complete command.
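
The dispatch described above could be sketched roughly as follows, in Python for brevity. This is not soupault code and the proposed option does not exist at the time of this thread: `build_command` and `render` are hypothetical helpers, and the environment values in the usage lines are made up.

```python
def render(template: str, env: dict) -> str:
    """Substitute {{name}} placeholders; a stand-in for real template rendering."""
    for key, value in env.items():
        template = template.replace("{{" + key + "}}", value)
    return template


def build_command(processor, env: dict):
    """Return (command, target_path); target_path is None when it is unknown."""
    if isinstance(processor, str):
        # Current syntax: the command is opaque, so the output path is unknown.
        return render(processor, env), None
    if isinstance(processor, dict):
        # Proposed syntax: render the target path first, then inject it.
        target = render(processor["target_path_template"], env)
        full_env = {**env, "target_file_path": target}
        return render(processor["command_template"], full_env), target
    raise TypeError("asset processor must be a string or an inline table")


env = {
    "source_file_path": "site/style.scss",
    "source_file_name": "style.scss",
    "target_dir": "build",
}
table = {
    "target_path_template": "{{target_dir}}/{{source_file_name}}.css",
    "command_template": "sass {{source_file_path}} {{target_file_path}}",
}
print(build_command(table, env))
# ('sass site/style.scss build/style.scss.css', 'build/style.scss.css')
```

With the inline-table form the caller gets an explicit target path back, which is the piece of information a cache would need; with the plain string form it gets None and has to run the command unconditionally.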
I'm not sure if it's worthwhile, though. I'm by no means against caching
asset processor outputs, just not sure if trying to embed an asset
management system inside of soupault is a good idea or not.
(The source of all problems is that page preprocessors work with stdin
and stdout, while many types of asset processors, like image converters,
may not even support writing to stdout, and reading potentially very
large files into memory just to postprocess and cache them
can cause lots of problems for users.)
On 1/27/23 10:37, nik gaffney wrote:
> [...]
> admittedly in the general case it's not so obvious. at the moment I'm
> relying on a filter which takes input & output paths.
> [...]
On 2023-01-27 13:28, Daniil Baturin wrote:
> I'm not sure if it's worthwhile, though. I'm by no means against caching
> asset processor outputs, just not sure if trying to embed an asset
> management system inside of soupault is a good idea or not.
It's probably more trouble than it's worth. That said, if asset
processing adds a significant build overhead, might be a good idea to
include some suggestions and/or examples in the docs.
I can look at cleaning up the image processing scripts I'm currently
using as a starting point.
nik