From G. Eyaeb to ~geyaeb/haskell-pdftotext
Version 0.1.0.1 of the Haskell package pdftotext has been released. It fixes a bug that caused problems when multiple PDFs were processed, and it makes pdftotext easier to build on macOS.
Changelog:
- Require the C++11 standard
- Prevent deletion of the document pointer before its pages
https://hackage.haskell.org/package/pdftotext-0.1.0.1
Pdftotext extracts text from PDF using poppler.
From G. Eyaeb to ~geyaeb/haskell-pdftotext
Hello! Thanks for all your input, which was very helpful. Could you please try the newly published version 0.1.0.1? It is on Hackage. I believe it will solve the issue with segmentation faults.
https://hackage.haskell.org/package/pdftotext-0.1.0.1
Thank you.
From G. Eyaeb to ~geyaeb/haskell-pdftotext
Hello, Erik. Thanks for reporting. I tried to read all the PDFs with pdftotext and poppler 20.11 on Linux but have not seen a segfault yet. Could you please try the following?
1. If you `import Pdftotext.Internal`, you can use the IO version of the `pdftotext` function, i.e. `pdftotextIO`. Could you try using that instead of `pdftotext`?
2. Could you try `propertiesIO` or `pagesTotalIO` instead of `pdftotextIO`? These read the PDF file but do not try to extract text, only some metadata.
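The two checks above could be combined into a small test program along these lines. This is only a sketch under assumptions: it takes `openFile` and the `Physical` layout constructor to be exported from `Pdftotext`, and it assumes the signatures `pagesTotalIO :: Document -> IO Int` and `pdftotextIO :: Layout -> Document -> IO Text`. The helpers `summarize` and `checkFile` are hypothetical names used for illustration and are not part of the library.

```haskell
import qualified Data.Text as T
import Pdftotext (Layout (Physical), openFile)
import Pdftotext.Internal (pagesTotalIO, pdftotextIO)
import System.Environment (getArgs)

-- Hypothetical helper: format what was read from one file.
summarize :: Int -> Int -> String
summarize pageCount charCount =
  "pages: " ++ show pageCount ++ ", characters: " ++ show charCount

-- Hypothetical helper: open one file, read the page count first
-- (metadata only), then extract the text, both through the IO variants.
-- The signatures of pagesTotalIO and pdftotextIO are assumed here.
checkFile :: FilePath -> IO ()
checkFile path = do
  mdoc <- openFile path
  case mdoc of
    Nothing -> putStrLn (path ++ ": could not be opened as PDF")
    Just doc -> do
      n <- pagesTotalIO doc
      txt <- pdftotextIO Physical doc
      putStrLn (path ++ ": " ++ summarize n (T.length txt))

-- Process every file named on the command line, one after another.
main :: IO ()
main = getArgs >>= mapM_ checkFile
```

Running it over several PDFs in one invocation should show whether the crash depends on text extraction or already happens when only metadata is read.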
From G. Eyaeb to ~geyaeb/haskell-pdftotext
Thanks a lot for the links. Well, it seems that the problem is somewhere deep inside GHC. I am afraid there is not much I can do besides adding a note to the README, so that's what I will do. Thanks again, and please let me know if there is something I could improve in the library.
G.
From G. Eyaeb to ~geyaeb/haskell-pdftotext
Hello, Erik. Unfortunately I cannot try this on macOS, but after some searching and testing, it seems that poppler needs to be compiled with the C++11 standard, while your compiler does not use it by default. Could you try to clone the project, modify the .cabal file and then point the stack dependency to the local version?
> hg clone https://hg.sr.ht/~geyaeb/haskell-pdftotext
> cd haskell-pdftotext
Edit pdftotext.cabal by adding "-std=c++11" to "cc-options". The line now looks like:
From G. Eyaeb to ~geyaeb/haskell-readability
Version 0.1.0.0 of the Haskell package readability has been released.
Changelog:
- Added reading of data from standard input
- Added the extract option to the command line to choose what to print
- Added extraction of title and short title
https://hackage.haskell.org/package/readability-0.1.0.0
Readability extracts the text of the main article from an HTML document.
From G. Eyaeb to ~geyaeb/haskell-readability
Initial version 0.0.1.0 of the Haskell package readability has been released.
https://hackage.haskell.org/package/readability-0.0.1.0
Readability extracts the text of the main article from an HTML document.
From G. Eyaeb to ~geyaeb/haskell-pdftotext
Version 0.1.0.0 of Haskell package pdftotext has been released.
Changelog:
- Added executable 'pdftotext.hs'
- Removed 'xml-conduit' flag again (sorry, I realized it was a bad idea)
https://hackage.haskell.org/package/pdftotext-0.1.0.0
From G. Eyaeb to ~geyaeb/haskell-pdftotext
Version 0.0.2.0 of Haskell package pdftotext has been released.
Changelog:
- Added PDF document properties (author, title etc.)
- Added flag xml-conduit (parse metadata using xml-conduit)
https://hackage.haskell.org/package/pdftotext-0.0.2.0