~geyaeb

G. Eyaeb

~geyaeb/haskell-pdftotext

Last active 1 year, 5 months ago

~geyaeb/streamly-binary

Last active 1 year, 10 months ago

~geyaeb/haskell-readability

Last active 1 year, 10 months ago

Recent activity

[ANN] pdftotext 0.1.0.1 1 year, 5 months ago

From G. Eyaeb to ~geyaeb/haskell-pdftotext

Version 0.1.0.1 of Haskell package pdftotext has been released.

This version fixes a bug that caused problems when multiple PDFs
were processed, and makes pdftotext easier to build on macOS.

Changelog:

 - Require C++11 standard
 - Prevent deletion of document pointer before its pages

https://hackage.haskell.org/package/pdftotext-0.1.0.1

Pdftotext extracts text from PDF files using poppler.

Re: nondeterministic segfaults and another error 1 year, 5 months ago

From G. Eyaeb to ~geyaeb/haskell-pdftotext

Hello!

Thanks for all your input, which was very helpful.

Could you please try the newly published version 0.1.0.1? It is on Hackage.
I believe it will solve the issue with segmentation faults.

https://hackage.haskell.org/package/pdftotext-0.1.0.1

Thank you.

Re: nondeterministic segfaults and another error 1 year, 5 months ago

From G. Eyaeb to ~geyaeb/haskell-pdftotext

Hello, Erik.

Thanks for reporting.

I tried to read all the PDFs with pdftotext and poppler 20.11 on Linux but have not seen a segfault yet.

Could you please try the following?

1. If you `import Pdftotext.Internal`, you can use the IO version of the `pdftotext`
   function, i.e. `pdftotextIO`. Could you try using that instead of `pdftotext`?

2. Could you try `propertiesIO` or `pagesTotalIO` instead of `pdftotextIO`? These read the
   PDF file but extract only some metadata, not the text.
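
The two suggestions above could be tried with something like the following minimal sketch. It assumes that `openFile` returns `IO (Maybe Document)`, that `pagesTotalIO` takes a document and returns a showable page count, and that `pdftotextIO` takes a `Layout` followed by the document; these signatures are assumptions and should be checked against the package documentation. The helper name `check` is hypothetical.

```haskell
-- Sketch only: the signatures of openFile, pagesTotalIO and pdftotextIO
-- are assumed here, not confirmed; `check` is a hypothetical helper.
import qualified Data.Text as T
import Pdftotext (Layout (..), openFile)
import Pdftotext.Internal (pagesTotalIO, pdftotextIO)
import System.Environment (getArgs)

main :: IO ()
main = do
  -- PDF paths are taken from the command line.
  paths <- getArgs
  mapM_ check paths

-- Run the metadata-only call first, then the full text extraction,
-- so that a crash can be attributed to one of the two steps.
check :: FilePath -> IO ()
check path = do
  mdoc <- openFile path
  case mdoc of
    Nothing -> putStrLn (path ++ ": could not be opened")
    Just doc -> do
      n <- pagesTotalIO doc
      putStrLn (path ++ ": " ++ show n ++ " pages")
      txt <- pdftotextIO Physical doc
      putStrLn (path ++ ": extracted " ++ show (T.length txt) ++ " characters")
```

Running it over the same set of PDFs several times should show whether the segfault appears already at the page count or only during text extraction.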

Re: how to install pdftotext on osx with stack 1 year, 5 months ago

From G. Eyaeb to ~geyaeb/haskell-pdftotext

Thanks a lot for the links.

Well, it seems that the problem is somewhere deep inside GHC. I am
afraid there is not much I can do besides adding a note to the README,
so that's what I will do.

Thanks again and please let me know if there is something I could
improve in the library.

G.

Re: how to install pdftotext on osx with stack 1 year, 5 months ago

From G. Eyaeb to ~geyaeb/haskell-pdftotext

Hello, Erik.

Unfortunately, I cannot try this on macOS, but after some searching and
testing, it seems that poppler needs to be compiled with the C++11
standard, while your compiler does not use it by default.

Could you try to clone the project, modify the .cabal file, and then
point the stack dependency to the local version?

> hg clone https://hg.sr.ht/~geyaeb/haskell-pdftotext
> cd haskell-pdftotext

Edit pdftotext.cabal by adding "-std=c++11" to "cc-options".
The line now looks like:

[ANN] readability 0.1.0.0 1 year, 10 months ago

From G. Eyaeb to ~geyaeb/haskell-readability

Version 0.1.0.0 of Haskell package readability has been released.

Changelog:

 - Added reading of data from standard input
 - Added option 'extract' to the command line to choose what to print
 - Added extraction of title and short title

https://hackage.haskell.org/package/readability-0.1.0.0

Readability extracts the text of the main article from an HTML document.

[ANN] readability 0.0.1.0 1 year, 10 months ago

From G. Eyaeb to ~geyaeb/haskell-readability

Initial version 0.0.1.0 of Haskell package readability has been released.

https://hackage.haskell.org/package/readability-0.0.1.0

Readability extracts the text of the main article from an HTML document.

[ANN] pdftotext 0.1.0.0 1 year, 11 months ago

From G. Eyaeb to ~geyaeb/haskell-pdftotext

Version 0.1.0.0 of Haskell package pdftotext has been released.

Changelog:

 - Added executable 'pdftotext.hs'
 - Removed 'xml-conduit' flag again (sorry, I realized it was a bad idea)

https://hackage.haskell.org/package/pdftotext-0.1.0.0

[ANN] pdftotext 0.0.2.0 1 year, 11 months ago

From G. Eyaeb to ~geyaeb/haskell-pdftotext

Version 0.0.2.0 of Haskell package pdftotext has been released.

Changelog:

 - Added PDF document properties (author, title etc.)
 - Added flag xml-conduit (parse metadata using xml-conduit)

https://hackage.haskell.org/package/pdftotext-0.0.2.0