~peacesearch/peacesearch-discuss

searchmysite.net now with added Wikipedia goodness

Details
Message ID
<zdFf_wsaDpLt-3LC7bMqFizw8QUJwmSYNLYPQwbb_yE7Thx_IDWYIDyU-HIvC8vzCKyWv-e-t49VWxbWKkxDGdocyY41s8SDc82f57muPAc=@hyper.dev>
> So I’ve finally managed to index Wikipedia, or at least
> the 6,392,807 English language pages.
>
> Some of the benefits this brings to searchmysite.net:
>
> - It makes searchmysite.net a much more useful search engine
> for day-to-day usage. Many of my internet searches
> in the past have simply ended with clicks to Wikipedia,
> so now when I’m performing that sort of search I can use
> searchmysite.net to get the Wikipedia link and see
> if there are any other personal or independent sites
> which have anything interesting to say on the topic.
> It could still benefit from users submitting more good
> quality personal and independent websites for indexing,
> and some other changes such as extending its relevancy tuning,
> but it is definitely showing promise.
>
> - It shows that the system can handle nearly 6,500,000 documents,
> even on a single relatively low spec server. As an aside, this is
> nearly a quarter of the size of the first Google index in 1998.
>
> - The mechanism for allowing the indexing process to differ on a
> site-by-site basis opens up the possibility of implementing
> additional custom indexing processes for other sites. Maybe,
> being open source, people could even contribute their own in future.
>
> BTW, if someone wants to try out the Wikipedia import,
> they can simply spin up a searchmysite instance using
> the 8 commands listed in the README.md, and then run import.sh via:
>
>   docker exec -it src_indexing_1 /usr/src/app/bulkimport/wikipedia/import.sh
>
> This post has some more technical information on the indexing
> of Wikipedia for searchmysite.net.

https://blog.searchmysite.net/posts/searchmysite.net-now-with-added-wikipedia-goodness/
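
The third point in the quoted message, letting the indexing process differ on a site-by-site basis, could be pictured roughly as follows. This is only a minimal sketch and not searchmysite.net's actual code: the function names, the registry, and the assumption that Wikipedia is loaded by bulk import while other sites are crawled are all hypothetical, inferred only from the bulkimport/wikipedia path in the command above.

```python
from typing import Callable, Dict

# Hypothetical indexer signature: takes a domain and returns a label
# describing which indexing path was used.
Indexer = Callable[[str], str]


def crawl_site(domain: str) -> str:
    """Default path (assumed): crawl the site's pages one by one."""
    return f"crawl:{domain}"


def bulk_import_wikipedia(domain: str) -> str:
    """Custom path (assumed): load content in bulk instead of crawling."""
    return f"bulkimport:{domain}"


# Registry of per-site overrides. Contributors could add entries here
# for other sites that need their own indexing process.
CUSTOM_INDEXERS: Dict[str, Indexer] = {
    "en.wikipedia.org": bulk_import_wikipedia,
}


def index_site(domain: str) -> str:
    """Pick the custom indexer for a domain, falling back to the crawler."""
    indexer = CUSTOM_INDEXERS.get(domain, crawl_site)
    return indexer(domain)
```

With a registry like this, supporting another site with its own indexing process would mean adding one function and one dictionary entry, leaving the default crawl path untouched.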