> So I’ve finally managed to index Wikipedia, or at least
> the 6,392,807 English-language pages.
>
> Some of the benefits this brings to searchmysite.net:
>
> - It turns it into a much more useful search engine
> for day-to-day usage. Many of my internet searches
> in the past have simply ended with clicks to Wikipedia,
> so now when I’m performing that sort of search I can use
> searchmysite.net to get the Wikipedia link and see
> if there are any other personal or independent sites
> which have anything interesting to say on the topic.
> It could still benefit from users submitting more good
> quality personal and independent websites for indexing,
> and some other changes such as extending its relevancy tuning,
> but it is definitely showing promise.
>
> - It shows that the system can handle nearly 6,500,000 documents,
> even on a single, relatively low-spec server. As an aside, this is
> nearly a quarter of the size of the first Google index in 1998.
>
> - The mechanism for allowing the indexing process to differ on a
> site-by-site basis opens up the possibility of implementing
> additional custom indexing processes for other sites. Maybe,
> being open source, people could even contribute their own in future.
>
> BTW, if someone wants to try out the Wikipedia import,
> they can simply spin up a searchmysite instance using
> the 8 commands listed in the `README.md`, and then run `import.sh`
> via `docker exec -it src_indexing_1 /usr/src/app/bulkimport/wikipedia/import.sh`.
>
> This post has some more technical information on the indexing
> of Wikipedia for searchmysite.net.
https://blog.searchmysite.net/posts/searchmysite.net-now-with-added-wikipedia-goodness/
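
For anyone trying the import step quoted above, here is a minimal sketch of a wrapper around that one command. The container name and script path are copied from the post. The `DRY_RUN` switch and the `import_command` helper are hypothetical additions for this sketch, not part of searchmysite. By default it only prints the command, so it is safe to paste before the instance is up.

```shell
#!/bin/sh
# Container name and script path as given in the quoted post.
# CONTAINER can be overridden if your instance names it differently
# (check `docker ps` for the actual name).
CONTAINER="${CONTAINER:-src_indexing_1}"
IMPORT_SCRIPT="/usr/src/app/bulkimport/wikipedia/import.sh"

# Print the full command line so it can be inspected before running.
import_command() {
    printf 'docker exec -it %s %s\n' "$CONTAINER" "$IMPORT_SCRIPT"
}

# DRY_RUN is a hypothetical switch for this sketch (default: 1, print only).
if [ "${DRY_RUN:-1}" = "1" ]; then
    import_command
else
    docker exec -it "$CONTAINER" "$IMPORT_SCRIPT"
fi
```

Running it with `DRY_RUN=0` executes the import inside the indexing container. This assumes the instance has already been started with the commands from the README, which are not reproduced here.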