From Amirouche to ~sircmpwn/public-inbox
On 2021-02-24 04:56, Timmy Douglas wrote:
> Amirouche <amirouche@hyper.dev> writes:
>
>> On 2021-02-23 06:14, Timmy Douglas wrote:
>>> 1. Has anyone created a realistic alternative to HTML/CSS--maybe a
>>> binary DOM that has a reasonable API for manipulation? I supposed it'd
>>> also need some sort of rendering engine
>>>
>>
>> By DOM, you mean the HTML serialization or the tree-like structure that
>> is in memory?
>
From Amirouche to ~sircmpwn/public-inbox
On 2021-02-23 06:14, Timmy Douglas wrote:
> I'm not very knowledgeable is far as GUI toolkits go and I'm not sure
> exactly where I can post this question--but I thought this list might be
> one place to start.
>
> I've only used electron as an end user, not as a programmer--it seems to
> have gained traction as a popular web based GUI toolkit, but many
> complain of the bloat generated by all the HTML/CSS/JavaScript overhead.
>
> I was wondering about two things:
>
From Amirouche to ~peacesearch/peacesearch-discuss
We discussed on IRC the possibility of having RFC documents. I am not sure yet how they will be structured. In the meantime, I made a list of possible topics that we may discuss:

- Web Archive Feed: year-month-day incremental snapshots, implemented by medium to large website owners, that would allow search engines to index their websites without crawling all the pages (related standards: WARC, RSS, Atom)

- Sensimark: a category hierarchy similar to Wikipedia Vital Articles [0] and a "fingerprinting" method that would allow both humans and robots to figure out which topics are covered by a search engine or website (related tools: word2vec, BERT, classification)

That could be at least two or
From Amirouche to ~peacesearch/peacesearch-discuss
I made this list for myself; maybe it can help others figure out what to think about when we write about search engines:

- crawler, e.g. Apache Nutch, possibly with headless browser support and feeds (Atom, RSS, sitemap, robots.txt)
- from boolean keyword and full-text search to conceptual search
- entity linking (aka. wikification) and disambiguation
- one-way or two-way synonyms and other lexicographic features
- search restricted to the last few days, weeks or months (like Google's search tools)
- trends and trending topics (z-score)
- alerting
- domain search, i.e. restrict search to a given domain / subdomain
- spell checking, transliteration and soundex
- multi-lingual
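To make the "trending topics (z-score)" item concrete, here is a minimal sketch, not a reference implementation. It assumes we already have per-topic daily mention counts; the function name `trend_zscore` and the sample data are made up for illustration. A topic is considered trending when today's count is many standard deviations above its own history.

```python
from statistics import mean, pstdev


def trend_zscore(history, current):
    """Return how many standard deviations `current` is above the mean of `history`.

    `history` is a list of past counts (hypothetical data, e.g. daily mentions).
    A perfectly flat history has no spread, so we return 0.0 instead of dividing by zero.
    """
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation of the past counts
    if sigma == 0:
        return 0.0
    return (current - mu) / sigma


# Hypothetical daily counts for two topics, followed by today's counts.
daily_counts = {
    "babelia": [2, 4, 3, 3, 3],
    "weather": [50, 52, 48, 51, 49],
}
today = {"babelia": 12, "weather": 52}

scores = {topic: trend_zscore(daily_counts[topic], today[topic]) for topic in today}

# Topics ordered from most to least unusual today.
trending = sorted(scores, key=scores.get, reverse=True)
print(trending)
```

The point of the z-score is that a small topic with a sudden spike ranks above a big topic with its usual volume, which raw counts would not show.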
From Amirouche to ~peacesearch/peacesearch-discuss
On 2021-01-10 08:05, Amirouche wrote:
> If you stumble upon an interesting project with code that is related
> to search engines do not hesitate to share it.

The most ambitious project I know is: https://github.com/commonsearch

The problems:

- They rely on Elasticsearch, which is resource hungry
- They rely too much on micro-services, which puts a strain on deployment / accessibility
- They use Python, JavaScript and Go: imo that is too much diversity in the stack
From Amirouche to ~peacesearch/peacesearch-discuss
If you stumble upon an interesting project with code that is related to search engines, do not hesitate to share it.
From Amirouche to ~peacesearch/peacesearch-discuss
On 2021-01-10 07:58, Amirouche wrote:
> If you find an interesting blog post dealing with problems related to
> search engines do not hesitate to post it in this thread with a
> comment.

The most interesting blog post on search engines might be on the Algolia blog. IMO, Algolia's approach has three problems:

- They rely on C++
- They rely on the fact that all the data fits in RAM
- They rely on specific data structures (instead of an OKVS)
From Amirouche to ~peacesearch/peacesearch-discuss
If you find an interesting blog post dealing with problems related to search engines, do not hesitate to post it in this thread with a comment.
From Amirouche to ~peacesearch/peacesearch-discuss
Hello! I'm a FLOSS search engine enthusiast (in complicity with several other people). I have been researching the subject, more or less, for 10 years. It really started around the time the term 'filter bubble' [0] popped up on the Internet. There is also intellectual curiosity and a lot of technical ambition: what is the most complex project or unsolved problem around? I just updated my blog at:
From Amirouche to ~sircmpwn/public-inbox
Hello, I stumbled upon your post [0]; since I am working on a search engine myself, I figured it was a good idea to chime in. The search engine is called babelia.

a) babelia instances will advertise their topical content via classifications such as the Wikipedia vital articles hierarchy. OK, this is not good enough for niche topics, but it gives a head start compared to randomly asking instances whether they know about "those keywords", which is, by the way, not great privacy-wise. So instead, babelia tries to guess the topic of the query, matches it with one or more categories, and given this