~amirouche

https://hyper.dev

~amirouche/public-inbox

Last active a month ago

Recent activity

Re: electron alternatives: wayland-like widget toolkits/GUI frameworks 7 days ago

From Amirouche to ~sircmpwn/public-inbox

On 2021-02-24 04:56, Timmy Douglas wrote:
> Amirouche <amirouche@hyper.dev> writes:
> 
>> On 2021-02-23 06:14, Timmy Douglas wrote:
>>> 1. Has anyone created a realistic alternative to HTML/CSS--maybe a
>>> binary DOM that has a reasonable API for manipulation? I suppose
>>> it'd also need some sort of rendering engine.
>>> 
>> 
>> By DOM, do you mean the HTML serialization or the tree-like structure
>> that is in memory?
> 

Re: electron alternatives: wayland-like widget toolkits/GUI frameworks 7 days ago

From Amirouche to ~sircmpwn/public-inbox

On 2021-02-23 06:14, Timmy Douglas wrote:
> I'm not very knowledgeable as far as GUI toolkits go and I'm not sure
> exactly where I can post this question--but I thought this list might
> be one place to start.
> 
> I've only used electron as an end user, not as a programmer--it seems
> to have gained traction as a popular web-based GUI toolkit, but many
> complain of the bloat generated by all the HTML/CSS/JavaScript
> overhead.
> 
> I was wondering about two things:
> 

Possible RFC topics a month ago

From Amirouche to ~peacesearch/peacesearch-discuss

We discussed on IRC the possibility of having RFC documents. I am not sure 
yet how they will be structured.

In the meantime, I made a list of possible topics that we may discuss:

- Web Archive Feed: year-month-day incremental snapshots, implemented by 
medium to large website owners, that would allow search engines to index 
their websites without crawling all the pages (related standards: WARC, 
RSS, Atom)

- Sensimark: a category hierarchy similar to Wikipedia Vital Articles 
[0] and a "fingerprinting" method that will allow both humans and robots 
to figure out what topics are covered by a search engine or website 
(related tools: word2vec, BERT, classification). That could be at least two or

Possible features for a search engine a month ago

From Amirouche to ~peacesearch/peacesearch-discuss

I made this list for myself; maybe it can help others figure out what to 
think about when we write about search engines:

     - crawler, e.g. Apache Nutch, possibly with headless browser support 
and feeds (Atom, RSS, sitemap, robots.txt)
     - from boolean keyword and full-text search to conceptual search
     - entity linking (aka. wikification) and disambiguation
     - one-way or two-way synonyms and other lexicographic features
     - search restricted to the last few days, weeks or months (like Google tools)
     - trends and trending topics (z-score)
     - alerting
     - domain search, i.e. restrict search to a given domain / subdomain
     - spell checking, transliteration and soundex
     - multi-lingual
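
To make the "trends and trending topics" item concrete, here is a minimal
sketch of z-score based trend detection. It is only an illustration under
assumptions: the function names, the per-term daily counts and the
threshold of 2.0 are all hypothetical, and a real engine would likely
need smoothing and a minimum volume cutoff.

```python
from statistics import mean, pstdev


def zscore(history, current):
    """Number of standard deviations between `current` and the mean of `history`."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Simplification (assumption): a flat history gives no scale to
        # measure against, so report no deviation at all.
        return 0.0
    return (current - mu) / sigma


def trending(history_by_term, today_by_term, threshold=2.0):
    """Return terms whose count today is unusually high, highest score first."""
    scores = {}
    for term, history in history_by_term.items():
        score = zscore(history, today_by_term.get(term, 0))
        if score >= threshold:
            scores[term] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical daily query counts for two terms over the previous five days.
history = {
    "wayland": [10, 12, 11, 9, 10],
    "electron": [50, 52, 49, 51, 50],
}
today = {"wayland": 30, "electron": 51}

print(trending(history, today))
```

With these made-up numbers only "wayland" is reported: "electron" gets
more queries in absolute terms, but its count today is within its usual
range.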

Re: Search Engines by the way of code a month ago

From Amirouche to ~peacesearch/peacesearch-discuss

On 2021-01-10 08:05, Amirouche wrote:
> If you stumble upon an interesting project with code that is related
> to search engines do not hesitate to share it.

The most ambitious project I know of is: https://github.com/commonsearch

The problems:

- They rely on Elasticsearch which is resource hungry
- They rely too much on micro-services, which puts a strain on 
deployment / accessibility
- They use Python, JavaScript and Go: imo that is too much diversity in 
the stack

Search Engines by the way of code a month ago

From Amirouche to ~peacesearch/peacesearch-discuss

If you stumble upon an interesting project with code that is related to 
search engines do not hesitate to share it.

Re: Search Engines by the way of blogs a month ago

From Amirouche to ~peacesearch/peacesearch-discuss

On 2021-01-10 07:58, Amirouche wrote:
> If you find an interesting blog post dealing with problems related to
> search engines do not hesitate to post it in this thread with a
> comment.

The most interesting blog posts on search engines might be on the Algolia 
blog.

IMO, the problems with Algolia's approach are:

- They rely on C++
- They rely on the fact that all the data fits in RAM
- They rely on specific data structures (instead of an OKVS)

Search Engines by the way of blogs a month ago

From Amirouche to ~peacesearch/peacesearch-discuss

If you find an interesting blog post dealing with problems related to 
search engines do not hesitate to post it in this thread with a comment.

Introduction a month ago

From Amirouche to ~peacesearch/peacesearch-discuss

Hello!


I'm a FLOSS search engine enthusiast (together with several other 
people).

I have been researching the subject, more or less, for 10 years. It 
really started around the time the term 'filter bubble' [0] popped up on 
the Internet. It is also a matter of intellectual curiosity, and a lot 
of technical ambition: what is the most complex project or unsolved 
problem around?

I just updated my blog at:

We can do better than DuckDuckGo [et al.] 3 months ago

From Amirouche to ~sircmpwn/public-inbox

Hello,


I stumbled upon your post [0]. Since I am working on a search engine
myself, I figured it was a good idea to chime in.  The search engine is
called babelia.

   a) babelia instances will advertise their topical content via
      classifications such as the Wikipedia vital articles
      hierarchy. Ok, this is not good enough for niche topics but it
      gives a head start compared to randomly asking instances if they
      know about "those keywords", which is by the way not very great
      privacy-wise.  So instead, babelia tries to guess the topic of the
      query, matches it with one or more categories, and given this