
Possible RFC topics

Message ID
<84a6c46447eaba49df5922f4ceffbab3@hyper.dev>
We discussed on IRC the possibility of having RFC documents. I am not 
sure yet how they will be structured.

In the meantime, I made a list of possible topics that we may discuss:

- Web Archive Feed: year-month-day incremental snapshots, published by 
medium to large website owners, that allow search engines to index a 
website without crawling all of its pages (related standards: WARC, RSS, 
Atom).

- Sensimark: a category hierarchy similar to Wikipedia Vital Articles 
[0] and a "fingerprinting" method that will allow both humans and robots 
to figure out what topics are covered by a search engine or website 
(related tools: word2vec, BERT, classification). That could be at least 
two or three documents: 1) one for the first one or two levels of 
categories, 2) another for the fingerprinting method, 3) and yet another 
for really niche topics, e.g. "black and white manga between 1950 and 
1980", which will be difficult to document for all niches and topics.

- Search Query Forwarding: explain how and in what circumstances a 
search engine can request another search engine to answer a query. More 
generally, how resource sharing works.

- Web Archive Exchange: search engines may make their own web crawl 
results public so that other search engines can use them to seed their 
index or for full scans (deep search).

- Peer Archive eXchange & Discovery: similar to BitTorrent PEX, but for 
search engines.
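
To make the Web Archive Feed idea a bit more concrete, here is a rough 
sketch of how a search engine might work out which daily snapshots to 
fetch since its last visit. Nothing here is specified yet: the URL 
layout (/archive/YYYY/MM/DD.warc.gz) and the function name are 
assumptions made up for illustration only.

```python
from datetime import date, timedelta


def snapshot_paths(last_indexed: date, today: date, base: str = "/archive") -> list[str]:
    """List one snapshot path per day after `last_indexed`, up to and including `today`.

    The path layout is hypothetical; a real RFC would have to define it.
    """
    days = (today - last_indexed).days
    return [
        f"{base}/{d:%Y/%m/%d}.warc.gz"
        for d in (last_indexed + timedelta(days=i) for i in range(1, days + 1))
    ]


if __name__ == "__main__":
    # A crawler that last indexed the site on 2020-02-27 and returns on 2020-03-01.
    for path in snapshot_paths(date(2020, 2, 27), date(2020, 3, 1)):
        print(path)
```

The point is that the crawler only needs the date of its last visit to 
know what to download, instead of re-crawling every page.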


[0] https://en.wikipedia.org/wiki/Wikipedia:Vital_articles (see also [1])
[1] https://cloud.google.com/natural-language/docs/categories