~sircmpwn/sr.ht-discuss

search.sr.ht idea

Message-ID: <0550289cf37ffab2b2f367bbdbfb8d0f@hyper.dev>

The idea is to build a search.sr.ht service that would allow a project
hosted on sourcehut to set up its own search.

The goal of search is to find information that is hidden or not easy
to find. In the context of sr.ht, a project is built out of several
independent services (repository, list, todo, and wiki), and going
through the search box of each of those services requires multiple clicks.

My idea is to build a search service following the sr.ht philosophy
of independent services: it will have its own API, which may be
triggered from a build job. list and todo will require specific
hooks.

Regarding the implementation, I investigated several solutions. It
seems to me that in the case of sr.ht it will be better to reuse
PostgreSQL, but not necessarily to rely on the PostgreSQL tsearch2 /
FTS extension.

My current plan is to go with Python, uvicorn, and asyncpg, and to
rely on multiprocessing to speed up the queries. Most of the time is
spent in Aho-Corasick to score documents, which can be sped up
significantly with PyPy. In any case, it requires some kind of
parallelism to be able to scale.
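
To make the scoring step concrete, here is a minimal sketch of
Aho-Corasick keyword counting spread over worker processes. It is an
illustration under assumptions, not the planned implementation: the
names build_automaton, score, and rank are hypothetical, it uses
concurrent.futures.ProcessPoolExecutor as one possible form of
multiprocessing, and it leaves out uvicorn, asyncpg, and storage
entirely.

```python
from collections import deque
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def build_automaton(keywords):
    """Build Aho-Corasick tables: goto transitions, failure links, outputs."""
    goto, fail, out = [{}], [0], [[]]
    for word in filter(None, set(keywords)):  # skip empty and duplicate keywords
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: depth-1 states keep failure link 0, deeper
    # states inherit the longest proper suffix that is also a prefix.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def score(automaton, text):
    """Count keyword occurrences in text (overlapping matches included)."""
    goto, fail, out = automaton
    state, hits = 0, 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        hits += len(out[state])
    return hits


def rank(keywords, documents, workers=2):
    """Score documents in worker processes and return the best match first."""
    automaton = build_automaton(keywords)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(partial(score, automaton), documents))
    return sorted(zip(scores, documents), reverse=True)


if __name__ == "__main__":
    docs = ["search on sr.ht", "nothing relevant here", "sr.ht sr.ht"]
    for hits, doc in rank(["search", "sr.ht"], docs):
        print(hits, doc)
```

The raw hit count is only a placeholder for a real relevance score;
the point is that each document is scored independently, so the work
splits cleanly across processes.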

-- 
Amirouche ~ https://hyper.dev ~ https://peacesear.ch

Message-ID: <20210425213408.86ee5f605b9b0085cce906b2@firemail.cc>
In-Reply-To: <0550289cf37ffab2b2f367bbdbfb8d0f@hyper.dev>

On Sun, 25 Apr 2021 15:50:05 +0200
Amirouche <amirouche@hyper.dev> wrote:

> The idea is to build search.sr.ht service that would allow project
> to setup search for their project hosted on source hut.

Very good suggestion.
-- 
att, pekman

       *     *
        \ o /
         \|/ 
          |               
         / \  _  
        /   \/
       /
      -

Message-ID: <20210426024746.oftenzjtxtq2hooy@samwise>
In-Reply-To: <0550289cf37ffab2b2f367bbdbfb8d0f@hyper.dev>

On Sun, Apr 25, 2021 at 03:50:05PM +0200, Amirouche wrote:
> The idea is to build search.sr.ht service that would allow project
> to setup search for their project hosted on source hut.
> 
> The goal of search is to find information that is hidden or not easy
> to find. In the context of sr.ht, a project is built out of several
> independent services repository, list, todo, and wiki: going through
> the search box of those services requires multiple clicks.

Is your idea to allow search within a project, between projects, or
both?  I find it hard to imagine where it would be useful to have
between-project search except in one very specific way: searching all
public Sourcehut projects' descriptions for a particular string or tags
for a particular tag.  The thing is, this already exists!

I am more optimistic about search within a project.  Searching the
codebase of a particular project (git-grep(1)-style) is something that
would be useful.  It would also be useful if this search included
patches submitted to the project's mailing list.  But if I am searching
code I want to search code and code only.  Searching GitHub for a
particular function name and getting references to things that aren't
code is frustrating.  While it might be occasionally useful, it is a bad
default.

Searching a project's mailing list/bug tracker is also something that
would be useful.  If I have an issue with something, I want to be able
to tell if someone else has had that issue before so I do not create
duplicate bugs, and in case they've already solved it.

> My idea is to build a search service following the philosophy of
> sr.ht of independent services that will have its own API that may
> be triggered from a build job. list and todo will require specific
> hooks.

I'm not intimately familiar with the architecture of sourcehut, so I'm
sorry if this is a silly question: what do you mean by this?  Do you
mean that search is something you would have to configure for your
project by manually hooking up different services to the search service?
Or do you mean that under the hood it would be implemented this way?

Project owners needing to manually configure search beyond a simple
checkbox saying "Yes I want my {mailing list, bug tracker, codebase,
mailing list patches} to be searchable" would be poor usability IMO.

Message-ID: <71066794ba5953647c95c367f230df33@hyper.dev>
In-Reply-To: <20210426024746.oftenzjtxtq2hooy@samwise>

On 2021-04-26 04:47, Miles Rout wrote:
> On Sun, Apr 25, 2021 at 03:50:05PM +0200, Amirouche wrote:
>> The idea is to build search.sr.ht service that would allow project
>> to setup search for their project hosted on source hut.
>> 
>> The goal of search is to find information that is hidden or not easy
>> to find. In the context of sr.ht, a project is built out of several
>> independent services repository, list, todo, and wiki: going through
>> the search box of those services requires multiple clicks.
> 
> Is your idea to allow search within a project, between projects, or
> both?  I find it hard to imagine where it would be useful to have
> between-project search except in one very specific way: searching all
> public Sourcehut projects' descriptions for a particular string or tags
> for a particular tag.  The thing is, this already exists!

My personal use case is to search across projects, including mailing
lists, todo, git, hg, and also the paste service. The primary reason is
to alleviate the need for privateer search engines (which do not work
well with my favorite programming language, namely Scheme).

I regularly search across projects on GitHub to learn how a particular
feature can be used, such as m4 macros or even example webpack
configurations.

> I am more optimistic about search within a project.

That is what I was told on IRC too.

> Searching the
> codebase of a particular project (git-grep(1)-style) is something that
> would be useful.  It would also be useful if this search included
> patches submitted to the project's mailing list.  But if I am searching
> code I want to search code and code only.  Searching GitHub for a
> particular function name and getting references to things that aren't
> code is frustrating.  While it might be occasionally useful, it is a
> bad default.

That may be the default, or fine-tuned with a bang in the query such as
!code.

My preference for the default would be to look for everything, and allow
the user to specify that they want only the code.

If the search is done on a project page, it would search in the current
project, except if a bare bang is provided.
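
As a rough sketch of how such bangs could be parsed: only !code comes
from the proposal above; the other kinds in KNOWN_KINDS, the Query
type, and parse_query are hypothetical names chosen for illustration,
and treating a bare "!" as "search all projects" is my reading of the
rule above.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Only "code" is named in the thread; the other kinds are assumptions.
KNOWN_KINDS = {"code", "list", "todo", "wiki", "paste"}


@dataclass
class Query:
    terms: List[str]        # plain search terms, in order
    kinds: Set[str]         # empty set means "search everything"
    project: Optional[str]  # None means "search across all projects"


def parse_query(raw, current_project=None):
    """Split a raw query into terms, kind bangs, and project scope."""
    terms, kinds, project = [], set(), current_project
    for token in raw.split():
        if token == "!":
            # A bare bang lifts the restriction to the current project.
            project = None
        elif token.startswith("!") and token[1:] in KNOWN_KINDS:
            kinds.add(token[1:])
        else:
            # Unknown bangs are kept as ordinary search terms.
            terms.append(token)
    return Query(terms, kinds, project)


if __name__ == "__main__":
    print(parse_query("!code call/cc", current_project="~user/project"))
    print(parse_query("! webpack config", current_project="~user/project"))
```

With no bang at all, kinds stays empty and the search covers
everything in the current project, which matches the default I would
prefer.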


> 
> Searching a project's mailing list/bug tracker is also something that
> would be useful.  If I have an issue with something, I want to be able
> to tell if someone else has had that issue before so I do not create
> duplicate bugs, and in case they've already solved it.
> 

Exactly. At the moment you need to search the mailing list(s) and the
tracker(s) separately.

>> My idea is to build a search service following the philosophy of
>> sr.ht of independent services that will have its own API that may
>> be triggered from a build job. list and todo will require specific
>> hooks.
> 
> I'm not intimately familiar with the architecture of sourcehut, so I'm
> sorry if this is a silly question: what do you mean by this?  Do you
> mean that search is something you would have to configure for your
> project by manually hooking up different services to the search
> service?
> Or do you mean that under the hood it would be implemented this way?

That is an important usability aspect. I am not sure whether the checkbox
is feasible given the sourcehut code, since I am not a sourcehut expert yet.

> Project owners needing to manually configure search beyond a simple
> checkbox saying "Yes I want my {mailing list, bug tracker, codebase,
> mailing list patches} to be searchable" would be poor usability IMO.

The idea here is at least to offer an API that may be used from the build
service. Integration with other services is an open question.

-- 
Amirouche ~ https://hyper.dev

Message-ID: <20210426131030.2pewjsl2otjj2t3i@localhost.localdomain>
In-Reply-To: <20210426024746.oftenzjtxtq2hooy@samwise>

On 26.04.2021 14:47, Miles Rout wrote:
> 
> Searching a project's mailing list/bug tracker is also something that
> would be useful.

Isn't that feature already there on sourcehut?

-- 
Regards,
Ngô Ngọc Đức Huy

Message-ID: <CAXOM5VC0EFJ.198JFL1JPLKP1@debian>
In-Reply-To: <20210426131030.2pewjsl2otjj2t3i@localhost.localdomain>

Yes and no.  A project may have multiple mailing lists
and ticket trackers and a combined search for all of them
would be really convenient.

I suspect that such a feature would be blocked by the GraphQL work
(which is due by the beta release), so I wouldn't hold my breath.

Message-ID: <20210426133414.uqyxliy2nuznwfw6@phi>
In-Reply-To: <0550289cf37ffab2b2f367bbdbfb8d0f@hyper.dev>

> My idea is to build a search service following the philosophy of
> sr.ht of independent services that will have its own API that may
> be triggered from a build job. list and todo will require specific
> hooks.

Why not implement it as a feature of hub.sr.ht first? That's where
projects are grouped - I wonder if yet another service is not a bit
overkill / premature. Do we need arbitrary search across arbitrary
groups of projects?

-- 
Timothée

Message-ID: <7ea4abb13293bb14732aeb5a03e687fa@hyper.dev>
In-Reply-To: <20210426133414.uqyxliy2nuznwfw6@phi>

On 2021-04-26 15:34, Timothée Floure wrote:
>> My idea is to build a search service following the philosophy of
>> sr.ht of independent services that will have its own API that may
>> be triggered from a build job. list and todo will require specific
>> hooks.
> 
> Why not implement it as a feature of hub.sr.ht first? That's where
> projects are grouped - I wonder if yet another service is not a bit
> overkill / premature. Do we need arbitrary search across arbitrary
> groups of projects?

The initial rationale was that it relies on asyncio, hence it would
have been a separate service anyway. I understand it does not have to
be exposed in the GUI.

Anyway, that may be overkill, and I figured a general-purpose search
engine will allow site-wide search with query terms such as site:sr.ht
or even site:*.sr.ht. It will also be possible to do project-specific
search with proper query terms.
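
A small sketch of how site: terms might be handled, assuming
shell-style wildcard matching against the hostname of each indexed
URL. The helper names split_site_filters and url_matches are
hypothetical, and the exact semantics (site:sr.ht matching only the
bare host, site:*.sr.ht matching subdomains) are an assumption, not a
settled design.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def split_site_filters(raw):
    """Separate site:<pattern> tokens from the ordinary search terms."""
    patterns, terms = [], []
    for token in raw.split():
        if token.startswith("site:") and len(token) > len("site:"):
            patterns.append(token[len("site:"):].lower())
        else:
            terms.append(token)
    return patterns, terms


def url_matches(url, patterns):
    """True if the URL's host matches any pattern, or if there are none."""
    host = (urlsplit(url).hostname or "").lower()
    return not patterns or any(fnmatchcase(host, p) for p in patterns)


if __name__ == "__main__":
    patterns, terms = split_site_filters("site:*.sr.ht aho corasick")
    print(patterns, terms)
    print(url_matches("https://git.sr.ht/~user/repo", patterns))
```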

There are several reasons I thought about sourcehut and building
it as a sourcehut service:

- I thought it would allow better integration with sourcehut types
   such as issues, code, wiki, paste, etc., and enable faceted queries.
   But that can eventually be implemented on top of a general-purpose
   search engine;

- I wanted to have more feedback on my general-purpose search engine
   design;

- I do not have much motivation nowadays to write Python, but
   contributing to sourcehut is motivating;

- I also need search as part of an instance of sourcehut I am setting
   up!


Summary: a general-purpose search engine would be good enough!


If I end up coding something useful, you will know it somehow :)


Thanks!