~geb/numen

Suggestions for transcriptions: realtime, stay in transcription mode, punctuation, correct mistakes

Details
Message ID
<170860820834.18331.7773940513043182724@localhost>
I tried out numen for chatting a few days ago, while my fingers were hurting from climbing too hard.

I'd like to share a few observations about the transcription mode and then my proposals for how it
could be improved!

Observations
- You can't (afaics) easily correct mistakes in transcribed text: not at all while dictating, and
  only clumsily afterwards.
- You can't (afaics) add punctuation while transcribing: you have to wait until you leave
  transcription mode and then add your punctuation.
- You have to keep speaking to keep the transcription mode active, which is stressful.
- When you start speaking, you don't see the transcription until after you stop speaking.

Suggestions
- Wouldn't it be nice to see the words come in as you speak? When the model changes its mind,
  numen could press backspace to remove the outdated text. I know I make it sound simpler than it
  is, but I think this would be a big improvement.
- Live closed captions for TV are often created with voice recognition. There, the system turns a
  spoken "comma" into a comma symbol, for instance. I don't know how they deal with it when someone
  is speaking *about* a comma, but I imagine an escape word or sound could work nicely for that.
  The other way around is also possible: treating everything as transcription by default and
  requiring an escape word or sound before a single non-transcription command.
- Have the option to keep scribe active until you terminate it. (It could be a command like what I
  suggest in the previous bullet point.)
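
To make the first two suggestions a bit more concrete, here is a minimal sketch in Python. It is
only an illustration under assumptions: the function names, the punctuation table and the escape
word "literal" are all made up for this example, and nothing here is numen's or sprec's actual API.
It assumes the recognizer hands over its current best guess as plain text.

```python
import os

# Hypothetical table of spoken words that should become symbols.
PUNCTUATION = {"comma": ",", "period": ".", "colon": ":"}


def backspace_update(shown, revised):
    """Return (backspaces, text_to_type) that turns `shown` into `revised`.

    Only the part after the longest common prefix is erased and retyped,
    so text the model has not changed its mind about stays on screen.
    """
    common = os.path.commonprefix([shown, revised])
    return len(shown) - len(common), revised[len(common):]


def apply_spoken_punctuation(words, escape="literal"):
    """Join words into text, replacing spoken punctuation with symbols.

    A word that follows the (assumed) escape word is kept as-is, which is
    how you would dictate the word "comma" itself.
    """
    pieces = []
    literal_next = False
    for word in words:
        if literal_next:
            pieces.append(" " + word)  # escaped: keep the word itself
            literal_next = False
        elif word == escape:
            literal_next = True  # swallow the escape word
        elif word in PUNCTUATION:
            pieces.append(PUNCTUATION[word])  # attach symbol without a space
        else:
            pieces.append(" " + word)
    return "".join(pieces).lstrip()


if __name__ == "__main__":
    print(backspace_update("hello word", "hello world"))
    print(apply_spoken_punctuation("hello comma world period".split()))
    print(apply_spoken_punctuation("type a literal comma here".split()))
```

The backspace count and replacement text would then be sent to whatever types the keystrokes. A
real version would probably work on whole words and cope with revisions that arrive late, which
this sketch ignores.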

Finally: nice work, I hope this project can keep bringing you fulfillment, and I hope to find some
time to contribute one day!
Details
Message ID
<CZBTM8G4CIT0.1XV3LPSPDYZ87@johngebbie.com>
In-Reply-To
<170860820834.18331.7773940513043182724@localhost>
Thanks midgard, appreciate the write-up! I've messed about with fancier
transcription a bit with sprec (https://git.sr.ht/~geb/sprec) but I'm
still deciding how to go about things for numen.

I hope so too :) and it's great you found numen useful.
Careful climbing!
Details
Message ID
<170873039588.8776.12757901169671162396@localhost>
In-Reply-To
<CZBTM8G4CIT0.1XV3LPSPDYZ87@johngebbie.com>
Nice! I made a quick and dirty proof of concept of my ideas with Python and sprec. It's already
working quite well actually!

https://git.sr.ht/~midgard/sprec_dictation