
Machine Learning-based Voice ITC Translator Software now available


Michael Lee

2,584 views

Since early 2019, I have been working on software to extract voices from physical noise/signals. My earliest attempts used other people's software, mainly an algorithm called "spectral subtraction" in the ReaFir noise reduction plugin. This converts the noise into the frequency spectrum, where slight imprints of voice can be discovered and emphasized.
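
To make the idea concrete, here is a minimal Python sketch of spectral subtraction. It is not ReaFir's implementation and not the released code: the naive DFT, the 64-sample frame, and the function names (dft, idft, average_magnitude, spectral_subtract) are all assumptions chosen for illustration. The sketch estimates the average noise magnitude per frequency bin from noise-only frames, subtracts it from a noisy frame, and keeps the original phase.

```python
import cmath
import math
import random


def dft(frame):
    """Naive discrete Fourier transform; fine for short illustrative frames."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def average_magnitude(frames):
    """Average magnitude per frequency bin over several noise-only frames."""
    spectra = [dft(frame) for frame in frames]
    n = len(frames[0])
    return [sum(abs(s[k]) for s in spectra) / len(spectra) for k in range(n)]


def spectral_subtract(frame, noise_mag, floor=0.0):
    """Subtract the noise magnitude from each bin, keeping the original phase."""
    cleaned = []
    for value, noise in zip(dft(frame), noise_mag):
        magnitude = max(abs(value) - noise, floor)  # never go below the floor
        cleaned.append(cmath.rect(magnitude, cmath.phase(value)))
    return idft(cleaned)


if __name__ == "__main__":
    random.seed(0)
    n = 64
    # Hypothetical data: noise-only frames, then a tone buried in fresh noise.
    noise_only = [[random.gauss(0, 0.1) for _ in range(n)] for _ in range(20)]
    noise_mag = average_magnitude(noise_only)
    tone = [0.5 * math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
    noisy = [s + random.gauss(0, 0.1) for s in tone]
    cleaned = spectral_subtract(noisy, noise_mag)

    def squared_error(signal):
        return sum((a - b) ** 2 for a, b in zip(signal, tone))

    print("squared error before:", round(squared_error(noisy), 3))
    print("squared error after: ", round(squared_error(cleaned), 3))
```

A real-time version would more likely use an FFT library and overlapping windows; the naive transform here only keeps the example dependency-free.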

It is now 2022. Spectral subtraction is still a very valuable tool, but it is only the beginning of the process I've developed for extracting voices. I've created machine-learning-based models to find and emphasize voices. I've also made a program that finds and generates "formants," the peaks in the harmonic buzz of the human voice.

I'm finally releasing my full software, in Python. I use (a very similar version of) this code in all of my experiments (FPGAs, radio noise, etc.).

I would have liked to share it as an executable, like I did with Spiricam, but Python executable-makers are notoriously buggy. Another reason I've hesitated to share the code sooner is that it used to require heavy GPU resources. However, thanks to some software developments by Google, my ML models now seem to run well enough on the CPU in real time.


So if you want to try out my code, you'll have to do some command-line steps. At minimum, you'll have to install a free program called Miniconda (or its larger version, Anaconda) with Python version 3.8, 64-bit. Expect it to need a few GB of disk storage.
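
If you are not sure whether your environment matches, a small check like the one below may help. It is only a sketch and not part of the released code; the function names (interpreter_bits, check_environment) are made up for illustration. It compares the running interpreter against the Python 3.8, 64-bit requirement mentioned above.

```python
import struct
import sys


def interpreter_bits():
    """Pointer size in bits: 64 on a 64-bit Python build, 32 on a 32-bit one."""
    return struct.calcsize("P") * 8


def check_environment(version=None, bits=None, wanted=(3, 8)):
    """Return a list of problems; an empty list means the interpreter matches."""
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    bits = interpreter_bits() if bits is None else bits
    problems = []
    if version != wanted:
        problems.append("Python %d.%d found, %d.%d expected" % (version + wanted))
    if bits != 64:
        problems.append("%d-bit interpreter found, 64-bit expected" % bits)
    return problems


if __name__ == "__main__":
    issues = check_environment()
    if issues:
        for issue in issues:
            print("Problem:", issue)
    else:
        print("Interpreter looks compatible.")
```

Run it from inside the Anaconda prompt with the environment activated, so that it reports on the interpreter the software will actually use.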

Here's the link to the code: https://drive.google.com/drive/folders/1fu6hAuE0AbhbQjx0Ts_3Ju0QRJ0awxRM?usp=sharing

In the directory is a README.txt, which I'll update as we iron out the instructions.

When I've resolved most of the common issues, I'll make the code into a ZIP file for the Downloads section.

For now, feel free to ask questions in the comments. As I like to say, "The spirits are waiting!"

42 Comments



Manuals? Sounds like another job we'll need to outsource. 😉

Ok, if the program is running, that's good. I'm a little concerned that you are not hearing yourself. That tells me the input device might not be the right one yet. If you move all the sliders to the left and uncheck "Machine Learning", the output should be close to the raw input (delayed by 2.5 seconds).

When you do get the microphone picking up, the best results seem to occur when you have some sort of noise or tones in the background. Complete silence will lead to far fewer "blips" of voice.

Stepping back to the general problem of finding the path to executables like conda: I don't usually have a problem when I run within an Anaconda prompt/shell. The shell, I'm guessing, sets all of the path variables for "conda", "python", "spyder", etc.

Regarding saving, the way I have it set up is that it stores all of the output from the beginning and saves it to a file when you choose "Exit." The save location can be changed by editing the main Python script near the end. I know it's a little clunky. This is the one program I haven't fully made "user-friendly."
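
The save-on-exit behaviour described above can be sketched roughly as follows. This is an illustration only, not the actual script: the class name OutputRecorder, the 16 kHz sample rate, the file name, and the 16-bit mono WAV format are all assumptions.

```python
import os
import struct
import tempfile
import wave


class OutputRecorder:
    """Keeps every output sample in memory and writes one WAV file on request."""

    def __init__(self, sample_rate=16000):  # sample rate is an assumed value
        self.sample_rate = sample_rate
        self.samples = []

    def add_chunk(self, chunk):
        """Append a chunk of float samples, nominally in the range [-1, 1]."""
        self.samples.extend(chunk)

    def save(self, path):
        """Write everything recorded so far as 16-bit mono PCM."""
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))  # clip, then scale
            for s in self.samples
        )
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(self.sample_rate)
            wav.writeframes(frames)
        return path


if __name__ == "__main__":
    recorder = OutputRecorder()
    recorder.add_chunk([0.0, 0.25, -0.25])
    # Hypothetical save location; in practice this would be the edited path.
    target = os.path.join(tempfile.mkdtemp(), "itc_output.wav")
    print("Saved to", recorder.save(target))
```

Holding the whole session in memory is simple but grows with session length; writing chunks to disk as they arrive would be the usual alternative.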

Link to comment

I'm trying to get some quotes on porting this to JavaScript or Visual C++. Freelancers tell me they can't quote unless they see the source code. They say they can extract it, but why make them go through that? If you send me the source code, I can pass it along for quotes. Or is that too risky? Or maybe I should just get a quote based on having them install and look at the app, without giving the source code?

In conversation with one programmer, he said that Transcrypt and similar conversion utilities would not be up to the task. He said WebAssembly was the way to go, and it would just be one file (JavaScript) that could be used on all platforms.

Even if we got a good price, we'd have to keep going back to the freelancer to make changes each time you modify the program. Unless they quote an ongoing upkeep rate, I guess. 

Link to comment
1 hour ago, Michael Lee said:

Manuals? Sounds like another job we'll need to outsource. 😉

Ok, if the program is running, that's good. I'm a little concerned that you are not hearing yourself. That tells me the input device might not be the right one yet. If you move all the sliders to the left and uncheck "Machine Learning", the output should be close to the raw input (delayed by 2.5 seconds).

When you do get the microphone picking up, the best results seem to occur when you have some sort of noise or tones in the background. Complete silence will lead to far fewer "blips" of voice.

Stepping back to the general problem of finding the path to executables like conda: I don't usually have a problem when I run within an Anaconda prompt/shell. The shell, I'm guessing, sets all of the path variables for "conda", "python", "spyder", etc.

Regarding saving, the way I have it set up is that it stores all of the output from the beginning and saves it to a file when you choose "Exit." The save location can be changed by editing the main Python script near the end. I know it's a little clunky. This is the one program I haven't fully made "user-friendly."

I tried bringing up the prompt three different ways. None worked, and I couldn't figure out which properties needed to be changed in the batch file. So I just did a manual CD to get into the right directory before continuing.

I did eventually learn how to save output files (read my previous comments) but only after googling proper syntax for the naming of directory paths. 

It would be helpful to wrap this all up in an exe file to simplify installation. Fernando said he has the tool. 

No need for a full-blown manual yet. But some additions to the Readme.txt based on user feedback would be useful. 

I'll try your other suggestions. 

Link to comment

Today I reinstalled Python. I had trashed it while trying to start your app through a direct link on my desktop. I also reinstalled the tf25_nongpu environment. It worked smoothly, so in my experience the installation process is stable.

I'm currently testing your app with phototransistor noise sources. One setup I'm working on seems fairly promising. In my tests I found out how important the low threshold and tone threshold parameters are. I could really tune your software by carefully adjusting them. By the way, the automatic storing and retrieving of parameter settings is really nice!
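
For readers curious how storing and retrieving settings between runs might work, here is a rough sketch using a JSON file. It is a guess at the general approach, not the app's actual code: the key names (low_threshold, tone_threshold, machine_learning), the default values, and the file name are all hypothetical.

```python
import json
import os
import tempfile

# Hypothetical parameter names and defaults, for illustration only.
DEFAULTS = {"low_threshold": 0.5, "tone_threshold": 0.5, "machine_learning": True}


def load_settings(path, defaults=DEFAULTS):
    """Return stored settings, falling back to defaults for anything missing."""
    settings = dict(defaults)
    try:
        with open(path, "r", encoding="utf-8") as handle:
            stored = json.load(handle)
    except (OSError, ValueError):  # missing file or invalid JSON
        return settings
    if isinstance(stored, dict):
        # Ignore keys the program does not know about.
        settings.update({k: v for k, v in stored.items() if k in defaults})
    return settings


def save_settings(path, settings):
    """Write the current settings so the next start can restore them."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(settings, handle, indent=2)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "settings.json")
    settings = load_settings(path)      # no file yet, so defaults are used
    settings["tone_threshold"] = 0.8    # e.g. a slider was moved
    save_settings(path, settings)
    print(load_settings(path))
```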

Moreover I am curious if I could create a link to start your app inside the anaconda prompt by using windows PowerShell.

 

Link to comment

I'm happy to read that the code is working for you.

I'm busy working on an improved detoning model. The script doesn't change much, if at all, but the goal is to analyze a half-second of signal and figure out the closest true speech (tm) analogue. The hard part is getting the output to sound "clean" - even if it is an imperfect guess.

A clickable link to start up the script would be nice. I never thought of it for this code, because I'm always tweaking it in the Spyder editor.
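
One possible way to get a one-click start is a small launcher script that asks conda to run the main script inside the environment. The sketch below is untested against this particular install: the environment name tf25_nongpu comes from the comments above, while the script path is whatever you pass in, and the function names are made up for illustration. It relies on the "conda run -n ENV command" form of the conda command line.

```python
import shutil
import subprocess
import sys


def build_launch_command(script_path, env_name="tf25_nongpu", conda_exe="conda"):
    """Build the argument list for running a script inside a conda environment."""
    return [conda_exe, "run", "-n", env_name, "python", script_path]


def launch(script_path, env_name="tf25_nongpu"):
    """Start the script and return its exit code."""
    conda_exe = shutil.which("conda")
    if conda_exe is None:
        raise FileNotFoundError(
            "conda was not found on PATH; try starting from the Anaconda prompt"
        )
    return subprocess.call(build_launch_command(script_path, env_name, conda_exe))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python launcher.py path/to/main_script.py
    sys.exit(launch(sys.argv[1]))
```

A desktop shortcut could then point at this launcher, provided conda is reachable from the shortcut's environment, which is the same path problem discussed earlier in the thread.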

 

Link to comment
2 hours ago, Andres Ramos said:

Today I reinstalled Python. I had trashed it while trying to start your app through a direct link on my desktop. I also reinstalled the tf25_nongpu environment. It worked smoothly, so in my experience the installation process is stable.

I'm currently testing your app with phototransistor noise sources. One setup I'm working on seems fairly promising. In my tests I found out how important the low threshold and tone threshold parameters are. I could really tune your software by carefully adjusting them. By the way, the automatic storing and retrieving of parameter settings is really nice!

Wow, that sounds like a very sophisticated improvement. I'm curious! In the meantime I will try to squeeze out what is possible from your code once I have finished the latest phototransistor design, before I continue with manufacturing Sonia's Lightbridge device.

If I should find an easy to use solution for a one click start I will let you know.

Link to comment

Hello Andres

Personally, I don't hear the same thing (I suppose what I'm hearing is just auditory pareidolia) 😉

Fast blinken / me: "Sortez vos propres boîtiers" ("Take out your own cases")

My woman / "On voudrais du ...." ("We would like some ...."; I can't make out the end)

One hole / "Porter un coups de crosse" ("To strike a blow with a rifle butt")

Too much is harmful / "Bonne chance à côté" ("Good luck next door")

Too much / "Macho ou Macha" ("Macho or Macha")

Link to comment
3 hours ago, Alain said:

Hello Andres

Personally, I don't hear the same thing (I suppose what I'm hearing is just auditory pareidolia) 😉

Fast blinken / me: "Sortez vos propres boîtiers" ("Take out your own cases")

My woman / "On voudrais du ...." ("We would like some ...."; I can't make out the end)

One hole / "Porter un coups de crosse" ("To strike a blow with a rifle butt")

Too much is harmful / "Bonne chance à côté" ("Good luck next door")

Too much / "Macho ou Macha" ("Macho or Macha")

Interesting, Alain. I have the same problem the other way around and can't hear what you heard. There is a theory we sometimes discuss in the research team that our perception is the last stage, where the information becomes finalized. Only after our perception is the information transfer completed. This means that the same recording can manifest as different messages in the ears of different people. Sometimes I think this is true because the perceptions are so different.

Link to comment

I suppose it's talking about your device; there are certainly improvements to be made in order to hear them better 😉

Link to comment

Is it possible to add some screenshots of the steps involved?  Or, maybe a screen recording of the setup process??

I apologize, but I'm not too savvy with "code speak" and I'd love to check out what you've created!

Best,

Mike

Link to comment

Mike,

Thank you for your interest in trying out the software!

This weekend, I'll first try again to turn this software into an "executable" like I did with Spiricam, which makes it super easy for anyone to download and use. If that doesn't work, I'll start working on explaining the install process in more detail with some pictures or maybe a video.

-michael

Link to comment
On 10/14/2022 at 7:36 AM, Michael Lee said:

Thank you for your interest in trying out the software!

This weekend, I'll first try again to turn this software into an "executable" like I did with Spiricam, which makes it super easy for anyone to download and use. If that doesn't work, I'll start working on explaining the install process in more detail with some pictures or maybe a video.

Many, many, many thanks for utilizing your abilities for this community at large!

I've been dying to learn my first coding language (Python), but I just haven't found a path that's right for me yet.

But, I have this ITC project idea that I'd love to begin developing that utilizes the sending/receiving of MIDI signals for programming all the control variables (whatever those end up becoming).  For some reason, it seems like no one wants to explore the topic of MIDI signaling/programming in the Audio Electronics-ITC space yet. 

Thanks again,

Mike

P.s. - let me know if any of that interests you ^^

Link to comment
On 10/18/2022 at 10:45 PM, M1K3_MM said:

Many, many, many thanks for utilizing your abilities for this community at large!

I've been dying to learn my first coding language (Python), but I just haven't found a path that's right for me yet.

But, I have this ITC project idea that I'd love to begin developing that utilizes the sending/receiving of MIDI signals for programming all the control variables (whatever those end up becoming).  For some reason, it seems like no one wants to explore the topic of MIDI signaling/programming in the Audio Electronics-ITC space yet. 

Thanks again,

Mike

P.s. - let me know if any of that interests you ^^

hello there Mike... sorry about not knowing this tech, but sounds nice!!!

Link to comment

Michael, if I may suggest, why don't you schedule a roadshow to present to us the application you've developed?!?

It would be really nice!!!!

Thanks!

Link to comment
  • Administrators

Hi Mike, regarding the MIDI experiment - usually the people who have the knowledge of certain technologies apply that knowledge to ITC experiments.

This means in most cases there simply aren't enough technical people to fulfill every request - particularly as everyone has their own competing schedules and goals.

I would suggest maybe sketching out how you see this experiment going, and then see if others have input 🙂

Keith

Link to comment
