
The Basics of Voice Reproduction & Your Help Requested



Hello everyone.

As I continue the work that began 16 years ago, I've decided to share some basics with you. The hope is that you will think of some creative solutions to the principles outlined below, which are the result of my experiments over a long period of time. With your creativity, I believe we can crack this long-term project and achieve at least one-way real-time voice communication.

We've been successful in creating voice through many different methods; what we lack is a unified approach based upon these principles. I'm hoping for your creative solutions in response to the principles I mention. Please consider that I don't have the technical experience most of you have; my solutions are a combination of intuition and trial and error. So you may need to translate them into your own understanding when thinking of a potential solution. The goal here, as I present it, is to mimic the human voice. From there, it will be a piece of cake.

    1. Voice requires a fundamental frequency whose amplitude is louder than the rest of the harmonics.

    2. Additional harmonics need to be shaped to resemble those of a human voice, OR a perfect averaged voice sample may be provided. Harmonics should never be of the same amplitude as the fundamental. Tones need to be balanced; without balance it will never be feasible. This may require tuning by ear.

    3. Modulation is required to simulate a glottal pulse and the vocal folds. I don't know if it's 20 Hz or some other rate, but if we determined this we would be much closer.

    4. A comb filter (delay with reverb) can also simulate a fundamental with harmonics, but the harmonics would need to be toned down to resemble a human voice. Overtones come into play.

    5. An impulse train (modulation) is critical, and when combined with a comb filter it is very powerful.

    6. Granulized sounds (short samples of sound) can be used reliably as long as the voice is shaped per number 2 above. Usage of the tool "Emissions Control" has proven this theory.

    7. EVPMaker set to 140-600 ms with the X-fade option only will also produce a randomized form of speech with a single voice. If pulsecomb (combfilter (delay + reverb) plus impulse train (modulation)) is used in post-processing of randomized sound (live or recorded), this can be a very powerful tool.

    8. Software enhancement. We now have noise reduction and live filtering that can, in essence, allow sound to be heard that is otherwise imperceptible to human ears. Bias Soundsoap has improved greatly, and Krisp artificial intelligence has been introduced.

    9. One idea that works to produce voice is two tones spaced 2.5 Hz apart. It is not known yet why this is. When tones 2.5 Hz apart are randomized, a certain phenomenon happens: it's either a meditative state or something conducive to speech patterns we are already trained to recognize. We don't know yet. To hear what that sounds like, listen to Stream 7, aka "Crystal trainer".
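To make principles 1-3 concrete, here is a minimal Python/NumPy sketch (my own illustration, not a recipe from this thread; the 120 Hz fundamental and the 20 Hz modulation rate are assumed values, since item 3 leaves the rate open): it builds a fundamental louder than all of its harmonics, rolls the harmonics off, and pulses the whole thing at a glottal-style rate.

```python
import numpy as np

SR = 8000           # sample rate in Hz
F0 = 120            # assumed fundamental, roughly male-voice pitch
GLOTTAL_RATE = 20   # assumed glottal-style AM rate (item 3 guesses ~20 Hz)

def voice_bed(duration=1.0):
    """Items 1-3: dominant fundamental, rolled-off harmonics, glottal-rate pulsing."""
    t = np.arange(int(SR * duration)) / SR
    signal = np.zeros_like(t)
    for n in range(1, 9):                       # item 2: every harmonic weaker than the fundamental
        signal += (1.0 / n) * np.sin(2 * np.pi * F0 * n * t)
    # item 3: square-ish pulsing at a glottal rate, never fully silent
    pulses = 0.5 * (1 + np.sign(np.sin(2 * np.pi * GLOTTAL_RATE * t)))
    return signal * (0.3 + 0.7 * pulses)

bed = voice_bed()
```

The 1/n roll-off is just one simple way to keep the fundamental dominant; per item 2, the exact shaping would be tuned by ear or matched to an averaged voice sample.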

Please feel free to provide your creative interpretations of how we can simulate human voice. I am happy to demonstrate any of the principles outlined above for a better understanding. One of my shortcomings has been not taking the time to explain the ground already covered.
With your help I am confident we can crack this long-term goal.

🙂


   

Summary: We will have synthesized software "voiceboxes" that spirit can use. The fact is we already have this technology. Trial and error has shown that in order to hear a reproduction of a human voice an experimental sound needs to have some sort of randomization or be "energized" through a variety of methods. In the end though - it must mimic human voice characteristics.
In order to create a voice template for spirit we can refer back to human voice samples and provide a template that mimics normal human speech. This should be done visually as well as by listening.

Different methods can be used, such as granulization, "energizing" using a combfilter (delay + reverb), or randomizing using software such as EVPmaker. ALL of these efforts have shown the same principles to be true.
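As one illustration of the "energize via combfilter" idea, a feedback comb filter imposes a fundamental-plus-harmonics structure on whatever passes through it. This is my sketch, not Keith's actual patch; the 100 Hz spacing and 0.8 feedback are arbitrary choices.

```python
import numpy as np

SR = 8000
rng = np.random.default_rng(2)

def feedback_comb(x, delay_samples, feedback=0.8):
    """Feedback comb filter: resonant peaks at every multiple of SR / delay_samples."""
    y = np.array(x, dtype=float)
    for i in range(delay_samples, len(y)):
        y[i] += feedback * y[i - delay_samples]
    return y

# white noise in, a 100 Hz harmonic series out (delay of 80 samples -> 8000/80 = 100 Hz)
noise = rng.standard_normal(SR)
combed = feedback_comb(noise, SR // 100)
```

Per principle 4, the resulting harmonics are all of similar strength, so they would still need to be "toned down" (e.g. low-pass shaped) before resembling a voice.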

In short - do you have ideas for reproducing a human voice without intelligible signal information? I am happy to credit anyone that contributes to this work. My biggest challenge is that I have not taken the time to explain what I've learned so that it may be expanded upon by other creative people. And when I explain it in technical terms it falls short of what most people would understand or resonate with.

Feel free to add your comments and suggestions as to how to reproduce a human voice. I would find it helpful if you would consider the principles I placed above and let me know what questions you have regarding any of the items presented.

It is my hope that together, once and for all, we can crack this code. We already have all of the tools needed in our possession. We just need to work together. And I've held my ideas too close to my chest for too long.

What questions/suggestions do you have? I'm open to anything, as most of you know. Don't be shy 🙂

We have already succeeded partially with voice... let's take it the rest of the way!

I'm happy to demonstrate any of the methods mentioned or presented to anyone that has interest to better explain the concepts.


Keith Clark

 

 

 

 

 



Hi Keith,

This is a great idea for a project.

As you have outlined, it is not an easy task to create a human voice in its entirety, and it requires many processes to achieve a human-like voice character.

In my opinion, there are two main components required for any successful spirit voicebox.

You have outlined the skeleton of the first, and that is the voice spectral components required to form a resemblance of a human voice (in tone). A simple, rough, robotic-sounding voice is quite easy to manufacture, compared to the large amount of extra work needed to create something more natural sounding.

The other component is the interface that allows spirit to manipulate these spectral components and form words and sentences in synchronism with their intended message. This is the hidden but essential process that occurs inside the technology; without it, there is no intelligence embedded. In our own case, we do it with alterations of tongue position and mouth-cavity shaping (as well as glottal volume on/off and partial volume modulations), effectively performing complex bandpass filtering and amplitude modulation of the glottal frequency and its resulting formants / harmonics. In the case of a spirit box, this function occurs in (what B.W coined) psychokinetic modulator elements that spirit can interface with to alter the stream of raw spectral energies flowing through, so that a similar bandpass filter / amplitude modulation action occurs to the raw stream, and intelligence is thereby embedded within it.
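The bandpass-filtering view of formants described above can be sketched in a few lines of Python (my illustration; the formant frequencies and bandwidths are rough textbook-style values for an /a/-like vowel, not anything measured in this thread): a glottal impulse train passed through cascaded resonators.

```python
import numpy as np

SR = 8000

def resonator(x, freq, bw):
    """Two-pole resonator: a crude bandpass centred on `freq` with bandwidth `bw` (Hz)."""
    r = np.exp(-np.pi * bw / SR)
    theta = 2 * np.pi * freq / SR
    a1, a2 = -2 * r * np.cos(theta), r * r
    y = np.zeros(len(x))
    for i in range(len(x)):
        y[i] = x[i] - a1 * (y[i - 1] if i >= 1 else 0.0) - a2 * (y[i - 2] if i >= 2 else 0.0)
    return y

def vowel(f0=120, formants=((700, 130), (1100, 150)), duration=0.5):
    """Glottal impulse train shaped by cascaded formant resonators: a rough /a/-like tone."""
    n = int(SR * duration)
    source = np.zeros(n)
    source[:: SR // f0] = 1.0            # impulse train at the glottal rate
    out = source
    for freq, bw in formants:
        out = resonator(out, freq, bw)
    return out / np.max(np.abs(out))

v = vowel()
```

Moving the formant centres over time is what would turn such a static tone into different vowels.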

In my early work, I tried the path of creating spectral components to do this (this was pre-software times), but after much hard work I gave up this line of endeavor and used a cheap voice synthesizer chip instead. The limitation here is that the baseline voice quality is robotic in tone and can't be improved, but the pk modulation does add softness and accent that the default raw stream lacks.

I must add that the above probably does not answer your core question, Keith, which was to create a resemblance of a human voice without intelligence, but I thought my comments might be useful in expanding the discussion. I would say, however, that single allophone generation is perhaps the closest to this (outputting a continuous vowel allophone, for example), and this can be accomplished simply via a voice synthesizer that can output desired allophones on command. I know there are software solutions for this, but as a "hardware dinosaur" of many years, my expertise is of course in hardware solutions.

I must add that in all the history of the use of my devices, I have never received immediate direct (real time) responses to questions asked, but answers do come later. Perhaps the next day, or week.

 I look forward to the comments of others on this topic.

 



I value this discussion as a very good attempt to bring together all our results and lessons learned. Like you I am hoping for something like the uniform spirit voice theory.

For a long time I was just experimenting with noise in all its different flavors. I think I designed a dozen different devices where noise was created and processed in all thinkable ways. In the end the results were all comparable. Noise seems to be the ideal stuff to work with because it is "fluid", very agile, and basically contains all the spectral components needed for the creation of human voices. The problem is that the pk-modulation in every device is so poor that all results suffered from a very bad signal-to-noise ratio (SNR). The SNR was improved with pink noise, but I had to pay the price of even more deteriorated spectral material. The voices were rough and croaky, like grunting Stone Age people.

It was some weeks ago that I remembered Keith always worked with tones and harmonics. This led me to the idea of abandoning the experiments with noise and concentrating on the generation of sounds and tones instead. The VISPRE was my first approach, and now I am working with the SpiCa, a semi-chaotic audio circuit that converts voltage fluctuations into combinations of tones. It's a highly fragile and agile circuit that outputs digital signals.

I learned two things from my experiments, and I can say that my results are aligned with Keith's theory. Firstly, the spectral material we offer to the spirits should be made of tones and harmonics rather than noise. The nearer these tone combinations are to human voices, the more easily the spirits can recombine the spectral composition. Secondly, we need something to excite the tone circuit. Jeff calls this dynamism, as far as I remember. Basically it means that static tones left alone will not produce spirit voices; we always need to add what I call speech patterns, which are impulse groups having the shape of a voice envelope function.

In my experiments with microphone recordings I could observe that spirits use everything I offer them as sound. When I was opening and closing drawers, they created rumbling voices with exactly that sound and rhythm. When I was typing on my keyboard, they used the rhythm and sound to generate clicking voices. It always appeared to me that they were able to reconfigure the spectral content but never the rhythm. It appears they are constantly analyzing the rhythm of the sounds we create and aligning their desired voice content with it. In short, they have to see what they can do with the rhythm we provide because they cannot change it. What they can change is the spectral distribution of the original sound. How they do that, I have no idea.

The speech impulse patterns, or impulse trains, are clearly to be separated from the, let's call it, "tone engine": the circuit that produces tones and harmonics.

Currently I am testing a new setup that combines two devices I designed previously. First there is the LINGER unit, which generates speech patterns from LED light falling on a phototransistor, with enhanced pk modulation by use of the microphone processing circuit SSM 2167. These impulses are rectified, meaning they are reduced to an envelope function and become speech without any content; they are just rhythm.

This signal is fed into the SpiCa, where the voltage fluctuations following the speech rhythm cause the SpiCa to jump around between different tone combinations. The result sounds like human speech, but with a very small pool of vowels and consonants so far.

You can hear the results here: SpiCa
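A rough software analogue of the LINGER-into-SpiCa idea (purely my sketch; the chord pool, frame size, and burst statistics are all invented) is a syllable-like envelope whose level both scales the output and selects among a small pool of tone combinations:

```python
import numpy as np

rng = np.random.default_rng(0)
SR = 8000
# a small pool of "tone combinations" to jump between (frequencies invented for the sketch)
CHORDS = [(220, 440, 660), (180, 360, 540), (250, 500, 750), (300, 600, 900)]

def speech_envelope(n_frames):
    """Syllable-like bursts separated by pauses: rhythm without content."""
    env = np.zeros(n_frames)
    i = 0
    while i < n_frames:
        burst = int(rng.integers(2, 6))            # syllable length in frames
        env[i:i + burst] = rng.random(min(burst, n_frames - i))
        i += burst + int(rng.integers(1, 4))       # silent gap between syllables
    return env

def spica_like(seconds=1.0, frame_ms=50):
    """Per frame, the envelope level picks a tone combination and sets its loudness."""
    frame_len = int(SR * frame_ms / 1000)
    n_frames = int(SR * seconds) // frame_len
    env = speech_envelope(n_frames)
    t = np.arange(frame_len) / SR
    out = np.zeros(n_frames * frame_len)
    for k, level in enumerate(env):
        chord = CHORDS[int(level * len(CHORDS)) % len(CHORDS)]
        frame = sum(np.sin(2 * np.pi * f * t) for f in chord)
        out[k * frame_len:(k + 1) * frame_len] = level * frame
    return out

audio = spica_like()
```

The hard jumps between frames are of course much cruder than an analog circuit's behavior; cross-fading frames would soften them.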

Basically I can say that Keith's theory is correct. What we still cannot achieve sufficiently is the combination of entropy and the steering of tones and harmonics.


2 hours ago, Andres Ramos said:


On this part below - YES!!

Jeff calls this dynamism, as far as I remember. Basically it means that static tones left alone will not produce spirit voices; we always need to add what I call speech patterns, which are impulse groups having the shape of a voice envelope function.

Right, we have to pass it through a mutable/modulatable (made-up word) medium, whether it be sound, light, radio waves, impulses, other tones, OR randomization such as EVPmaker or granulizer software.


On 4/3/2022 at 6:36 AM, Keith J. Clark said:

Jeff calls this dynamism, as far as I remember. Basically it means that static tones left alone will not produce spirit voices; we always need to add what I call speech patterns, which are impulse groups having the shape of a voice envelope function.

Yes, Andres, good point.

Keith, I agree with Andres that tones left alone will not produce spirit voices, but adding speech-rate impulses does help them form. Also, in most instances (not all), adding traditional human voice instead doesn't help (it is too slow in amplitude change to add much dynamism, though it does add spectral fuel). A transistor radio does not output spirit voices when an announcer is talking, but with the addition of a chunking process / granulisation / severe amplitude pulsing of that voice stream, spirit transformation of the announcer's voice becomes possible. My post on the TRF radio tried to make the point that to get spirit voices, the tuning needed to wobble, so short chunks of audio were created from the continuous voice audio stream. The chunking in this instance brings into action the pk modulation process, which transforms the voice. The only catch is that the pk transformative power drops off exponentially from the inception of each "kick". There are theories as to why this happens, which I can explain sometime.
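The chunking / severe amplitude pulsing of a continuous stream described above can be sketched like this (my illustration; the 80 ms chunk size and 50% duty are arbitrary, and a plain sine stands in for the announcer audio):

```python
import numpy as np

SR = 8000

def chunk_stream(audio, chunk_ms=80, duty=0.5):
    """Severe amplitude pulsing: keep the first `duty` fraction of each chunk, mute the rest."""
    chunk = int(SR * chunk_ms / 1000)
    keep = int(chunk * duty)
    gate = np.zeros(len(audio))
    for start in range(0, len(audio), chunk):
        gate[start:start + keep] = 1.0   # hard on/off gate per chunk
    return audio * gate

# a 300 Hz tone stands in for the continuous announcer stream
t = np.arange(SR) / SR
chopped = chunk_stream(np.sin(2 * np.pi * 300 * t))
```

Each gate edge is the sharp "kick" referred to above; a granulizer would additionally fade each grain and shuffle their order.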

To add another perspective on the principle, I'll outline an example of dynamism when doing traditional EVP recording. An operator makes a mic recording inside his home. It is quiet and the window is open. There is a small amount of environmental noise. Suddenly a dog barks (once) down the street. This is recorded as a short but high-amplitude sound impulse (compared to the ambient background). On review, the recording showed that the dog's bark was transformed into a similarly tonal human word, and also that there was another word after the bark, but this one was whispery, like the background noise. This shows the power of (sound) impulses in gating on pk modulation for a short time, acting upon the impulse source that created it or the incidental audio that accompanies it. And as Andres has reported in his work, sharp rise-time impulses within circuitry also have the power to generate voices upon the incident audio that accompanies these impulses.



Hi Andres, regarding this: "In my experiments with microphone recordings I could observe that spirits are using everything I offer them as sound."

There are two parts to this: one is your energy field; the other is your brain's wiring and your interpretation. I agree with your experience as a whole because it's the same for me.

 

 



Agreed, Jeff. I would add that in my work I have found the following to be true:

Modulation can also occur through randomization:

  • evpmaker (if suitable sound is input either from audio file or live)
  • Emission Control software (granulizer)

And like you said, for optimal conditions the environment should contain both of the following:

  • a medium through which the experiment is passed, ie light, sound, software, biofeedback, radio waves, so on and so forth
  • tones consistent with human speech as our brains are trained to recognize it

My work in this area started with trying to imitate Spiricom and eventually turned into reverse-engineering Spiricom to provide a synthetic voice template. I can say the success I've had is more than I expected at first.

To give people some context, below is a picture of the infamous Spiricom "Mary Had A Little Lamb" recording. Note: whether an electrolarynx was used or not is irrelevant; the fact is it was understandable as human speech. So it's valuable because it's not just speech, it's also enhanced/energized/excited speech via the technique. Note the modulation pictured below:

 

image.png

 

The closer I get to real-time synthetic voice, and the more I understand it, the more it seems to imitate Spiricom principles - at least as they were outlined.

I have used a variety of techniques, some of which I've held close to my chest. They were all successful in their own ways.
Here they are in their variety:

  • noisegen (white noise) with live noise reduction
  • noisegen with live noise reduction and Krisp A.I.
  • tone generator through evpmaker
  • tone generator through combfilter and live noise reduction
  • sinewave through Emission Control granulizer followed by live noise reduction
  • white noise through pulsecomb effect in audiomulch
  • tones through pulsecomb effect in audiomulch
  • tone generator through pulsecomb followed by combfilter
  • and many many more, including post-processing and "shaping" of the harmonics to be commensurate with human voice

 

I'll now share results of current work. Though it may not seem impressive, I'm pretty close to solving it, which is why I ask for help.

Krisp post-processing

 

Without Krisp below:

How it appears visually:

image.png

 

Granted, the way I explain things tends to be in my own vernacular. When you guys get super technical, I grasp the concept but lack the implementation. However, I also know from experience when I'm super "close".

 

Jeff, you're a super smart guy, maybe you can help. Right now I'm using a 200 Hz sine modulated by (mixed with) a 1 Hz sine. This is then passed through EVPmaker for the random factor. From there, one of the following:

  • pulsecomb followed by shaping (and with or without Krisp A.I.)
  • no pulsecomb with live vst combfilter (with or without Krisp A.I.)
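For reference, the front end of this chain, reading "modulated (mixed with)" as amplitude modulation (one possible interpretation; straight summing is the other), looks like this in NumPy. The spectrum then shows the 200 Hz line flanked by 1 Hz sidebands:

```python
import numpy as np

SR, DUR = 8000, 2.0
t = np.arange(int(SR * DUR)) / SR
carrier = np.sin(2 * np.pi * 200 * t)          # the 200 Hz sine
lfo = 0.5 * (1 + np.sin(2 * np.pi * 1.0 * t))  # the 1 Hz sine, scaled to 0..1
am = carrier * lfo                             # slow swell fed onward to EVPmaker etc.
```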

So now I only have one question:

How can I get the bandwidth modulation of each harmonic to sway (more bandwidth per harmonic) in harmony - to look more like the Spiricom photo above rather than the second picture above? I've used the wobble technique; it's not optimal for this type of application, or at least it hasn't increased intelligibility for me (it's more like added vibrato).

Thanks!

 

 


4 hours ago, Keith J. Clark said:

How can I get the bandwidth modulation of each harmonic to sway (more bw per harmonic) in harmony? To look more like the Spiricom photo above rather than the second picture above.

Hi Keith,

I'm not sure exactly what you are asking, I'm sorry, but I presume it is to obtain the varying spectral amplitude modulation distribution that is evident in the Spiricom spectral image? Well, to imitate more closely the action of synthetic speech, what would be required is what I'd call a random-acting (frequency modulated) narrow-band notch (reject) filter / spot-frequency attenuating equaliser that randomly sweeps up and down the full frequency span and attenuates by quickly varying amounts as it does so. This would give the harmonics individual and unique degrees of amplitude modulation (each) - an action that naturally happens in forming varying levels of formants from the fundamental glottal sound and its harmonics. It must be said that the action of this proposed tool is already present in the pk action that occurs to an incident source, so I am thinking that you want to try to synthesize the action of spirit in this request? If so, it is possible you might make it easier for spirit to add their (then smaller) modulation tweaks (the good news), or inhibit their action by screwing up formants that they would want to modulate to their specified level (the bad news). This is a two-edged sword, I think, in going down this path.
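One crude way to realize the roaming notch described above is a frame-by-frame FFT attenuation (my sketch; a real implementation would more likely use a smoothly swept biquad notch, and the sweep range, notch width, and depths here are arbitrary):

```python
import numpy as np

SR = 8000
rng = np.random.default_rng(1)

def roaming_notch(audio, frame_ms=40, notch_hz=100, lo=100, hi=3000):
    """Per frame, attenuate one randomly placed narrow band by a random depth."""
    frame = int(SR * frame_ms / 1000)
    out = np.array(audio, dtype=float)
    for start in range(0, len(out) - frame + 1, frame):
        spec = np.fft.rfft(out[start:start + frame])
        freqs = np.fft.rfftfreq(frame, 1 / SR)
        center = rng.uniform(lo, hi)              # random sweep position
        depth = rng.uniform(0.0, 0.9)             # random attenuation amount
        spec[np.abs(freqs - center) < notch_hz / 2] *= 1.0 - depth
        out[start:start + frame] = np.fft.irfft(spec, n=frame)
    return out

# a harmonic-rich stand-in signal
t = np.arange(SR) / SR
harmonics = sum(np.sin(2 * np.pi * f * t) for f in (200, 400, 600, 800))
filtered = roaming_notch(harmonics)
```

Because the notch only ever attenuates, each harmonic gets its own patchy amplitude history as the notch sweeps past it, which is the localized modulation effect being discussed.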

If I've missed the mark in understanding, please can you re-phrase the question?



I asked the same question on both platforms. Sorry about that.

Here is Andres' reply:

Theoretically, the bandwidth of an impulse becomes wider as the impulse gets shorter. In my engineering studies we were taught the Dirac impulse. This is an abstract idealization of an impulse with zero width and infinitely high amplitude. It has infinite bandwidth at constant level.

 

So practically, you could try using impulse trains with a lower duty cycle. In a square wave signal you have 50% ON and 50% OFF. Try making it 20% ON and 80% OFF, or something like that.
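This point can be checked numerically: for a rectangular pulse train with duty cycle d, the n-th harmonic amplitude follows d·|sinc(nd)|, so a narrower pulse spreads energy across more harmonics (and at 50% duty the even harmonics vanish entirely). A small sketch of mine, with an arbitrary 100-sample period:

```python
import numpy as np

PERIOD, CYCLES = 100, 80

def pulse_train(duty):
    """Rectangular pulse train: `duty` fraction ON in each period."""
    one = np.zeros(PERIOD)
    one[: int(PERIOD * duty)] = 1.0
    return np.tile(one, CYCLES)

def harmonic_amp(x, n):
    """Amplitude of the n-th harmonic of a signal with period PERIOD samples."""
    return np.abs(np.fft.rfft(x))[n * CYCLES] / len(x)

wide = pulse_train(0.5)    # 50% duty: fast harmonic roll-off, even harmonics vanish
narrow = pulse_train(0.2)  # 20% duty: energy spread across more harmonics
```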


Thanks for posting Andres' RC comments, Keith.

Yes, I agree with Andres that varying the pulse width will in turn vary the amount of harmonics produced, but this is of course a broad-spectrum effect, where the pulse's harmonics vary in a synchronous fashion. This would indeed alter the 'look' of the spectrogram, but alter it as though all harmonics were varying in 'brightness', and not give the result I think you are asking for - which I presume is a randomised, patchy and localised modulation of spectral harmonic groups, as per what is seen in the Spiricom screenshot, and which synthesises the effect of vocalisation / formant modulation by the human vocal cavity components.

I believe it would be possible to achieve the effect I describe, and I imagine that a Python script could be written to do it.

Also, there is no need to apologise. It is inevitable that technical discussions have / will continue to occur on RC that are not seen - or able to be commented on by forum members, such as myself.



Hm, I just stumbled upon an impulse wave simulator. 

https://connecthostproject.com/spectre_pulse_en.html

I played with a 1 kHz signal, pulse widths between 10 and 100 µs, and 10 calculated harmonics. From what I saw, the spectrum shape definitely changes and shows a sinusoidal amplitude distribution depending on the pulse width.

