Blog Entries posted by Michael Lee

  1. Michael Lee
    I will introduce some new "steampunk" or acoustic ITC methods in the next post. 
    But first I want to share with you some theories I have about audio ITC.
    To me, reception of spirit / interdimensional signals has at least three components:
    1) Sensitivity to the signal
    2) Resonant modes of the detector
    3) Driving energy
    Sensitivity means that whatever spirits use to communicate with us (virtual photons, wavefunction selection, or something else), our devices can pick up the resulting changes or anomalies. The most traditional detectors people have used are microphones and scratchy diodes. Presumably, the microphone picks up small air pressure changes and/or electromagnetic signals affecting its inductive coil. The diode could be picking up radio waves, scalar waves, thermal changes, etc.
    In any case, every ITC detector has some sort of sensitivity. Detectors can be virtually anything, like water or even a hard rock. As long as we can perceive (humanly or electronically) changes in that detector, it should work. The question, though, is how sensitive to spirit that detector is compared to others. I don't have an answer to that, but we can certainly select our favorites for experimentation based on perceived improvements or ease of use.
    Resonant modes refers to the available states of the detector or broader physical ITC system. It can be thought of as the frequency spectrum of physical and non-physical signals emanating from a given device. For example, some devices have two states: they either produce short "pops" or nothing at all. Some have pops of differing duration and amplitude. Some devices emit constant white noise. Others have certain dominant tones, like wind chimes. Still others have a dominant on/off buzzing sound, like some of Andres' creations.
    In each case, there's an "available" set of frequencies that can be produced. Obviously, if we wanted to hear a perfect human voice, the device would need to emit all of the frequencies in a range of approximately 100-8000 Hz. Devices that emit white noise sound great for this challenge, but often suffer from overdoing it in the last factor...
    Driving energy refers to how much our device is physically stimulated. A great example is the work of Anabela Cardoso. She finds that a microphone with noise playing in the background is much better than a microphone in a completely silent room. The added noise is "driving energy." It is both a source of energy for the spirits to manipulate and a way of ruffling up the air molecules in the room, providing a "canvas" for spirit signal implantation.
    But too much driving energy may not be such a good thing. If I play a super loud buzzing sound (to represent the human glottal voice pattern), we're not going to hear any variations in that buzz unless we use some pretty serious noise cancellation software. Meanwhile, if I supply a light amount of buzz, the variations may begin to be noticeable to the human ear.
    Here's another "overdrive" situation: radio static. Radio static, when evaluated with a spectrogram, looks as random as can be. You have to apply a lot of software noise removal to extract anomalous signals. I would argue that too much noise makes the filtering process more difficult than it needs to be. One way people balance out the noise is by playing it over a speaker to be picked up by a second microphone.
     
    Ok, enough rambling about theories. In my next post, hopefully, I will have some interesting results to share.
     
     
     
     
  2. Michael Lee
    Since early 2019, I have been working on software to extract voices from physical noise/signals. My earliest attempts used other people's software, mainly an algorithm called "spectral subtraction" as implemented in the ReaFir noise reduction plugin. This converts the noise into the frequency domain, where slight imprints of voice can be discovered and emphasized.
    We now enter the year 2022. Spectral subtraction is still a very valuable tool, but it is only the beginning of a process I've developed for extracting voices. I've created machine-learning-based models to find and emphasize voices. I've also made a program that finds and generates "formants," or peaks in the harmonic buzz of the human voice.
    I'm finally releasing my full software, in Python. I use a very similar version of this code in all of my experiments (FPGAs, radio noise, etc.).
    I would've liked to share it as an executable, like I did with Spiricam, but Python executable-makers are notoriously buggy. Another reason I've hesitated in sharing the code sooner is that it used to require heavy GPU resources. However, thanks to some software developments by Google, my ML models now seem to run well enough on the CPU in real time.

    So if you want to try out my code, you'll have to do some command-line steps, and at minimum you'll have to install a free program called Miniconda (or its larger sibling, Anaconda) with Python version 3.8, 64-bit. A few GB of disk storage may be required.
    Here's the link to the code: https://drive.google.com/drive/folders/1fu6hAuE0AbhbQjx0Ts_3Ju0QRJ0awxRM?usp=sharing
    In the directory is a README.txt, which I'll update as we iron out the instructions.
    When I've resolved most of the common issues, I'll make the code into a ZIP file for the Downloads section.
    For now, feel free to ask questions in the comments. As I like to say "The spirits are waiting!"
  3. Michael Lee
    I've tried a variety of FPGA "designs." The one I'm sharing now produces musical tones similar to the white keys on a piano for six octaves.
    The tones are simple square waves, which probably sound most like 8-bit video games from the 90's. The spacing of the tones uses something called just intonation. Instead of powers of 2^(1/12) (as in equal temperament), just intonation uses simple frequency ratios that are actually more in tune with each other. The downside of just intonation is that you can't easily change scales.
    Each tone is activated when a noise source running at 50 million bits per second sends out by chance 26 1's in a row. It sounds pretty unlikely, but given the number of bits per second and the number of tones (N = 42), it happens enough to produce music. Here's a sample:

    sample_16_26_36.mp3 Here's what the spectrogram of this sample looks like. Square waves produce a fundamental sine wave and odd harmonics (3x, 5x, 7x, etc.)

    The noise source is perhaps the most critical part. In this particular project I'm using XOR'ed ring oscillators, which are described in the scientific literature. The big challenge is that no two ring oscillators are exactly alike. Despite having the same "length" of 101 delay gates, the idiosyncrasies of chip transistors mean the actual delay times will vary between ROs. Their susceptibility to noise will also differ. I can only hope that 16 ROs per tone is enough randomness to average out the variations so that each tone triggers roughly the same number of times. If you look carefully at the spectrogram, you will see that certain tones are triggered a little more often than others.
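    As a rough sanity check on how often a tone should fire, here's a back-of-the-envelope estimate in Python. It assumes an ideal, unbiased bit stream (the real XOR'ed ring-oscillator output is only approximately unbiased) and ignores the hold-open period, so treat the numbers as ballpark figures:

        # Expected gate openings for a run of 26 ones in a 50 Mbit/s stream (rough estimate)
        BITS_PER_SECOND = 50_000_000
        RUN_LENGTH = 26       # consecutive 1s needed to open a gate
        NUM_TONES = 42

        p_run = 0.5 ** RUN_LENGTH                    # chance a given bit completes a run of 26 ones
        triggers_per_tone = BITS_PER_SECOND * p_run  # expected openings per second for one tone
        print(f"{triggers_per_tone:.2f} triggers/s per tone")                # ~0.75
        print(f"{triggers_per_tone * NUM_TONES:.1f} triggers/s over 42 tones")  # ~31

    In other words, each tone should open a bit less than once per second on average, and all 42 tones together should produce a few dozen notes per second.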
    Here are some voice samples after the signal is translated via machine learning:
    "confident signal"

    16_26_42_confident_signal.mp3  
    "it mirrors wonderful"

    16_26_42_it_mirrors_wonderful.mp3  
    "given the opposite" 

    16_26_36_given_the_opposite.mp3  
    "now that we're talking"

    16_26_36_i_like_were_talking.mp3  
    Finally, here's the Verilog code, for some future brave FPGA developer:
    //
    // Author: Michael S. Lee
    // Noise source-activate musical square wave tones
    // Started: 12/2021
    //
    module XOR_loop_gate #(parameter N = 149) // Length of each RO
      (input wire clki, output wire gate0);

      wire [N:0] loop[M-1:0] /* synthesis keep */;
      reg [0:0] gate;
      reg [M-1:0] lpb;
      reg [L-1:0] buffer = 0;
      wire hit;
      reg [0:0] check;
      integer ctr;

      parameter period = 65536 * 48; // duration of tone (in 50 Mhz samples)
      parameter M = 16;              // # of ring oscillators
      parameter L = 26;              // length of seq. of 1s to turn on gate

      genvar i, k;
      integer j;

      generate
        for (i = 0; i < M; i = i + 1) begin: loopnum
          assign loop[i][N] = ~(loop[i][0]);
          for (k = 0; k < N; k = k + 1) begin: loops
            assign loop[i][k] = loop[i][k+1];
          end
        end
      endgenerate

      assign hit = ^(lpb);
      assign gate0 = gate;

      always @(posedge clki) begin
        for (j = 0; j < M; j = j + 1) begin
          lpb[j] <= loop[j][0];
        end
        buffer = (buffer << 1) + hit;
        check = &(buffer);
        if (check == 1) begin
          ctr <= period;
          gate <= 1;
        end
        else if (ctr > 0) begin
          ctr <= ctr - 1;
        end
        else begin
          gate <= 0;
        end
      end
    endmodule

    module Direct_Voice(clk, out);
      input clk;
      output wire out;

      parameter N = 42;   // # of musical tones
      parameter bits = 7;

      reg [bits+1:0] PWM = 0;
      genvar k;
      integer i;
      wire [N-1:0] outw /* synthesis keep */;
      reg [N-1:0] outr;
      reg [18:0] outp[N];
      integer sum, suma[24];
      reg [22:0] clk2 = 0, clk3 = 0, clk5 = 0, clk7 = 0, clk9 = 0, clk15 = 0;

      generate
        for (k = 0; k < N; k = k + 1) begin: prep
          XOR_loop_gate #(101) test(clk, outw[k]);
        end
      endgenerate

      assign out = PWM[bits+1];

      always @(posedge clk) begin
        // Clocks for different musical tones
        clk2 = clk2 + 1;
        clk3 = clk3 + 3;
        clk5 = clk5 + 5;
        clk7 = clk7 + 7;
        clk9 = clk9 + 9;
        clk15 = clk15 + 15;

        // Convert wire gates to registers
        for (i = 0; i < N; i = i + 1) begin
          outr[i] <= outw[i];
        end

        suma[0]  = (outr[0]  & clk2[19]) + (outr[1]  & clk3[19]);
        suma[1]  = (outr[2]  & clk5[20]) + (outr[3]  & clk7[20]);
        suma[2]  = (outr[4]  & clk9[21]) + (outr[5]  & clk15[21]);
        suma[3]  = (outr[6]  & clk2[18]) + (outr[7]  & clk3[18]);
        suma[4]  = (outr[8]  & clk5[19]) + (outr[9]  & clk7[19]);
        suma[5]  = (outr[10] & clk9[20]) + (outr[11] & clk15[20]);
        suma[6]  = (outr[12] & clk2[17]) + (outr[13] & clk3[17]);
        suma[7]  = (outr[14] & clk5[18]) + (outr[15] & clk7[18]);
        suma[8]  = (outr[16] & clk9[19]) + (outr[17] & clk15[19]);
        suma[9]  = (outr[18] & clk2[16]) + (outr[19] & clk3[16]);
        suma[10] = (outr[20] & clk5[17]) + (outr[21] & clk7[17]);
        suma[11] = (outr[22] & clk9[18]) + (outr[23] & clk15[18]);
        suma[12] = (outr[24] & clk2[15]) + (outr[25] & clk3[15]);
        suma[13] = (outr[26] & clk5[16]) + (outr[27] & clk7[16]);
        suma[14] = (outr[28] & clk9[17]) + (outr[29] & clk15[17]);
        suma[15] = (outr[30] & clk2[14]) + (outr[31] & clk3[14]);
        suma[16] = (outr[32] & clk5[15]) + (outr[33] & clk7[15]);
        suma[17] = (outr[34] & clk9[16]) + (outr[35] & clk15[16]);
        suma[18] = (outr[36] & clk2[13]) + (outr[37] & clk3[13]);
        suma[19] = (outr[38] & clk5[14]) + (outr[39] & clk7[14]);
        suma[20] = (outr[40] & clk9[15]) + (outr[41] & clk15[15]);
        // suma[21] = (outr[36] & clk2[20]) + (outr[37] & clk3[20]);
        // suma[22] = (outr[38] & clk5[21]) + (outr[39] & clk7[21]);
        // suma[23] = (outr[40] & clk9[22]) + (outr[41] & clk15[22]);

        sum = 0;
        for (i = 0; i < 21; i = i + 1) begin
          sum = sum + suma[i];
        end
        PWM = PWM[bits:0] + sum;
      end
    endmodule

    16_26_36_let_this_have_her_influence.mp3 16_26_36_with_portal_music.mp3
  4. Michael Lee
    A lot of the ITC work I do, I try to keep all electronic. I like to use microphone amps, analog-to-digital converters (ADCs), software-defined radios (SDRs), and field-programmable gate arrays (FPGAs). However, I have explored a few mechanical noise sources in the past, including dragging a microphone across a wood table and some plastic crumpling (following Andres Ramos' efforts).
    Let's just say I was recently inspired by the heavens to look at mechanical vibrations again. Also, some of my colleagues here are exploring mechanical ITC too, so I thought I'd give it a try. Where to start? Well, I know that spirits can form voices in air, and we can pick them up with microphones, but as expected, the voices come out "wispy." If you think of them as exciting tones within the air, those tones aren't going to last very long, given how quickly air damps them. Meanwhile, we know that metal has a very long "ring," or slow damping coefficient.
    However, metal is heavy and needs to be "excited," or struck, to make a sound. In regular air, it should be just about silent, although I'm sure a very sensitive microphone could pick out the thermal vibrations of a metal guitar string. Therefore, I needed some form of excitation. One idea would be to periodically tap the strings of a guitar and then listen to the slowly declining ring of tones. Another possibility is to blow a fan at the strings. Another is to use the vibration of the fan itself as an exciter.
    The positives of the fan include (1) it's simple to do, and (2) it provides somewhat random, possibly evenly distributed excitations. The negatives include (1) the fan's electrical interference with the guitar pickups and (2) the random and periodic signals of the fan itself, which may or may not be perturbed by spirit.
    Here's the apparatus:

    Here is the best clean "excited" guitar string sound I could get today:

    raw_guitar_wind.mp3 Unfortunately, I can't reproduce this now, so I'll have to endure a higher proportion of electrical noise.

    noisy_guitar.mp3 Here are some machine-learning-translated words or phrases. These are not the best I've observed, but fun nonetheless.

    guitar_fan_music.mp3 "music"

    guitar_fan_just_amazing.mp3 "just amazing"?

    guitar_fan_that_is_beautiful.mp3 "that is beautiful"

    guitar_yes_all_the_runs_are_beautiful.mp3 "yes all the runs are beautiful"
     
  5. Michael Lee
    For many years, I've seen myself as one of those special people who can visit the spirit realms at night during my dreams. So special, that frankly, no one, besides myself, really cares. I could tell you any of the adventures I've had and you'd either think I was going crazy or I was already there. 
    You see, it's one thing for me to report what I'm seeing and hearing, and it's another for you to see and hear it yourself. Years ago, I dreamt about researchers somehow tapping into my brain while I was projecting, so they could record my view.
    I've known about ITC for over a decade, but thought it was outside of my expertise. I'm really good with computer programming, but my electronics skills were "shockingly" bad. I always thought ITC was about making fancy electronic devices that somehow picked up the spirit ether. Thanks to random YouTube videos I came across in 2018, I realized that ordinary electronics could be used just as well. At first, I followed the strategies of the tried and true, like software Ghost Boxes, but realized I could do better, a lot better.
    The principle behind a ghost box is a fast scan through radio stations with intermittent sound and silence. Spirits are somehow able to amplify bits and pieces of the audio and extend them into the silence. The one thing I didn't like about this setup is that the source audio (radio clips) was unpredictable and unknown. I wanted to know what was real and what was supposedly "spirit."
    Thus, I developed my first invention, which I'll describe in my next post. Hint: it resembles EVP maker, but once again, it's more predictable and more known. No randomness. Let the spirits do that 🙂
  6. Michael Lee
    My First Forays into Direct Continuous Voice
    As mentioned previously in my blog, I evolved to direct voice after I noticed that the phonetic samples were getting slightly modified by spirit voices. I reasoned that it should be possible to extract voices directly from a stream of electronically created noise (e.g., radio static).
    I don’t know the full history of getting continuous (not just occasional) voices from noise, but it turned out that around the time I started this venture a few years back, I met Keith Clark, who has been running a direct voice from noise stream since the late 2000’s on YouTube. He takes noise generated either mathematically or from a software-defined radio (SDR) and applies a series of denoising filters (software plugins) to extract a continuous voice.
    From my work experience, I knew about two denoising methods: spectral subtraction and machine learning. At first, I experimented a lot with spectral subtraction using the ReaFir Noise Gate plugin run in FL Studio. This plugin allows detailed setting of a frequency-dependent "gate." When a particular window of samples in time (e.g., N = 2048) has a frequency amplitude over the defined gate/threshold, that "note" is played. Any frequency amplitudes below it are made silent. For low-noise situations, spectral subtraction is a very solid method. However, as the noise volume gets larger relative to the voice, the algorithm can produce a lot of musical-tone artifacts. The waveform editor Audacity uses a similar spectral subtraction method to denoise signals.
    Spirit voice, especially continuous spirit voice, is exceptionally low volume compared to electrical noise. One spirit once suggested it was, on average, 1/500th the volume of the random noise. Applying a strong spectral gate will yield something that sounds more like a bunch of tones than a coherent voice.
    I also tried using a gated vocoder, specifically a versatile plugin called FL Vocodex. This yields similar results to spectral subtraction (SS), but can also be applied after the SS plugin. The benefit of a vocoder is that its frequency bands are spaced exponentially, producing more pleasing tones than the linearly spaced frequencies in standard SS.
    Eventually, I started writing my own Python scripts to do the same functions as the ReaFir and Vocodex plugins, so that I could exquisitely control all the possible parameters / knobs.
    With my attuned ear, I could hear a lot of what was being said, but I still desired better quality voices.
    Machine Learning
    By happy coincidence, my real-life work had been leading me into learning and using machine learning / artificial intelligence. Around this time, I thought it might be interesting to build an artificial neural network to remove noise from speech and images. My first paper can be found here. Message me for reprint.
    In my second paper, which will be published shortly, I added a second model, called a critic, which helps the first model create more realistic looking audio spectrograms, hence improving the quality of the speech.
    It turns out there are already commercial products currently out there that claim to use AI to remove noise from speech. For example, there’s the site, krisp.ai. In fact, a YouTuber named Grant Reed uses KRISP to clarify voices from noise sources to hear spirit speech.
    However, the story doesn’t end here, because despite getting voices from denoising, the voices often end up sounding scratchy and barely intelligible – not unlike regular EVPs.
    Beyond Denoising
    I have spent a big part of the last 1 1/2 years trying to understand better how spirit speech actually manifests in different types of noise - what the corruption actually looks like - and then developing machine learning models to reverse this corruption. I have discovered the following sources of corruption that all seem to compound together:
    1)      Additive noise / interference – we already know this one!
    2)      Sparsity: Only a small percentage (< 5%) of the time samples actually contain speech. Imagine digitizing a one-second clip of electrical noise at 16 kHz. You would get 16,000 samples. Of those 16,000, I postulate fewer than 800 have spirit speech content in them.
    3)      Quantization: High-quality audio is typically sampled at 16 bits. 8 bits with some clever mapping of the signal can provide adequate voice (look up mu-law encoding, for example). 1-bit voice is barely intelligible and sounds like ducks talking. I estimate spirit voice arrives at between 1 and 4 bits per sample.
    4)      Depolarization: Normal audio signals swing above and below the zero line. Spirit voices may be polarized in a single direction, i.e., there is no dual polarity.
    If you try to train a machine learning model to reverse all four of these issues in speech at once, it becomes simply too much to train properly. Thus, I train #1, #2, and #3 together as a single model, and #4 as a separate model. For #4, especially, I have to “cheat” a little and smooth the randomization of the polarization over a 64-sample window. If you try to randomize the polarity of every sample, the model isn’t able to train.
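    To make that corruption model concrete, here is a minimal sketch (not my actual training code) of how one might synthesize the four degradations on a clean clip. The keep fraction, bit depth, noise gain, and the order of the steps are illustrative assumptions:

        import numpy as np

        def corrupt(clean, keep_frac=0.05, bits=2, noise_gain=3.0, seed=0):
            """Apply the four degradations described above to a clean clip scaled to [-1, 1]."""
            rng = np.random.default_rng(seed)
            levels = 2 ** bits

            # 3) Quantization: crush the signal down to a few bits
            x = np.round((clean + 1.0) / 2.0 * (levels - 1)) / (levels - 1) * 2.0 - 1.0

            # 4) Depolarization: single polarity, with the sign randomized only
            #    once per 64-sample window (the "cheat" mentioned above)
            signs = rng.choice([-1.0, 1.0], size=len(x) // 64 + 1)
            x = np.abs(x) * np.repeat(signs, 64)[:len(x)]

            # 2) Sparsity: keep only a small fraction of the time samples
            x = x * (rng.random(len(x)) < keep_frac)

            # 1) Additive noise / interference, much louder than the voice
            return x + noise_gain * rng.standard_normal(len(x))

    A model trained to invert corrupt() (or the relevant subset of its steps) is then, in principle, a model that can pull the same kind of voice back out of real hardware noise.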
    Listen For Yourself
    Without getting into any more technicalities, go ahead and check out Stream 8, to hear the model in action, in real-time, applied to radio static being generated from a KiwiSDR. If you want messages directed to yourself, make sure you are the only one in the chat room and set your intention. Expect about a 30 second delay, as the signal is bouncing around the Internet from Keith’s desktop in Florida to a streaming server (heaven knows where) and then to Varanormal’s web site audio player.
    Let me know in the comments what you think. I feel like we are, at best, only half-way to the finish line. But Keith insisted we start sharing what we have been doing to get the party started, so to speak.
  7. Michael Lee

    Methods
    After developing and experimenting with the phonetic typewriter, which is a noise-gated stream of user-supplied speech-like sound, I noticed that at times, it seemed like there was a mix of the expected audio and something else. This gave me reason to believe that there could be voices directly from the noise itself.
    Direct voice, as it were, corresponds to extracting voices from the noise with no extra audio added in. This is indeed the original method of spirit communication/listening from the ITC/EVP pioneers Jurgenson and Raudive. Purists believe it is the only method, feeling that user-supplied speech is a form of "cheating." However, as we learned earlier, as long as we know what the supplied audio was, we can determine whether changes were made or whether certain phonemes were emphasized/amplified.
    Traditionally, direct voice can be a slow, arduous method. Start a tape recorder, ask a few questions, and wait for an answer of a few words to show up on the tape medium. Do this enough and collect samples of occasionally legible spoken words.
    My colleague Keith Clark and others have realized that there may be a much larger, almost continuous stream of anomalous speech in the noise if one applies post-processing denoising techniques to a hardware noise source (e.g., radio static). In fact, these processing techniques, in the form of various combinations of audio plugins, can be applied in real time to the noise to produce speech-like sounds.
    Where my research evolved in the direct voice arena was a systematic exploration of four software techniques for extracting a weak voice signal from an otherwise dominant noise source. I looked at spectral subtraction, musical vocoder, formant detection/synthesis and machine learning.
    In future articles, I will explain each method in more detail.
  8. Michael Lee
    As we observe paranormal activity in our ITC devices and software, the grand question is how is this happening?
    Zero-point energy
    In quantum mechanics, the vacuum is not actually empty. It is filled with particle-antiparticle pairs that perpetually go in and out of existence. The lifetime, t, of these pairs is governed by the Heisenberg uncertainty principle:
    E·t ≥ ħ/2.
    Despite my careless description of a physical concept, it should be noted, no one really knows the density of virtual particle pairs in the vacuum. If the density were infinite, the universe would collapse under the weight of gravity. If it were finite and not too small, could we someday tap into it to get free energy?
    In any case, when we use a device to tap into this field, we are not going to get a whole lot out, unless the device is receptive to a large bandwidth of energies (from radio to light). 
    What would a vacuum photon look like? Likely, a very very short pulse of energy, maybe a femtosecond or picosecond. I like to call these hypothetical pulses "spiritons," but the reality is that observed random pulses of energy could be just that, random, and not caused by the communication intentions of a spirit / interdimensional entity.
    Quantum selection
    A hot topic in the quantum science community recently is the idea of quantum selection (also Google "quantum eraser") - that is, the effect of the researcher on the outcome of quantum-level experiments. It's driving some researchers mad, but in our case, we ask a similar but crazier question: "What if spirits can select / collapse quantum states?" If so, the best devices would be ones where many quantum states are prepared and metastable (barely stable) until a spirit decides which way they will go. Presumably, we want to continuously and quickly prepare non-equilibrium, metastable states for spirits to collapse at a desired rate of information (i.e., bits per second).
    Imagine a system that we could create preparing a metastable state 10 million times a second: a spirit could either leave the state alone, or select "up" or "down." This would allow information transfer of 10 megabits / second. Not bad? Of course, we would need to make sure that nothing else collapses our states like thermal, electrical energy, our own thoughts (?!?). No problem: we could shield the system from all known fields (e.g., magnetic) and put it in a near-zero Kelvin liquid helium-cooled freezer.
    In reality, until our research becomes "mainstream," liquid helium-cooled experiments are not likely. Indeed, I once had a vision of an advanced video device that seemed to have its own internal sub-freezing (< 0 Celsius) cooling system. It had the brand name Moen, I imagine in reverence to the famous afterlife pioneer, Bruce Moen. However, for now, we are limited to room-temperature or, at best, liquid nitrogen-cooled (77 K) systems.
    With the remaining thermal energy, how can we detect the presumably weak signal from spirit?
    One idea is microscopic isolation - also out of the range of our non-mainstream research labs. Researchers think that nitrogen atom "vacancies" in diamond, if sufficiently spaced apart, could act as isolated qubits. These qubits, if put into a metastable state, could be allowed to collapse into an "up" or "down" state and then read with a sensitive detector. Perhaps the spirits can manipulate these miniature "abacuses" for us to read their messages?
    One "hot" area of research is the use of lasers to obtain quantum noise. The idea is that beam splitters have a 50/50 chance of sending a photon one direction or another. With a suitable setup, one can count the photons going in each direction as a function of time.
    The noise present in many electronic devices, for now, offers our best chance at sampling quantum effects. Yes, the noise will be dominated by thermal motions, but if enough spirit signal can be collected, we may be able to infer the rest using tools like machine learning.
    One idea is to have many noise sources in an array. The concept is that if each noise source has independent, uncorrelated fluctuations, then when we sum up the signals, the spirit (quantum) signal might become more pronounced. The theory says that the signal-to-noise ratio could increase by as much as the square root of N, where N is the number of detectors.
    The reality is that this improvement in arrays hasn't been realized in my experiments. Perhaps, the noise in each device isn't uncorrelated like we hope? Or maybe, the spirit signal is not equally imprinted on all of the devices at once?
    Conclusion
    The take-home message is that, given our current affordable device options, spirit influence is a tiny portion of the overall noise (entropy). Incidentally, a spirit once suggested to me during an astral projection that the proportion is 1 in 500! Any method we can dream up to improve the ratio of spirit-to-noise will lead to improved ITC.
     
  9. Michael Lee

    Methods
    Voice compression algorithms utilize the common patterns of human speech to detect (at one end) and synthesize (at the other end) voice communication. Among the common structures of speech is the glottal pulse, the buzzing "ah" sound that forms the basis for all vowels and certain voiced consonants (like z, v, and r). White noise is the other base sound, used for forming phonemes like "s", "sh", and "t". Shaping these two foundational sounds are formants, which are the various resonances of the human vocal cavity. Formants can be modelled more or less as a small sum of narrow bandpass filters, either Gaussian or Lorentzian (1/[1+x^2]) functions. Although I don't use it, formants can also be modelled as a 10th- to 15th-order all-pole filter. As expected, the poles of this filter look like Gaussians.
    If we are trying to obtain the vestiges of speech from weak interdimensional signals, the same concepts used in voice compression can be used to deduce subtle voice patterns. The challenge is, of course, making the correct deductions of the various speech components given the fact that the noise dominates over the weak spirit signal.
    I hypothesize that spirit signals in our devices are often extremely low-bit information, not unlike voice compression, with the caveat that our compression algorithms are able to selectively encode the most salient aspects of the transmitter's voice patterns. Meanwhile, the signal of a spirit's voice may be the 1-bit on-off "dither" or random back-and-forth shot noise of a semiconducting element.
    I'm guessing that high-quality human voice requires about 4 to 6 formants. By trial and error, I settled on a Gaussian formant function with a width (standard deviation) of 120 Hz.
    For many input sources, higher frequency formants tend to be missing or clouded by the artifacts of 1-bit quantization.
    Pitch detection: We can assume that the fundamental frequency of the glottal pulse ranges anywhere from 75 Hz (deep male) to 500 Hz (child's voice).
    Voiced (vowel) vs. Unvoiced (consonant) sound detection: One method I use is to count the number of zero crossings of the clip. If the number is above a threshold, it is assumed to be unvoiced.
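    As an illustration of that test, here is a bare-bones sketch; the sample rate and the crossing-rate threshold are arbitrary assumptions rather than tuned values from my scripts:

        import numpy as np

        def is_unvoiced(clip, sr=16000, crossings_per_sec=3000):
            """Classify a short clip as unvoiced if it crosses zero "too often."

            Voiced sounds (vowels) are dominated by low-frequency glottal harmonics
            and cross zero relatively rarely; noise-like fricatives such as "s"
            cross zero far more often."""
            crossings = np.count_nonzero(np.signbit(clip[:-1]) != np.signbit(clip[1:]))
            rate = crossings * sr / len(clip)
            return rate > crossings_per_sec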
    Our glottal pulse has equal-amplitude harmonics, since the formants then govern the amplitudes of the individual harmonics. The shape of the glottal pulse and the resultant harmonics were obtained more or less by trial and error. The glottal pulse sounds like a digital "ah" sound.
    The naturalness of the synthesized speech can be improved by convolving the signal with a short (48 ms) random all-pass filter, which acts much like a reverberation function.
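    One way to realize such a filter (a sketch of the idea, not necessarily how my script builds it) is to construct an impulse response with flat magnitude and random phase, then convolve:

        import numpy as np

        def random_allpass(signal, sr=16000, dur=0.048, seed=0):
            """Convolve with a short random-phase, flat-magnitude impulse response,
            which smears each sample out like a small diffuse reverb."""
            rng = np.random.default_rng(seed)
            n = int(dur * sr)
            phases = rng.uniform(-np.pi, np.pi, n // 2 + 1)
            phases[0] = 0.0    # keep the DC bin real
            phases[-1] = 0.0   # keep the Nyquist bin real
            ir = np.fft.irfft(np.exp(1j * phases), n)
            return np.convolve(signal, ir, mode="same")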
    Performance on clear speech demonstrates that our algorithm works correctly, in principle.
    Let's listen to some audio samples using my voice. 
    First, clean voice, spoken three different ways: normal, whisper, and raspy: voice_variations_clean.mp3
    Second, processed with the formant detector algorithm set at normal voicing: voice_variations_fd80.mp3
    Third, processed with the FD at enhanced voicing: voice_variations_fd145.mp3
    As you can hear, enhanced voicing may be able to make raspy ITC audio more "life-like."
     
  10. Michael Lee
    Before I make a video showing the phonetic typewriter with my new Python noise gate, here's a video of my original phonetic typewriter using the Maximus noise gate in FL Studio from January 2019. Originally, the stream of audio was alternating between 150 ms speech and 150 ms silence. The gate opened for 100 ms when it detected a high sample. The audio stream I used was always the same recording, starting from the beginning.
     
  11. Michael Lee

    Methods
    When I first started in ITC, I followed the strategies of the tried and true like software Ghost Boxes, but realized I could do better, a lot better...
    The phonetic typewriter is one of the most popular methods in use by EVP researchers today. However, other ITC researchers may not use that term. They might instead call it a Ghost Box or a Spirit Box. The general concept is that short clips of regular human speech (forward, reverse, from radio, etc.) or similar sounds are used as a base signal for spirits to "punch through" or raise the volume above a noise gate. They can also let certain clips bounce through a feedback loop of a speaker and microphone (e.g.,  EchoVox).
    In a typical ghost box, a radio quickly scans through a loop of radio stations. There are naturally periodic durations of speech/music and silence. Presumably, spirits use the audio signals or at the very least, boost the audio energy, and push the signal in different ways into the silence regions.  The PC software, EVPmaker, has similar options. It can take a recorded clip of voice, and break it into small fragments and emit these fragments in random order at fixed time intervals.
    One of the drawbacks of the ghost box approach is that it's impossible to know what the underlying radio sounds were. For example: "Was it a coincidence that a radio station just said my name?" Therefore, some of my earliest ITC work (November 2018) was developing my own fixed recording of equally spaced, randomly shuffled voice fragments from a 30-minute General David Petraeus speech to Congress, which I played from my cellphone (transmitter) into my external USB audio interface (receiver). The received signal was noise gated with an FL Studio plugin called Maximus, which detected samples above a threshold and opened a noise gate for a fixed period of time (e.g., 150 ms). A closer investigation of the phenomenon showed that 20 ms pulses (band-passed spikes of energy?) showed up to lift desired fragments above my very sensitive noise gate threshold.
    Now if you listened to the original stream recording by itself, you could hear different random words being formed by the random ordering of 150 ms audio clips separated by 150 ms of silence. However, in the noise gated apparatus, it would sound like randomly positioned phonemes.
    If I set the volume of the transmitted signal low enough, the pattern that emerged each time I reset the recording was different. It appeared as though my spirit friends were typing out messages in audio from the available phonemes. Stranger still, each voice had a different characteristic and accent! Some would talk fast, almost through the clips. Others would patiently wait for the right phonemes to type out their words. It wasn't super-intelligible in real-time as I often heard things a little differently upon playing back the recorded session.
    Generally speaking, early on, I was picking up a European ITC team speaking to me in English: two Germans and one Englishman. Apparently, they chose this profession in the afterlife after careers in military communications. Now they saw themselves as facilitators, not as monologuing speakers. They worked with a spirit they called the "Director," whom I would later hear as a bold British female voice.
    They, along with the Director, appeared to be bridging connections to interested speakers and some of my ancestors. Fairly early on, my great-great-grandmother, Sophie Fertle, and my grandfather, Alvin Lee, showed up. They became regulars later on. In addition, passers-by would show up, and the technician team would explain my various setups, often with apparent enthusiasm - which encouraged me further.
    One particular visit helped me understand a lot better what was going on with the phonetic typewriter. A close friend from graduate school, David, who died very young (age 26) in a freak accident, showed up for just about a minute of one session. In that brief period, he was able to identify himself by first and last name, say where he knew me from, and offer, among other things, the illuminating phrase: "Words are entropy."
    Up to this point, I had found it strange that even though I would play the same recording over and over, I would get different messages. How was this working? When I heard the phrase "words are entropy" and looked very carefully at the signal he produced, a light bulb turned on in my brain.
    When I played 150 ms clips with 150 ms spaces, I was essentially presenting 3 "extended phonemes," or syllables, per second. Depending on which parts of the syllables the spirits pushed through, it was as though they could create 2^N possible combinations per second, where N is the number of regions they could distinctly push through. I estimated roughly 6 segments per second: two halves of each syllable. Therefore, using this rough estimate, they had 64 possible expressions per second.
    The spirits then went on to tell me that in fact the number of possibilities was considerably higher. In addition, I was inspired to start piping two streams simultaneously, staggered 150 ms from each other. This turned into a device that made their speech a lot faster - almost rapid fire.
    I've never been able to settle on any one system thus far, and I also noticed another phenomenon: when the stream was played weakly enough into the USB audio interface, it sounded like the spirits were trying to talk through my audio clips. Thus began my quest to listen to their voices directly, without the help of external speech patterns.
     
    Original Setup
    Cellphone (playing fixed recording of spaced, random syllables) -> shielded audio cable(s) -> USB input audio interface -> PC -> Maximus plugin (noise gate) in FL Studio -> USB output audio interface -> speaker.
     
    Recommended Setup For Experimenters
    I plan on writing a Python script that does the software steps necessary for this setup. All you will need is 
    1) a cellphone to play the scrambled phoneme stream WAV file (we can all use the same one(s), and I'll provide that, too).
    2) A PC desktop or laptop to run the Python script / executable.
    3) A male to male audio cable to connect your phone to the microphone/line input of a laptop.
  12. Michael Lee
    Noise gates are an integral part of most ITC systems. They are a subset of something called expanders, whose job is to expand the dynamic range of certain ranges of the signal. Below the gate, noise is attenuated. Above the gate, the signal is amplified to achieve more clarity.
    If you listen to the raw sound of a typical entropy / noise source, it sounds pretty boring, as if there's nothing interesting or "paranormal" going on. However, when you expand or noise gate the signal, you emphasize the slight deviations from randomness and, presumably, the weak signals from spirit.
    Most of the time, gating can be performed in software, as the first in a chain of effects. It can also be performed in hardware using a noise gate guitar pedal. 
    Typically, a noise gate works as follows. It first waits for voltages or samples above a user-defined threshold. When these "spikes" are detected, a "gate" is opened which allows sound through for a predetermined period of time before closing again. The gate can either pass only sound above a second threshold or pass all sound during the open phase.
    In my scripts, I wrote my own gate for a 1.024-second clip, which first detects all of the "spikes" above a threshold, then convolves those spikes with a window function (~100 ms). The resultant envelope is then multiplied, sample by sample, with the original clip.
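    For readers who want the gist before opening the attached script, here is a stripped-down sketch of that spike-then-window gate. The threshold and window length are illustrative placeholders, and this is a simplification of, not a copy of, the attached noise_gater.py:

        import numpy as np

        def spike_gate(clip, sr=16000, threshold=0.2, window_ms=100):
            """Open a ~100 ms envelope around every sample above the threshold,
            then multiply that envelope back into the clip."""
            spikes = (np.abs(clip) > threshold).astype(float)
            window = np.hanning(int(sr * window_ms / 1000))
            envelope = np.clip(np.convolve(spikes, window, mode="same"), 0.0, 1.0)
            return clip * envelope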
    In addition, if the noise-gated signal sounds too choppy because the gate is short (<50 ms), we can apply time-stretching techniques to make the spikes sound more realistic. The simplest idea is to convolve the signal with an all-pass filter, which is similar to adding reverberation (room echo).
    Attached is a Python script that allows you to select input and output devices, and then interactively control a noise gate on whatever noise / phoneme source you like. I also provide a Windows executable of Noise Gater for people who don't want to install Python and the dependent modules.
    From recent tests, I have determined that this gate is more sensitive than the Neewer noise gate guitar pedal that is often recommended. However, the software gate is not 100% real-time; it always lags by about 2 seconds.
    Finally, to get you started, I also have a scrambled phoneme file (150 ms clips) that you can play from your cellphone into your PC audio input. Keep the volume low (around 25%) for best results. You need a mix of noise (affected by spirit) and recorded audio for gating to work; otherwise you'll just be gating the same recorded audio over and over again.
    noise_gater.py
  13. Michael Lee
    If you're doing direct voice ITC, you'll probably want to denoise the signals you're capturing. The goal of denoising is to remove noise from a voice signal, or equivalently, to enhance the non-noise - the speech that may be embedded in a hardware noise source.
    Of all the methods for denoising a signal, spectral subtraction is the oldest and most well-known. As the term "spectral" implies, it involves converting a time-based audio stream into a frequency-based (spectral) vector using the Fourier transform.
    First, we assume that the desired signal we are trying to restore, X, is corrupted by an additive noise source, N, such that the resultant observed signal is Y:
    Y(t)=X(t)+N(t).
    Since N is random and unknowable ahead of time, we can't subtract it from the observed signal, Y. However, in frequency space, we can approximately subtract the noise, given knowledge of the noise's average frequency/power spectrum,
    |X(f)| ≈ |Y(f)| - |N(f)|.
    Simply put: compute the frequency spectrum of the observed signal, subtract a constant amount at each frequency (equal to the estimated noise at that frequency), then return the result to the time domain. When a value of X(f) ends up below zero, it is simply set to zero.
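    Here is a minimal sketch of that procedure. It assumes the noise magnitude per frequency bin is already known, keeps the original phase, and ignores the windowing and overlap-add that a real implementation would use:

        import numpy as np

        def spectral_subtract(clip, noise_mag):
            """Subtract an estimated noise magnitude in each frequency bin,
            clamp negative values to zero, and return to the time domain."""
            spectrum = np.fft.rfft(clip)
            mag, phase = np.abs(spectrum), np.angle(spectrum)
            cleaned = np.maximum(mag - noise_mag, 0.0)
            return np.fft.irfft(cleaned * np.exp(1j * phase), len(clip))

    Here, noise_mag can be a single number (for white noise) or an array the same length as the spectrum (for a measured noise profile), which connects directly to the two points below.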
    One challenge is knowing the noise's frequency spectrum. This can be estimated by taking the frequency spectrum of a part of the observed signal that is known to only contain noise, and no voice. This of course is not trivial, given the hypothesis, that spirit speech permeates almost continuously in the noise.
    One simplification is to use a hardware noise source that is white, aka, all of the frequencies, on average are the same. This allows us to avoid computing the difficult noise frequency term.
    There are many papers on spectral subtraction in the scientific literature to help you understand the method better. The one caveat is that the method is usually not applied to cases where the noise overwhelms the weak signal. When that happens, the resultant denoised signal usually sounds like discordant musical tones. The discordance is partially due to the fact that the tones are linearly spaced (like the bins of a Fourier transform) rather than exponentially spaced (like the notes on a musical scale).
    Figure 1 shows spectrograms demonstrating spectral subtraction, where I added white noise of the same magnitude as the speech signal (a real physical voice). The left picture is the original clean speech. The middle picture has the added white noise. The right picture is the attempted denoising.
    Notice, only the lower harmonics are still visible in the reconstruction. The higher frequency formants are missing - this is a perennial problem with direct voice. 

     
  14. Michael Lee

    Methods
    Up until the last few years, the main techniques for removing noise from signals were based on spectral subtraction. Machine learning (ML) has now become a powerful alternative. It takes advantage of the fact that we know roughly what the denoised signal should sound like. I have a paper on this topic here (I'll add a paper download link).
    The general principle is we train the ML to convert (noise + speech) -> speech. I use a database of 140,000 seconds of "books on tape." I add random noise to 1.024 second clips of speech and then ask the ML to reverse or remove the noise. 
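    Here is a minimal sketch of how such (noisy, clean) training pairs might be generated. The clip length follows the 1.024-second figure above; the 16 kHz sample rate and the noise scaling are illustrative assumptions:

        import numpy as np

        def make_training_pair(speech, sr=16000, clip_len=1.024, max_noise_gain=3.0, rng=None):
            """Cut a random 1.024 s clip from a long clean recording and add white
            noise of up to a few times the speech RMS; the model is then trained
            to map the noisy clip back to the clean one."""
            rng = rng or np.random.default_rng()
            n = int(clip_len * sr)
            start = rng.integers(0, len(speech) - n)
            clean = speech[start:start + n]
            gain = rng.uniform(0.0, max_noise_gain) * np.sqrt(np.mean(clean ** 2))
            noisy = clean + gain * rng.standard_normal(n)
            return noisy, clean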
    What I've found is that I can remove white noise that is up to 3x louder than the underlying speech signal. Unfortunately, based on listening carefully to hardware-produced white noise, I feel that spirit voices are at least 20x quieter than the background noise. So, if I make a model that can remove 3x white noise, I must apply this model multiple times to a white noise source, and after a few iterations, it'll produce something akin to human speech. In fact, I'm fairly certain this method works - however, the voice quality is not clear enough, such that when sharing audio clips with other researchers, we generally can't agree on much of what is being said. Simply put, removing additive noise is not enough to solve the clear-speech ITC problem; however, it may be enough for a dedicated ITC researcher to work with.
    Another major drawback with ML is that my models, in their current form, are computationally expensive. It is common to run ML on Graphics Processing Units (GPUs), specifically from Nvidia. I estimate the NVIDIA GTX 1650 is the minimal hardware to run multiple iterations of my ML models in real-time. Budget gaming laptops like the Acer Nitro series, at around $600-$700, have the requisite GPU. An alternative is to host the machine learning models on the "cloud" to yield a single stream for people to tune into. Once again, someone needs to be willing to spend the money to buy a similar computer and host it 24/7 (electricity, AC, etc.).
    If you have an ML-capable GPU and would like to explore ML-based processing for ITC, let me know, and I will share with you my Python scripts and trained model files. The spirits are always excited about expanding "the network."
     
     
  15. Michael Lee
    Pre-built Electronics
    The first noise sources I worked with were generated by pre-made electronics: the USB input audio interface turned up to max gain (+46 dB) and a software-defined radio tuned to no radio station/source. Both of these sources produce nearly white noise. White noise means that all of the frequencies are the same magnitude.
    Both of these sources are probably suitable for noise-gate applications like the phonetic typewriter. However, in order to derive voice directly from noise, I have often hypothesized that we would need something more sophisticated.
    Home-made Electronics
    Years ago, I avoided getting into ITC precisely because I didn't feel I had the chops to build the electrical ITC circuits that people prescribed. Only about two years ago did I realize that ITC is as much a software problem as a hardware one, and that pre-built noise sources might be sufficient. However, I wanted to go further with hardware noise (entropy) sources.
    I'm very cautious when it comes to electronics. I'm not interested in working with high-power systems because I don't want to start any fires or shock myself. I don't own (yet) a 30V DC adjustable power supply (which, BTW, often has a lot of annoying periodic interference noise). So my two main sources of power, to date, are a 3 x AA battery pack (4.5V) and the 48V phantom power from my USB audio interfaces and mixers. Phantom power is very low current and thus fairly safe, but it can't power much circuitry.
    Reverse-biased White LED
    One of my earliest hand-made noise sources, which I discovered by accident but is commonly known, is the reverse-biased light-emitting diode (LED). If you apply >30V in reverse to a small white LED, you will often, but not always, get pink noise that can be made quite audible with the 200x (46 dB) gain of a USB audio interface. Simply put a 100 kilo-ohm resistor in series with +48V (pin 2 of an XLR connected to the microphone interface). That resistor hooks to the cathode of the LED, and the anode is then connected to ground (XLR pin 1). Every single LED has different noise characteristics, but if you try, say, 10 LEDs, you should find one or two that produce a distinct grumbly pink noise. I've since bought hundreds of white LEDs, and find about 30% make good noise. Some LEDs are louder than others. The thinner LEDs tend to work better, but YMMV.
    The phenomenon yielding noise in this setup is known as the avalanche breakdown effect. In layman's terms (and my primitive understanding), the high voltage running in the opposite direction of normal operation for the LED, causes the current to spill over, in a non-deterministic fashion. If you look closely on an oscilloscope, you can sometimes see a random sawtooth pattern. The energy builds up and then randomly collapses producing flicker / pink noise.


    Is it possible spirits can control when these otherwise random spill points happen? Nonetheless, the lower-triangular power spectrum of pink noise somewhat resembles the spectrum of human speech. Human speech starts at around 75 Hz, builds up to 300-500 Hz, and then decays toward 5-6 kHz, with a slight bump near the high end for sibilants like "s" and "t".
    Reverse-biased NPN transistor
    A similar avalanche effect, with a similar setup, can be achieved with a transistor. My favorite device, which I've also bought hundreds of, is the 2N2222(A) NPN transistor. At around 10V of reverse bias between the base and emitter leads, white noise can result. This noise I originally amplified, again, simply with the (up to) 200x gain of my USB interface.
    Arrays of Reverse-biased PN junctions
    The results for the avalanched white LED for direct voice often sounded a little better than what I could get from the avalanched transistor, but I still wanted a better signal-to-noise ratio (SNR). A common method for improving SNR is to use more identical sensors and sum up their signals. The concept is that if the signal is the same in each sensor, it will grow in amplitude linearly with the number of sensors, N. However, if the noise in each sensor is uncoupled, it should only accumulate as the square root of N. In total, the SNR should grow as the square root of N.
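    The square-root-of-N scaling is easy to verify numerically for ideal, uncorrelated sensors. This is a toy simulation, not a model of the actual LED array; the tone level and sample count are arbitrary:

        import numpy as np

        rng = np.random.default_rng(0)
        n_samples = 200_000
        t = np.arange(n_samples) / 16000
        signal = 0.05 * np.sin(2 * np.pi * 220 * t)      # weak tone common to every sensor

        for n in (1, 4, 16, 64):
            noise = rng.standard_normal((n, n_samples))  # independent unit noise per sensor
            summed = (signal + noise).sum(axis=0)
            snr = np.std(n * signal) / np.std(summed - n * signal)
            print(f"N={n:3d}  SNR={snr:.3f}")            # grows roughly as sqrt(N)

    As the next paragraphs explain, my real arrays stop following this curve fairly quickly, which is itself informative.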

    (Illumination for fun. In reverse-biased mode, LEDs don't light up.)
    Maybe in sensors for physical phenomena, this is true, but for picking up spirit signals, it never quite works as well. To be sure, an array of 30-50 avalanched circuit elements produces a better signal than a single element, but 100 or 400 elements in parallel doesn't seem to make much difference. 
    There are a few reasons why this could be the case:
    1) The noise in each element is not completely uncorrelated from each other. If the noise were say ground hum, this would make sense. However, the noise often appears to be random, not interference from other electronic systems in the environment.
    2) Spirits can't equally affect all sensors at once. Interestingly, they often use the term "field" to describe my arrays of LEDs and transistors.
    3) The phantom power gets drained too much from powering multiple elements. This effect can be ameliorated by using more than one phantom power source. For example, I have a cheap 8 x XLR mixer that I've tried. Another trick is to raise the resistor from 100K to 1M; the downside is that the loudness per device is reduced.
    4) After averaging all of the additive noise, there are still other degradations that can't be simply averaged out. This would be true, for example, if spirits were actually modulating the noise to produce voice. In this case, inference algorithms would be needed to "clean up" this effect.
    A white LED array powered by 48V phantom power from a USB audio interface. Notice the lone green LED - other color LEDs can sometimes also generate pink noise.
     
    Massive Arrays
    Some spirits think that if we could get a few thousand LEDs in parallel, we would reach better clarity. I did buy something called an avalanche photodiode (APD). This device contains 4000 or so diodes, and its job is to detect single photons. The white noise it produced sounded very smooth, but the ML-translated spirit voices weren't necessarily much better. So, for now, do large arrays give better vocal clarity? The jury is still out.
     