[CSDb] - User Forums - Speech-Synthesis

CSDb User Forums

2005-09-04 17:18

Bamu®
Account closed

Registered: May 2005
Posts: 1332

Speech-Synthesis

Hi,

Only a few people have used ring modulation to simulate speech.
I'd like to try it too, but unfortunately I have no idea how it works.
Before I waste time testing, I'll ask here: does anyone know how it's done?

Btw: Can anyone tell me what the voice in AGEMIXER's song "Da Shit (Eastwood Jack's Red..)" says?

2014-09-21 19:00

Agemixer
Account closed

Registered: Dec 2002
Posts: 40

Da Shit Eastwood Jack's Red Cocks, Green Hens and Nat Hubs:

Lyrics:
http://skalaria.japo.fi/Da-Shit-Eastwood-Jacks-lyrics.txt

(First set your browser to View -> Character Encoding = UTF-8. On Firefox, press Alt to show the menu.)

2014-09-22 06:08

Oswald

Registered: Apr 2002
Posts: 5126

feestyler is still the best imho :)

2014-09-22 08:21

Jammer

Registered: Nov 2002
Posts: 1343

Well, I find it kinda obsolete and complicated in comparison with $57 sync-mod, considering its results ;)

2014-09-22 18:20

algorithm

Registered: May 2002
Posts: 707

As a quick test and for some fun, I have used just a single SID channel on the "diner" speech example.

The result is below:

https://www.dropbox.com/s/l9am1u8vhmpkl8g/diner1chan.prg?dl=0

It would certainly need more channels to recreate speech more accurately. The version below uses 3 SID channels:

https://www.dropbox.com/s/l3eih48rnfbw682/diner3chans.prg?dl=0

If you are using a real C64 or an emulator, use the new SID (8580). It will sound crackly on the old SID (6581), although that can be rectified by updating and interpolating the value in $D418.

2014-09-23 02:27

Agemixer
Account closed

Registered: Dec 2002
Posts: 40

Jammer: Hey, the 6581-8580.com mirrors seem to be down. Curious. Is there a chance there's an mp3 of the famous "Hot Mommas" recorded on your SID? :) (I haven't xferred that SID to a real C64 yet, so I have no idea.)

Which other tunes would you prefer to be heard, using your $57 wave speech technique?

Haven't tried it myself, but $57 looks like too many variables - how can you predict how the voice behaves? Sounds like thousands of choices though (waveforms*notes*notes*detuning*pulses?) How can you imagine that combined sync+mod waveform? Or did you just try manually until you got the wanted waveform..?

2014-09-23 09:18

Jammer

Registered: Nov 2002
Posts: 1343

On a real SID $57 sounds a bit thinner, but it still works properly ;) I cannot transfer it right now, but I'll record it for sure. Here are a couple of my tunes with $57; 'Hot Mommas' is my best speech effort so far, though:

- http://hvsc.perff.dk/MUSICIANS/J/Jammer/For_Jazzcat_2k6.sid - relatively the weakest example but 'Let's go' can be heard with a bit of imagination ;)
- http://hvsc.perff.dk/MUSICIANS/J/Jammer/Galaxy_Bounce_Edit.sid - simulation of original sampled shouts
- http://hvsc.perff.dk/MUSICIANS/J/Jammer/Sweet_Infection.sid - attempt at 'Sweet Infection' phrase, you have to wait over 2 minutes, unfortunately ;)
- http://hvsc.perff.dk/MUSICIANS/J/Jammer/Mr_Marvellous.sid - my first reasonable effort in this field
- http://hvsc.perff.dk/MUSICIANS/J/Jammer/HVSC.sid - tune for HVSC's 10th anniversary with my short article on the technique

Apart from more advanced techniques like Algo's player or Radwar's example, $57 gives IMHO the best and most audible results, yet it sounds more like a robotic vocoder than proper speech (well, in 'Hot Mommas' I've managed to make it a bit less robotic thanks to hipass/band at the beginning) ;) $57 is quite simple to use - you just treat the base pitch as ... pitch :D and the modulation pitch as the formant. $57 gives the best results in the 6th and 7th octaves, but that's more a matter of tweaking. Still, you can achieve clear vowel sounds such as oo, eh, ae etc. ;) In noise frames you either cease modulation or set it to an appropriate value that matches the noise sound. I'm pretty sure you'll make even better use of it, especially since I haven't added dynamic filtering to the speech so far ;)
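For anyone who wants to turn a pitch and a formant into raw register values, here is a minimal Python sketch. It assumes a PAL machine (clock of 985248 Hz) and the usual SID relation register = Hz * 2^24 / clock. The function names and the 110 Hz / 700 Hz example values are purely illustrative and are not taken from any of the tunes above; which voice gets which value depends on how your editor routes the modulation.

```python
PAL_CLOCK_HZ = 985248  # assumption: PAL C64; an NTSC machine uses a different clock


def sid_freq_register(hz, clock_hz=PAL_CLOCK_HZ):
    """Return the 16-bit SID frequency register value for a frequency in Hz."""
    value = round(hz * (1 << 24) / clock_hz)
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"{hz} Hz does not fit in the 16-bit frequency register")
    return value


def split_bytes(value):
    """Split a 16-bit value into (low byte, high byte) for the two registers."""
    return value & 0xFF, (value >> 8) & 0xFF


# Illustrative values only: a low base pitch and a higher formant frequency.
base = sid_freq_register(110.0)
formant = sid_freq_register(700.0)
print("base    lo/hi:", [f"${b:02X}" for b in split_bytes(base)])
print("formant lo/hi:", [f"${b:02X}" for b in split_bytes(formant)])
```

The range check matters in practice: with a PAL clock the 16-bit register tops out a little below 4 kHz, so very high formants cannot be set directly.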

2014-09-23 12:26

Mixer

Registered: Apr 2008
Posts: 460

Just a bit of reasoning out loud here:

$57 is a waveform of pulse and triangle with the sync and ringmod bits set; it uses the previous voice's oscillator frequency as the sync modulation source.
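To spell out what is packed into that byte, here is a small Python sketch that decodes a SID voice control register value using the standard bit layout (gate in bit 0 up to noise in bit 7). The helper name and the `SYNC_SOURCE` table name are just illustrative.

```python
# Bit layout of a SID voice control register, bit 0 first.
CONTROL_BITS = [
    "gate", "sync", "ring mod", "test",
    "triangle", "sawtooth", "pulse", "noise",
]

# Which voice acts as sync/ring-mod source for each voice (the "previous" one).
SYNC_SOURCE = {1: 3, 2: 1, 3: 2}


def decode_control(value):
    """Return the names of all bits set in a control register value."""
    return [name for bit, name in enumerate(CONTROL_BITS) if value & (1 << bit)]


print(decode_control(0x57))
for voice, source in SYNC_SOURCE.items():
    print(f"voice {voice} is synced / ring-modulated by voice {source}")
```

So $57 on voice 2, for example, means the modulating frequency has to be set on voice 1.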

Idea: Instead of the voice sync, one could do a timer IRQ sync and reset the oscillator via the test bit (i.e. do the sync) at any desired frequency to generate the formant frequency for speech.

Ringmod still uses the previous voice output.

Using timer syncmod and voice ringmod together allows 2 different frequency modulations on the same voice.
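As a rough way to experiment with the timer idea off the machine, here is a Python sketch of the mechanism only. It is a simplified model under several assumptions: a 24-bit phase accumulator stepped once per clock cycle, a triangle derived by folding the accumulator on its top bit, an ideal timer with no IRQ jitter, and the test-bit pulse modelled as an instantaneous reset of the phase to zero. Ringmod is left out, and all names and numbers are hypothetical.

```python
def triangle_from_phase(phase):
    """12-bit triangle from a 24-bit phase: fold on the top bit, keep 12 bits."""
    if phase & 0x800000:
        phase = ~phase & 0xFFFFFF
    return (phase >> 11) & 0xFFF


def simulate_timer_sync(freq_reg, reset_period_cycles, n_cycles):
    """Step the oscillator per cycle, forcing phase to 0 every reset period."""
    phase = 0
    samples = []
    for cycle in range(n_cycles):
        if cycle % reset_period_cycles == 0:
            phase = 0  # stands in for the timer IRQ pulsing the test bit
        samples.append(triangle_from_phase(phase))
        phase = (phase + freq_reg) & 0xFFFFFF
    return samples


# Hypothetical numbers: oscillator at register value $4000, reset every 1000 cycles.
samples = simulate_timer_sync(0x4000, 1000, 3000)
print("first block repeats:", samples[:1000] == samples[1000:2000])
print("peak level in one block:", max(samples[:1000]))
```

Because every block starts from phase zero, the output repeats at the timer rate while the oscillator's own frequency register shapes the waveform inside each block. Whether that gives usable vowels on real hardware is exactly the open question.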

It would be interesting to know if it is possible to generate any/better/worse vowels with this method.

(yeah, gotta test this some night)

2014-09-23 12:38

Jammer

Registered: Nov 2002
Posts: 1343

Right but I assume that we are talking about solutions within popular editors, not custom pieces of code ;)

2014-09-23 12:59

Mixer

Registered: Apr 2008
Posts: 460

Jammer, you're right on that.

Time to upgrade them editors, so the new stuff is available for all musicians.

If only the general attitude would change from calls per frame thinking to free timing/bpm. Most tunes won't be heard in demos or games that require per frame sync anyway so "calls per frame" thinking is unfounded.

However, this discussion is about speech, pardon my derailing.

2014-09-23 13:04

Agemixer
Account closed

Registered: Dec 2002
Posts: 40

Maybe I will include a small assembler in AgentRacker then hehehe...