
Re: [SpeechIO-12] speechd v0.39



> The "d/l" thing can be solved with something like:
> 
>       $text =~ s/([a-zA-Z_0-9/]+)/$wordsub{$1} || $1/eg;
> 
> I'm really rough on my regexese, so I dunno if that syntax is right.  
> BTW, I was incorrect, substitutions currently will work on alphanumeric
> strings, not just alphabetic strings.

This looks like a solution...with one exception, which I'll discuss below (as
they both have the same problem).
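As quoted, the hash-lookup one-liner won't compile. `/` is also the s/// delimiter, so the `/` inside the character class ends the pattern early. Here is a minimal sketch of the same idea that uses braces as delimiters. It assumes a hypothetical %wordsub table, and the entries in it are made-up examples:

```perl
use strict;
use warnings;

# Hypothetical substitution table; keys may contain "/".
my %wordsub = (
    'd/l' => 'download',
    'imo' => 'in my opinion',
);

# Expand each run of letters, digits, "_" and "/" that has an entry in
# %wordsub; runs without an entry are put back unchanged. Braces are used
# as delimiters so the "/" in the character class needs no escaping.
sub expand_words {
    my ($text) = @_;
    $text =~ s{([a-zA-Z_0-9/]+)}{exists $wordsub{$1} ? $wordsub{$1} : $1}eg;
    return $text;
}

print expand_words("imo you should d/l it"), "\n";
# prints "in my opinion you should download it"
```

Using exists instead of || means that a replacement that happens to be "0" or an empty string is still applied. The class grabs the whole run of characters, so "animosity" is looked up as one word and left alone. The catch is that only keys made entirely of characters inside the class can ever match.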

> The issue is, which characters would we want between the []'s.

Yup.  But there's also the issue that we only work with 'words'.

> >   while(($k,$v)=each %subs) { s/\b$k\b/$v/g; }
> 
> That's a really neat way to do a brute force, pretty close to doing a
> foreach $key in %hash.  Although if we were to go with that, I think it'd
> be better to just do:
> 
> while(($k,$v)=each %subs) { s/$k/$v/g; }

Think of this input and what would happen to it:

"imo animosity is something you can't avoid"

it should expand to:

"in my opinion animosity is something you can't avoid"

but using that expression, it expands to:

"in my opinion anin my opinionsity is something you can't avoid"

which is not what I think we want....
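To make the difference concrete, here is a small sketch that runs the 'imo' example through both loops. The names %subs, expand_naive and expand_bounded are hypothetical and only for illustration. \Q...\E is added so that keys containing punctuation, such as "d/l", are matched literally:

```perl
use strict;
use warnings;

# Hypothetical abbreviation table.
my %subs = ('imo' => 'in my opinion');

# Brute force without word boundaries: "imo" also matches inside "animosity".
sub expand_naive {
    my ($text) = @_;
    while (my ($k, $v) = each %subs) {
        $text =~ s/\Q$k\E/$v/g;
    }
    return $text;
}

# Same loop with \b anchors, so only a whole-word "imo" is replaced.
sub expand_bounded {
    my ($text) = @_;
    while (my ($k, $v) = each %subs) {
        $text =~ s/\b\Q$k\E\b/$v/g;
    }
    return $text;
}

my $in = "imo animosity is something you can't avoid";
print expand_naive($in), "\n";
# in my opinion anin my opinionsity is something you can't avoid
print expand_bounded($in), "\n";
# in my opinion animosity is something you can't avoid
```

One caveat: \b only marks a boundary next to a word character. A key that begins or ends with punctuation (a smiley like ":)", for example) would therefore never match in this form.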

I've been meaning to ask: since this search and replace is a kind of
filtering, should we take it out of speechd and put it into catspeech?
I mean speechd should be just the FIFO, with minimal overhead. We shouldn't
force applications to use the filtering; they could opt in by going through
catspeech, but they shouldn't be forced to.  I don't think we should be that
fascist...

> > $text =~ s/$search/$wordsub{$1}/eg;
> 
> Now that one's got serious potential.  Didn't know you could do that.
> 
> Looked it up.  I'm getting warm fuzzies.

Especially if we use study() on the text being searched...see the entry for
study() in the perlfunc manpage: we could make it _much_ faster by generating
code at runtime (which we're doing anyway with the 'do').
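Here is a sketch of how $search might be built. It assumes $search is a single alternation of every key in a hypothetical %wordsub table. Note that study() applies to the string being searched, not to the pattern, and since Perl 5.16 it has been a no-op. On current perls, compiling the pattern once with qr// is likely the more useful step:

```perl
use strict;
use warnings;

# Hypothetical substitution table (assumed non-empty).
my %wordsub = (
    'imo' => 'in my opinion',
    'd/l' => 'download',
    'btw' => 'by the way',
);

# Build one alternation from all keys. quotemeta makes "/" and other
# punctuation literal. Longer keys are sorted first so that a key like
# "d/l" is tried before a shorter key such as "d" that would otherwise
# match its first part. qr// compiles the combined pattern once.
my $alternation = join '|',
    map { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b } keys %wordsub;
my $search = qr/\b($alternation)\b/;

sub expand_all {
    my ($text) = @_;
    # One pass over the text; $1 is always a key of %wordsub here.
    $text =~ s/$search/$wordsub{$1}/eg;
    return $text;
}

print expand_all("btw, imo you should d/l it"), "\n";
# prints "by the way, in my opinion you should download it"
```

This version scans the text once, whatever the size of the table. The each-loop approach rescans the whole text once per key.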

> > "Perl combines all of the worst aspects of BASIC, C and line noise." 
> 
> That is cute :)
> 
> Oh... noise.  
> /me flips through the camel book.
> There it is, page 75, at the bottom:
> 
> tr [\200-\377]
>    [\000-\177]; # delete 8th bit
> 
> Gotta do that.  Or should we strip all characters over 7 bits ?
> 
> (for those not familiar w/ this stuff, anything over 7 bits is generally
> binary data, not text)
> 
> That doesn't look right... shouldn't it be...
> 
> tr [\128-\255]
>    [\000-\127];

Should we just support multibyte characters instead?  I.e., this is still the
idea that we should only be a filter between the FIFO and the other program.
Let's implement stuff like this in catspeech.
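In both tr lines quoted above, the escapes are octal, not decimal. \200-\377 covers bytes 128 to 255, and \000-\177 covers bytes 0 to 127, so the camel-book version already does what was intended. \128, by contrast, is read as octal \12 (a newline) followed by a literal "8". The sketch below shows the two options that were raised: clearing the 8th bit, or deleting high bytes outright. The function names are hypothetical:

```perl
use strict;
use warnings;

# Escapes in tr/// like \200 are octal: \200-\377 is bytes 128-255
# and \000-\177 is bytes 0-127.
sub clear_high_bit {
    my ($bytes) = @_;
    # Map each high byte onto the same value with bit 7 cleared.
    (my $out = $bytes) =~ tr/\200-\377/\000-\177/;
    return $out;
}

sub drop_high_bytes {
    my ($bytes) = @_;
    # Alternative: delete every byte above 127 instead of remapping it.
    (my $out = $bytes) =~ tr/\200-\377//d;
    return $out;
}

my $sample = "caf\xE9";    # "cafe" with an accented e in Latin-1 (0xE9 = 233)
print clear_high_bit($sample), "\n";    # 0xE9 & 0x7F = 0x69 ("i"): prints "cafi"
print drop_high_bytes($sample), "\n";   # prints "caf"
```

Either way, UTF-8 encodes every non-ASCII character as bytes in the 0x80-0xFF range, so both functions mangle multibyte text. Supporting multibyte characters would mean decoding the input first (for example with the core Encode module) before any filtering.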

I'm out of time today, I can probably look at doing all this to catspeech
tomorrow.


I hope my sudden presence in this project isn't unwelcome. I know I've been
absent for a long time, but I've got some free time and still have some
interest in this project...


k

------------------------------------------------------------------------------
We are the Resistance. Microsoft is useless. 
    -- Anonymous Coward (from http://slashdot.org)
mortis@voicenet.com                            http://www.voicenet.com/~mortis
------------------------------------------------------------------------------