[SpeechIO-48] string substitution (was: Re: [SpeechIO-12] speechd v0.39)



On Mon, 9 Aug 1999, Kyle Burton wrote:

> "imo animosity is something you can't avoid"
> it should expand to:
> "in my opinion animosity is something you can't avoid"
> but using that expression, it expands to:
> "in my opinion anin my opinionsity is something you can't avoid"

Ewww.  Thank you.  Ugh.  

Wait, you're saying that

 $text =~ s/([a-zA-Z_0-9\/]+)/$wordsub{$1} || $1/eg; 

has the same problem ?  I don't think it does.  It's doing a hash lookup,
not a search.  It'll take any contiguous run of characters from the above
set (in []'s), and look that whole run up in the hash.

If $wordsub{imo} eq 'in my opinion', it'll get to "animosity", grab the
whole thing at once (because it's composed entirely of characters in the
set we're looking for) and nothing just before or after it (because that
would be whitespace, which is not in the set), then do a hash lookup, and
since $wordsub{animosity} is undefined (and so false), it'll hit the
|| $1 and leave it alone.

Right ?

Then we'd just have to decide which characters to consider parts of words
("/", etc), and which characters not to (periods, commas, quotes, parens,
etc).
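
Here's a minimal sketch of that whole-token lookup, to show why "animosity"
survives.  The table contents and the name expand_words are made up for
illustration (speechd would load its own table), and it uses exists rather
than || so that an entry mapping to "" or "0" would still be applied:

```perl
use strict;
use warnings;

# Hypothetical substitution table, for illustration only.
my %wordsub = (
    imo => 'in my opinion',
);

# Replace each whole token that has an entry in %wordsub; leave the rest alone.
# s{}{} delimiters avoid having to escape the "/" inside the character class.
sub expand_words {
    my ($text) = @_;
    $text =~ s{([a-zA-Z_0-9/]+)}{ exists $wordsub{$1} ? $wordsub{$1} : $1 }eg;
    return $text;
}

print expand_words("imo animosity is unavoidable"), "\n";
# prints: in my opinion animosity is unavoidable
```

"animosity" is matched as a single token, has no entry in the table, and is
put back unchanged.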

Perl is really impressive.  It's cool how many rather elegant
possibilities we have.  The frustration is in the irregularity of our
language :)
If punctuation were always just punctuation, and words were always made up
only of alphabetic characters, this'd be easy :)

> I've been meaning to ask -- since this search and replace is a kind of 
> filtering, should we take this out of speechd and put it into catspeech?
> I mean speechd should be just the FIFO, with minimal overhead...we shouldn't
> force applications to use the filtering -- they could opt to by interfacing
> to catspeech, but shouldn't be forced to.  I don't think we should be that
> fascist...

I did actually think about this before I put it in speechd.  I think this
function deals with an issue that will be common to all applications
that use speechd.  And it can be disabled (by removing the speechd.sub
file).

> > > $text =~ s/$search/$wordsub{$1}/eg;
> > 
> > Now that one's got serious potential.  Didn't know you could do that.
> > 
> > Looked it up.  I'm getting warm fuzzies.
> 
> Especially if we use study() on the regex...see the entry for study() in
> the manpage for perlfunc -- we could make it _much_ faster by tinkering with
> making code at runtime (which we're doing anyway with the 'do').

Read over it.  Looks neat.  Still don't fully understand it though.
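
For the s/$search/$wordsub{$1}/eg form quoted above, one way $search might be
built is as a single alternation of the hash keys, compiled once with qr//.
This is only a sketch under assumptions: the table entries and the name
expand_abbrevs are invented, and the \b anchors only behave as intended for
keys that begin and end with word characters (a key like "w/" would need
different handling):

```perl
use strict;
use warnings;

# Hypothetical table, for illustration only.
my %wordsub = (
    imo => 'in my opinion',
    btw => 'by the way',
);

# Longest keys first so a longer abbreviation wins over a shorter prefix;
# quotemeta escapes any regex metacharacters in the keys.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) }
    keys %wordsub;

# \b anchors keep "imo" from matching inside "animosity".
my $search = qr/\b($alternation)\b/;

sub expand_abbrevs {
    my ($text) = @_;
    $text =~ s/$search/$wordsub{$1}/eg;
    return $text;
}

print expand_abbrevs("btw, imo animosity is unavoidable"), "\n";
# prints: by the way, in my opinion animosity is unavoidable
```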

> > tr [\200-\377]
> >    [\000-\177]; # delete 8th bit
> > 
> > Gotta do that.  Or should we strip all characters over 7 bits ?

> Should we just support multibyte characters instead?  i.e. this is still the
> idea that we should only be a filter between the fifo and the other program.
> Implement stuff like this in catspeech.

Didn't think about that.  The only reason I was thinking they should be
stripped is to avoid like... what's that stuff called ?  Well, tainting
problems.  Is that not the issue that I think it might be ?
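
To make the two tr options concrete, here's a small sketch comparing
"delete the 8th bit" with "strip everything over 7 bits".  The input string
is just an assumed Latin-1 sample, not anything speechd actually receives:

```perl
use strict;
use warnings;

my $input = "caf\xE9";   # "cafe" with an e-acute in Latin-1; \xE9 has the 8th bit set

# Option 1: clear the 8th bit, mapping each high byte onto its 7-bit counterpart.
(my $masked = $input) =~ tr/\200-\377/\000-\177/;

# Option 2: delete every byte above 7 bits.
(my $stripped = $input) =~ tr/\200-\377//d;

print "$masked\n";    # prints: cafi  (0xE9 with the high bit cleared is 0x69, "i")
print "$stripped\n";  # prints: caf
```

Clearing the bit turns the accented character into an unrelated letter, while
deleting the byte drops it entirely.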

> I hope my sudden presence in this project isn't unwelcome...I've got some
> free time, and still have some interest in this project, I know I've been
> absent for a long time...

I'm very happy to have you back :)

__________________________________________________________________
PGP fingerprint = 03 5B 9B A0 16 33 91 2F  A5 77 BC EE 43 71 98 D4
            darxus@op.net / http://www.op.net/~darxus
                         Far Beyond Reason