Re: [SpeechIO-12] speechd v0.39
> The "d/l" thing can be solved with something like:
>
> $text =~ s/([a-zA-Z_0-9\/]+)/$wordsub{$1} || $1/eg;
>
> I'm really rough on my regexese, so I dunno if that syntax is right.
> BTW, I was incorrect, substitutions currently will work on alphanumeric
> strings, not just alphabetic strings.
This looks like a solution...with an exception that I'll discuss below (as
both approaches have the same problem).
> The issue is, which characters would we want between the []'s.
Yup. But there's also the issue that we should only match whole 'words'.
> > while(($k,$v)=each %subs) { s/\b$k\b/$v/g; }
>
> That's a really neat way to do a brute force, pretty close to doing a
> foreach $key in %hash. Although if we were to go with that, I think it'd
> be better to just do:
>
> while(($k,$v)=each %subs) { s/$k/$v/g; }
Think of this input and what would happen to it:
"imo animosity is somthign you can't avoid"
it should expand to:
"in my opinion animosity is somthign you can't avoid"
but using that expression, it expands to:
"in my opinion anin my opinionsity is somthign you can't avoid"
which is not what I think we want....
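
For what it's worth, here is a minimal sketch of the difference, assuming a
one-entry %subs table (the real table isn't shown in this thread, so the
entry below is made up for illustration). The \Q...\E quoting is my own
addition, so that keys containing regex metacharacters are matched literally:

```perl
use strict;
use warnings;

# Hypothetical substitution table, for illustration only.
my %subs  = (imo => 'in my opinion');
my $input = "imo animosity is somthign you can't avoid";

# Without word boundaries: "imo" also matches inside "animosity".
my $naive = $input;
while (my ($k, $v) = each %subs) { $naive =~ s/$k/$v/g; }

# With \b on both sides: only a standalone "imo" is replaced.
my $bounded = $input;
while (my ($k, $v) = each %subs) { $bounded =~ s/\b\Q$k\E\b/$v/g; }

print "$naive\n";
print "$bounded\n";
```

The first line printed is the mangled "anin my opinionsity" version; the
second leaves "animosity" alone.
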
I've been meaning to ask -- since this search and replace is a kind of
filtering, should we take this out of speechd and put it into catspeech?
I mean speechd should be just the FIFO, with minimal overhead...we shouldn't
force applications to use the filtering -- they could opt to by interfacing
to catspeech, but shouldn't be forced to. I don't think we should be that
fascist...
> > $text =~ s/$search/$wordsub{$1}/eg;
>
> Now that one's got serious potential. Didn't know you could do that.
>
> Looked it up. I'm getting warm fuzzies.
Especially if we use study() on the text being searched...see the entry for
study() in the manpage for perlfunc -- we could make it _much_ faster by
generating the matching code at runtime (which we're doing anyway with the 'do').
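
A rough sketch of how the single-regex substitution quoted above could be put
together. How $search actually gets built isn't shown in this message, so the
construction here is an assumption, as are the table entries and the sample
text. The lookarounds stand in for \b so that keys like "d/l" only match when
they aren't glued to other word characters:

```perl
use strict;
use warnings;

# Hypothetical table; the real %wordsub contents are not shown in this thread.
my %wordsub = ('imo' => 'in my opinion', 'd/l' => 'download');

# Longest keys first, so a longer abbreviation wins over a shorter prefix.
my $alternatives = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b }
    keys %wordsub;

# One compiled pattern for the whole table; $1 is the key that matched.
my $search = qr/(?<!\w)($alternatives)(?!\w)/;

my $text = "imo you should d/l it, animosity aside";
$text =~ s/$search/$wordsub{$1}/eg;
print "$text\n";
```

This makes one pass over the text instead of one pass per table entry, and
"animosity" is left untouched.
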
> > "Perl combines all of the worst aspects of BASIC, C and line noise."
>
> That is cute :)
>
> Oh... noise.
> /me flips through the camel book.
> There it is, page 75, at the bottom:
>
> tr [\200-\377]
> [\000-\177]; # delete 8th bit
>
> Gotta do that. Or should we strip all characters over 7 bits ?
>
> (for those not familiar w/ this stuff, anything over 7 bits is generally
> binary data, not text)
>
> That doesn't look right... shouldn't it be...
>
> tr [\128-\255]
> [\000-\127];
Should we just support multibyte characters instead? i.e. this is still the
idea that we should only be a filter between the fifo and the other program.
Implement stuff like this in catspeech.
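
On the quoted tr question: the \200-\377 form uses octal escapes (0200 is 128
and 0377 is 255), so it already covers every character with the 8th bit set;
8 is not an octal digit, so \128 does not mean character 128. A small sketch
of both options (clearing the 8th bit versus deleting those characters), with
a made-up sample string:

```perl
use strict;
use warnings;

# Sample string with two high-bit characters: \351 (233) and \201 (129).
my $text = "caf\351 \201ok";

# Option 1: map 128..255 onto 0..127, i.e. clear the 8th bit.
(my $cleared = $text) =~ tr/\200-\377/\000-\177/;

# Option 2: delete every character above 7 bits.
(my $deleted = $text) =~ tr/\200-\377//d;

# Print code points, since $cleared now contains a control character.
print join(' ', map { ord } split //, $cleared), "\n";
print "$deleted\n";
```

Note that clearing the bit turns \351 into a plain "i" and \201 into a control
character, which is probably worse for speech output than dropping them.
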
I'm out of time today; I can probably look at moving all this into catspeech
tomorrow.
I hope my sudden presence in this project isn't unwelcome...I've got some
free time and still have some interest in this project. I know I've been
absent for a long time...
k
------------------------------------------------------------------------------
We are the Resistance. Microsoft is useless.
-- Anonymous Coward (from http://slashdot.org)
mortis@voicenet.com http://www.voicenet.com/~mortis
------------------------------------------------------------------------------