Speech Recognition HOWTO

Stephen Cook

                scook@gear21.com
            

Japanese translation:

    htakashi@yabumi.com
  

Revision History                                                        
Revision v2.0        April 19, 2002              Revised by: scc        
Changed license information (now GFDL) and added a new publication.     
Revision v1.2        February 5, 2002            Revised by: scc        
Added more commercial software listings (sent by Mayur Patel).          
Revision v1.1        October 5, 2001             Revised by: scc        
Added info for Vocalis Speechware. Fixed/Updated various other items.   
Revision v1.0        November 20, 2000           Revised by: scc        
Added info on L&H and HTK.                                              
Revision v0.5        September 13, 2000          Revised by: scc        
Initial HOWTO Submission                                                

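Automatic Speech Recognition (ASR) on Linux is becoming easier. Several
packages are available for users as well as developers. This document
describes the basics of speech recognition and some of the available
speech recognition software.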



Table of Contents
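1. Legal Notices
   
    1.1. Copyright/License
    1.2. Disclaimer
    1.3. Trademarks
   
2. Foreword
   
    2.1. About This Document
    2.2. Acknowledgements
    2.3. Comments/Updates/Feedback
    2.4. ToDo
    2.5. Revision History
   
3. Introduction
   
    3.1. Speech Recognition Basics
    3.2. Types of Speech Recognition
    3.3. Uses and Applications
   
4. Hardware
   
    4.1. Sound Cards
    4.2. Microphones
    4.3. Computers/Processors
   
5. Speech Recognition Software
   
    5.1. Free Software
    5.2. Commercial Software
   
6. Inside Speech Recognition
   
    6.1. How Recognizers Work
    6.2. Digital Audio Basics
   
7. Publications
   
    7.1. Books
    7.2. Internet
   
8. About the Japanese Translation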

1. Legal Notices

1.1. Copyright/License

Copyright (c) 2000-2002 Stephen C. Cook. Permission is granted to copy,
distribute, and/or modify this document under the terms of the GNU Free
Documentation License, Version 1.1 or any later version published by
the Free Software Foundation.

This document is made available under the terms of the GNU Free
Documentation License (GFDL) <http://www.gnu.org/copyleft/fdl.html>,
which is hereby incorporated by reference.



1.2. Disclaimer

The author disclaims all warranties with regard to this document,
including all implied warranties of merchantability and fitness for a
certain purpose; in no event shall the author be liable for any
special, indirect or consequential damages or any damages whatsoever
resulting from loss of use, data or profits, whether in an action of
contract, negligence or other tortious action, arising out of or in
connection with the use of this document.



1.3. Trademarks

All trademarks contained in this document are the copyright/trademark
of their respective owners.



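2. Foreword

2.1. About This Document

This document is targeted at the beginner to intermediate level Linux
user who is interested in learning about speech recognition and trying
it out. It may also help interested developers by explaining the
basics of speech recognition programming.

I started this document when I began researching what speech
recognition software and development libraries were available for
Linux. Automatic Speech Recognition (ASR, or just SR) on Linux is just
starting to come into its own, and I hope this document gives it a push
in the right direction by supporting both users and developers of ASR
technology.

I have left a variety of SR techniques out of this document and have
focused instead on the "HOWTO" aspect (since this is a HOWTO). I have
included a Publications section so that the interested reader can find
books and articles on anything not covered here. This is not meant to
be a definitive statement on ASR on Linux.

For the most recent version of this document, check the LDP archive or
go to http://www.gear21.com/speech/index.html.



2.2. Acknowledgements

I would like to thank the following people for their help, reviewing,
and support of this document:

 * Jessica Perry Hekman
   
 * Geoff Wexler
   


2.3. Comments/Updates/Feedback

If you have any comments, suggestions, revisions, or updates, or just
want to chat about ASR, please send an email to me at
scook@gear21.com <mailto:scook@gear21.com>.



2.4. ToDo

The following things are left "to do":

 * Add descriptions in the Publications section.
   
 * Add more books to the Publications section.
   
 * Add more links with descriptions.
   
 * Include more on the steps of an ASR system.
   
 * Include descriptions of FFT and filters.
   
 * Include descriptions of DSP principles.
   


2.5. Revision History

v0.1 first draft, August 2000

v0.5 final draft, September 2000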



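3. Introduction

3.1. Speech Recognition Basics

Speech recognition is the process by which a computer (or other type of
machine) identifies spoken words. Basically, it means talking to your
computer and having it correctly recognize what you are saying.

The following definitions are the basics needed for understanding
speech recognition technology.

Utterance
   
    An utterance is the vocalization (speaking) of a word or words that
    represent a single meaning to the computer. An utterance can be a
    single word, a few words, a sentence, or even multiple sentences.
   
Speaker Dependence
   
    Speaker dependent systems are designed around a specific speaker.
    They are generally more accurate for that speaker but much less
    accurate for other speakers. They assume the speaker will speak in
    a consistent voice and tempo. Speaker independent systems are
    designed for a variety of speakers. Adaptive systems usually start
    as speaker independent systems and use training techniques to adapt
    to the speaker and increase their recognition accuracy.
   
Vocabularies
   
    Vocabularies (or dictionaries) are lists of words or utterances
    that can be recognized by the SR system. Generally, smaller
    vocabularies are easier for a computer to recognize, while larger
    vocabularies are more difficult. Unlike normal dictionaries, each
    entry does not have to be a single word. An entry can be as long as
    a sentence or two. Small vocabularies can have as few as one or two
    recognized utterances (e.g. "Wake up"), while very large
    vocabularies can have a hundred thousand entries or more.
   
Accuracy
   
    The ability of a recognizer can be examined by measuring its
    accuracy, that is, how well it recognizes utterances. This includes
    not only correctly identifying an utterance but also identifying
    whether a spoken utterance is in its vocabulary at all. Good ASR
    systems have an accuracy of 98% or more. The acceptable accuracy of
    a system really depends on the application.
   
Training
   
    Some speech recognizers have the ability to adapt to a speaker.
    When the system has this ability, it may allow training to take
    place. An ASR system is trained by having the speaker repeat
    standard or common phrases and adjusting its comparison algorithms
    to match that particular speaker. Training a recognizer usually
    improves its accuracy.
   
    Training can also be used by speakers who have difficulty speaking
    or pronouncing certain words. As long as the speaker can
    consistently repeat an utterance, an ASR system with training
    should be able to adapt.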
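As a rough illustration of the accuracy measure defined above, the
following Python sketch scores a recognizer against a list of test
utterances. The function name and the convention that None stands for
"not in the vocabulary" are assumptions made for this example; they do
not come from any package described in this document.

```python
def recognition_accuracy(results):
    """Return the percentage of test utterances handled correctly.

    `results` is a list of (expected, recognized) pairs. By convention
    in this sketch, None means "not in the vocabulary", so rejecting an
    out-of-vocabulary utterance counts as a correct result.
    """
    if not results:
        raise ValueError("no test utterances given")
    correct = sum(1 for expected, recognized in results
                  if expected == recognized)
    return 100.0 * correct / len(results)


if __name__ == "__main__":
    trial = [
        ("wake up", "wake up"),   # recognized correctly
        ("open", "open"),         # recognized correctly
        (None, None),             # out-of-vocabulary, correctly rejected
        ("stop", "start"),        # misrecognized
    ]
    print(recognition_accuracy(trial))
```

Whether a given percentage is acceptable still depends on the
application, as noted above.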
   


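3.2. Types of Speech Recognition

Speech recognition systems can be separated into several classes
according to the types of utterances they are able to recognize. These
classes are based on the fact that one of the difficulties of ASR is
determining when a speaker starts and finishes an utterance. Most
packages fit into more than one class, depending on the mode they are
used in.

Isolated Words
   
    Isolated word recognizers usually require each utterance to have
    quiet (lack of an audio signal) on BOTH sides of the sample window.
    This does not mean that they accept only single words, but they do
    require a single utterance at a time. Often these systems have
    "Listen/Not-Listen" states, where they require the speaker to wait
    between utterances (usually doing the processing during the
    pauses). "Isolated utterance" might be a better name for this
    class.
   
Connected Words
   
    Connected word systems (or, more correctly, 'connected utterances')
    are similar to isolated word systems, but allow separate utterances
    to be 'run together' with a minimal pause between them.
   
Continuous Speech
   
    Continuous recognition is the next step. Recognizers with
    continuous speech capabilities are some of the most difficult to
    create because they must use special methods to determine utterance
    boundaries. Continuous speech recognizers allow users to speak
    almost naturally while the computer determines the content.
    Basically, it is computer dictation.
   
Spontaneous Speech
   
    There appears to be a variety of definitions of what spontaneous
    speech actually is. At a basic level, it can be thought of as
    speech that is natural sounding and not rehearsed. An ASR system
    with spontaneous speech ability should be able to handle a variety
    of natural speech features such as words being run together, "ums"
    and "ahs", and even slight stutters.
   
Voice Verification/Identification
   
    Some ASR systems have the ability to identify specific users. This
    document does not cover verification or security systems.
   


3.3. Uses and Applications

Although any task that involves interfacing with a computer can
potentially use ASR, the following applications are the most common
right now.

Dictation
   
    Dictation is the most common use of ASR systems today. This
    includes medical transcription, legal and business dictation, and
    general word processing. In some cases special vocabularies are
    used to increase the accuracy of the system.
   
Command and Control
   
    ASR systems that are designed to perform functions and actions on
    the computer are called Command and Control (C&C) systems.
    Utterances like "Open Netscape" and "Start a new xterm" will do
    just that.
   
Telephony
   
    Some PBX/voice mail systems allow callers to speak commands instead
    of pressing buttons to send specific tones.
   
Wearables
   
    Because input options are limited on wearable devices, speaking is
    a natural possibility.
   
Medical/Disabilities
   
    Many people have difficulty typing due to physical limitations such
    as repetitive strain injuries (RSI), muscular dystrophy, and many
    others. As another example, people with difficulty hearing could
    use a system connected to their telephone to convert the caller's
    speech to text.
   
Embedded Applications
   
    Some newer cellular phones include C&C speech recognition that
    allows utterances such as "Call Home". This could be a major factor
    in the future of ASR and Linux. Why can't I talk to my television
    yet?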
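To make the Command and Control category above a little more concrete,
here is a minimal Python sketch of the last stage of such a system: a
table that maps recognized utterances to commands. The table contents
and the function name are hypothetical, chosen to match the two example
utterances above; the sketch only looks up the command and does not run
it.

```python
import shlex

# Hypothetical utterance-to-command table (an assumption for this
# example, not taken from any package described in this document).
COMMANDS = {
    "open netscape": "netscape",
    "start a new xterm": "xterm",
}


def command_for(utterance, commands=COMMANDS):
    """Return the argument list for a recognized utterance, or None.

    The utterance is lower-cased and its whitespace collapsed before
    the lookup, so "Open  Netscape" matches "open netscape".
    """
    key = " ".join(utterance.lower().split())
    command = commands.get(key)
    return shlex.split(command) if command else None


if __name__ == "__main__":
    print(command_for("Start a new xterm"))
    print(command_for("make me a sandwich"))
```

A real system would hand the returned list to something like
subprocess.run, and would decide what to do with utterances that are
not in the table.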
   


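4. Hardware

4.1. Sound Cards

Because speech requires a relatively low bandwidth, just about any
medium to high quality 16 bit sound card will get the job done. You
must have sound enabled in your kernel and the correct drivers
installed. For more information on sound cards, please see "The Linux
Sound HOWTO", available at http://www.LinuxDoc.org/. Sound card quality
often starts a heated discussion about its impact on accuracy and
noise.

Sound cards with the 'cleanest' A/D (analog to digital) conversion are
recommended, but most often the clarity of the digital sample depends
more on the microphone quality, and even more on the environmental
noise. Electrical "noise" from monitors, PCI slots, hard drives, etc.
is usually nothing compared to audible noise from computer fans,
squeaking chairs, or heavy breathing.

Some ASR software packages may require a specific sound card. It is
usually a good idea to stay away from specific hardware requirements,
because they limit many of your future options. You will have to weigh
the benefits and costs if you are considering packages that require
specific hardware to function properly.



4.2. Microphones

A quality microphone is key when using ASR. In most cases, a desktop
microphone just won't do the job. Desktop microphones tend to pick up
more ambient noise, which gives ASR programs a hard time.

Hand held microphones are also not the best choice, as they can be
cumbersome to pick up all the time. While they do limit the amount of
ambient noise, they are most useful in applications that require
changing speakers often, or where speaking to the recognizer is
infrequent (when wearing a headset is not an option).

The best choice, and by far the most common, is the headset style. It
minimizes ambient noise while keeping the microphone at the tip of your
tongue all the time. Headsets are available without earphones and with
earphones (mono or stereo). I recommend stereo headphones, but it is
just a matter of personal taste.

You can get excellent quality microphone headsets for between $25 and
$100. Good places to start looking are http://www.headphones.com and
http://www.speechcontrol.com.

A quick note about levels: don't forget to turn up your microphone
volume. This can be done with a program such as XMixer or OSS Mixer,
and care should be taken to avoid feedback noise. If the ASR software
includes auto-adjustment programs, use them instead, as they are
optimized for their particular recognition system.



4.3. Computers/Processors

ASR applications can be heavily dependent on processing speed. This is
because a large amount of digital filtering and signal processing can
take place in ASR.

As with just about any CPU intensive software, the faster the better.
Also, the more memory the better. It is possible to do some SR with
100MHz and 16MB of RAM, but for fast processing (large dictionaries,
complex recognition schemes, or high sample rates) you should aim for a
minimum of 400MHz and 128MB of RAM. Because of the processing required,
most software packages list their minimum requirements.

Using a cluster (Beowulf or otherwise) to perform massive recognition
efforts has not yet been undertaken. If you know of any project
underway or in development, please send me a note:
scook@gear21.com <mailto:scook@gear21.com>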



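5. Speech Recognition Software

5.1. Free Software

Much of the free software listed here is available for download at:
http://sunsite.uio.no/pub/Linux/sound/apps/speech/



5.1.1. XVoice

XVoice is a dictation/continuous speech recognizer that can be used
with a variety of XWindow applications. It allows user-defined macros.
This is a fine program with a definite future. Once set up, it performs
with adequate accuracy.

XVoice requires IBM's ViaVoice for Linux (see the Commercial Software
section) to be installed. ViaVoice must also be configured to work
correctly. Additionally, Lesstif/Motif (libXm) is required. It is
important to note that because this program interacts with X Window,
you must leave X resources open on your machine, so use caution if you
run it on a networked or multi-user machine.

This software is primarily for users. An RPM is available.

HomePage: http://www.compapp.dcu.ie/~tdoris/Xvoice/
http://www.zachary.com/creemer/xvoice.html

Project: http://xvoice.sourceforge.net

Community: http://www.onelist.com/community/xvoice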

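5.1.2. CVoiceControl/kVoiceControl

CVoiceControl (which stands for Console Voice Control) started its life
as KVoiceControl (KDE Voice Control). It is a basic speech recognition
system that allows a user to execute Linux commands by speaking them.
CVoiceControl replaces KVoiceControl.

The software includes a microphone level configuration utility, a
vocabulary "model editor" for adding new commands and utterances, and
the speech recognition system.

CVoiceControl is an excellent starting point for experienced users
looking to get started in ASR. It is not the most user friendly
program, but once it has been trained correctly it can be very helpful.
Be sure to read the documentation while setting it up.

This software is primarily for users.

Homepage: http://www.kiecza.de/daniel/linux/index.html

Documents: http://www.kiecza.de/daniel/linux/cvoicecontrol/index.html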



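5.1.3. Open Mind Speech

Started in late 1999, Open Mind Speech has changed names several times
(it was VoiceControl, then SpeechInput, and then FreeSpeech) and is now
part of the open source project "Open Mind Initiative". It is not
completely operational at present and is aimed at developers.

This software is primarily for developers.

Homepage: http://freespeech.sourceforge.net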



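5.1.4. GVoice

GVoice is an ASR library that uses IBM's (free) ViaVoice SDK to control
Gtk/GNOME applications. It includes libraries for initialization, the
recognition engine, vocabulary manipulation, and panel control.
Development has been idle for over a year.

This software is primarily for developers.

Homepage: http://www.cse.ogi.edu/~omega/gnome/gvoice/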



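5.1.5. ISIP

The Institute for Signal and Information Processing at Mississippi
State University has made its speech recognition engine available. The
toolkit includes a front end, a decoder, and a training module. It is a
functional toolkit.

This software is primarily for developers.

The toolkit (and more information about ISIP) is available at:
http://www.isip.msstate.edu/project/speech/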



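5.1.6. CMU Sphinx

Sphinx originally started at CMU and has recently been released as open
source. It is a fairly large program that includes a lot of tools and
information. It is still "in development", but includes trainers,
recognizers, acoustic models, language models, and some limited
documentation.

This software is primarily for developers.

Homepage: http://www.speech.cs.cmu.edu/sphinx/Sphinx.html

Source: http://download.sourceforge.net/cmusphinx/sphinx2-0.1a.tar.gz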



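5.1.7. Ears

Although Ears is not fully developed, it is a good starting point for
programmers wishing to get started in ASR.

This software is primarily for developers.

FTP site: ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/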



5.1.8. NICO ANN Toolkit

The NICO Artificial Neural Network toolkit is a flexible back
propagation neural network toolkit optimized for speech recognition
applications.

This software is primarily for developers.

Homepage: http://www.speech.kth.se/NICO/index.html



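5.1.9. Myers' Hidden Markov Model Software

This software by Richard Myers is a set of HMM algorithms written in
C++. It provides an example and learning tool for the HMMs described in
L. Rabiner's book "Fundamentals of Speech Recognition".

This software is primarily for developers.

Information is available at:
http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html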



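5.1.10. Jialong He's Speech Recognition Research Tool

Although not originally written for Linux, this research tool can be
compiled on Linux. It contains three different types of recognizers:
DTW, Dynamic Hidden Markov Model, and Continuous Density Hidden Markov
Model. It is intended for research and development and is not a
complete ASR system. The toolkit contains some very useful tools.

This software is primarily for developers.

More information is available at:
http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/jialong.html



5.1.11. More Free Software?

If you know of free software that is not included in the list above,
please send me a note at scook@gear21.com
<mailto:scook@gear21.com>. If you are in the mood, please also tell me
where to get a copy of the software, and send any impressions you may
have of it. Thanks!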



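5.2. Commercial Software

5.2.1. IBM ViaVoice

IBM has kept its promise to support Linux with its ViaVoice series of
products, though the future of the SDKs is not yet settled (the
licensing agreement for developers has not been officially released as
of this writing; more to come).

The commercial (not free) product, IBM ViaVoice Dictation for Linux
(available at http://www-4.ibm.com/software/speech/linux/dictation.html),
performs very well, but has sizeable system requirements compared to
the more basic ASR systems (64M RAM and a 233MHz Pentium). For the
$59.95US price tag you also get an Andrea NC-8 microphone. It also
allows multiple users (I have not tried it with multiple users, so if
anyone has any experience, please let me know). The package includes
documentation (PDF), a trainer, the dictation system, and installation
scripts. Support for Linux distributions based on 2.2 kernels is
included in the latest release.

The ASR SDK is available for free and includes IBM's SMAPI, the grammar
API, documentation, and a variety of sample programs. The ViaVoice Run
Time Kit provides an ASR engine and data files for dictation functions,
plus user utilities. The ViaVoice Command & Control Run Time Kit
includes an ASR engine and data files for command and control
functions, plus user utilities. The SDK and Kits require 128MB of RAM
and Linux 2.2 or later.

The SDK and Kits are available for free at:
http://www-4.ibm.com/software/speech/dev/sdk_linux.html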



5.2.2. Vocalis Speechware

More information on Vocalis and Vocalis Speechware is available at
http://www.vocalisspeechware.com and http://www.vocalis.com.



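5.2.3. Babel Technologies

Babel Technologies offers a Linux SDK called Babear. It is a speaker
independent system based on Hybrid Markov Model and Artificial Neural
Network technology. They also have a variety of products for
text-to-speech, speaker verification, and phoneme analysis. More
information is available at http://www.babeltech.com.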



5.2.4. SpeechWorks

Their web site does not specifically mention Linux, but their
"OpenSpeech Recognizer" uses VoiceXML, which is an open standard. More
information is available at http://www.speechworks.com.



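5.2.5. Nuance

Nuance offers a speech recognition/natural language product (currently
Nuance 8.0) for a variety of *nix platforms. It can handle very large
vocabularies and uses a unique distributed architecture for scalability
and fault tolerance. More information is available at
http://www.nuance.com.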



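5.2.6. Abbot/AbbotDemo

Abbot is a very large vocabulary, speaker independent ASR system. It
was originally developed by the Connectionist Speech Group at Cambridge
University and has since been transferred (commercialized) to
SoftSound. More information is available at http://www.softsound.com.

AbbotDemo is a demonstration package of Abbot. The demo system has a
vocabulary of about 5000 words and uses the connectionist/HMM
continuous speech algorithm. It is a demonstration program with no
source code.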



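5.2.7. Entropic

The fine people over at Entropic have been bought out by Micro$oft.
Their products and support services have all but disappeared. Support
for HTK and ESPS/waves+ is gone, and their future is in the hands of
M$. Their old web site, http://www.entropic.com, has more information.

K.K. Chin advised me that the original developers of HTK (the Speech
Vision and Robotic Group at Cambridge) are still providing support for
it. A free version is also available at http://htk.eng.cam.ac.uk. Note
that Microsoft still owns the copyright to the current HTK code.



5.2.8. More Commercial Products

There are rumors that more commercial ASR products (including L&H) will
become available in the near future. I talked with a couple of L&H
representatives at Comdex 2000 (Vegas), and none of them could give me
any information on a Linux release, or even on whether they planned to
release any products for Linux. If you have any further information,
please send the details to scook@gear21.com
<mailto:scook@gear21.com>.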



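6. Inside Speech Recognition

6.1. How Recognizers Work

Recognition systems can be broken down into two main types. Pattern
Recognition systems compare patterns against known/trained patterns to
determine a match. Acoustic Phonetic systems use knowledge of the human
body (speech production and hearing) to compare speech features
(phonetics such as vowel sounds). Most modern systems focus on the
pattern recognition approach because it combines nicely with current
computing techniques and tends to have higher accuracy.

Most recognizers can be broken down into the following steps:

 1. Audio recording and utterance detection
   
 2. Pre-filtering (pre-emphasis, normalization, banding, etc.)
   
 3. Framing and windowing (chopping the data into a usable format)
   
 4. Filtering (further filtering of each window/frame/freq. band)
   
 5. Comparison and matching (recognizing the utterance)
   
 6. Action (perform the function associated with the recognized
    pattern)
   
Although each step seems simple, each one can involve a multitude of
different (and sometimes completely opposite) techniques.

(1) Audio/Utterance Recording can be accomplished in a number of ways.
Starting points can be found by comparing the ambient audio level
(acoustic energy in some cases) with the sample just recorded. Endpoint
detection is harder because speakers tend to leave "artifacts" such as
breathing/sighing, teeth chatter, and echoes.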
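Step (1) can be sketched in Python as a simple energy comparison. This
is only one of the many possible approaches mentioned above. The frame
length, the number of leading frames treated as ambient noise, and the
threshold factor are assumptions chosen for the example, and the
function names are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def find_utterance(samples, frame_len=160, ambient_frames=5, factor=4.0):
    """Return (start, end) sample indices spanning the first to the
    last frame that is clearly louder than the ambient level, or None.

    Assumption: the first `ambient_frames` frames hold only background
    noise, so their average energy estimates the ambient level.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if len(frames) <= ambient_frames:
        return None
    ambient = (sum(frame_energy(f) for f in frames[:ambient_frames])
               / ambient_frames)
    # Floor of 1.0 keeps the threshold usable on digitally silent input.
    threshold = max(ambient * factor, 1.0)
    active = [i for i, f in enumerate(frames)
              if frame_energy(f) > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


if __name__ == "__main__":
    silence = [0] * 800
    tone = [1000, -1000] * 400
    print(find_utterance(silence + tone + silence))
```

A plain threshold like this finds starting points reasonably well, but
it treats breathing and echoes as speech, which is the endpoint problem
described above.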

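(2) Pre-Filtering is accomplished in a variety of ways, depending on
other features of the recognition system. The most common methods are
the "Bank-of-Filters" method, which uses a series of audio filters to
prepare the sample, and the Linear Predictive Coding method, which uses
a prediction function to calculate differences (errors). Different
forms of spectral analysis are also used.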
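Two of the simpler pre-filtering operations named in the step list
(pre-emphasis and normalization) can be sketched as follows. This is
not the Bank-of-Filters or Linear Predictive Coding method; it is a
smaller illustration, and the coefficient 0.95 is an assumed, commonly
used value rather than one taken from this document.

```python
def pre_emphasis(samples, alpha=0.95):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    This boosts the higher frequencies relative to the lower ones.
    The first sample is passed through unchanged.
    """
    if not samples:
        return []
    out = [float(samples[0])]
    for previous, current in zip(samples, samples[1:]):
        out.append(current - alpha * previous)
    return out


def normalize(samples):
    """Scale samples so that the largest absolute value becomes 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return [0.0 for _ in samples]
    return [s / peak for s in samples]


if __name__ == "__main__":
    raw = [0, 100, 200, 300, 200, 100, 0]
    print(normalize(pre_emphasis(raw)))
```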

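(3) Framing/Windowing involves separating the sample data into
specific sizes. This is often rolled into step 2 or step 4. This step
also involves preparing the sample boundaries for analysis (removing
edge clicks, etc.).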
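A minimal sketch of step (3) in Python: the signal is cut into
overlapping frames, and each frame is multiplied by a Hamming window so
that its edges taper toward zero, which is one common way of reducing
edge clicks. The frame length of 400 samples and step of 160 samples
(25 ms and 10 ms at 16kHz) are assumptions for the example.

```python
import math


def hamming(n):
    """Return the n coefficients of a Hamming window."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))
            for i in range(n)]


def frame_signal(samples, frame_len=400, step=160):
    """Split samples into overlapping, Hamming-windowed frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, step):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


if __name__ == "__main__":
    frames = frame_signal([1.0] * 1000)
    print(len(frames), len(frames[0]))
```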

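(4) Additional Filtering is not always present. It is the final
preparation of each window before comparison and matching. Often this
consists of time alignment and normalization.

(5) A huge number of techniques are available for Comparison and
Matching. Most involve comparing the current window with known samples.
There are methods that use Hidden Markov Models (HMM), frequency
analysis, differential analysis, linear algebra techniques/shortcuts,
spectral distortion, and time distortion. All of these methods are used
to generate a probability and accuracy of the match.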
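As one example of the time distortion family mentioned in step (5),
the sketch below uses dynamic time warping (DTW) to compare a sample
against stored templates. To keep it short, each frame is reduced to a
single number; a real recognizer would compare feature vectors. The
function names and the template data are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j], where
    each step may advance in a, in b, or in both.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def best_match(sample, templates):
    """Return the name of the template closest to the sample."""
    return min(templates,
               key=lambda name: dtw_distance(sample, templates[name]))


if __name__ == "__main__":
    templates = {"yes": [1, 2, 2, 3], "no": [3, 2, 1]}
    print(best_match([1, 2, 3], templates))
```

DTW tolerates utterances spoken faster or slower than the template,
which is why it suits small, trained vocabularies. HMM-based methods
take a statistical approach to the same problem.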

(6) Actions can be just about anything the developer wants.



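6.2. Digital Audio Basics

Audio is inherently an analog phenomenon. Recording a digital sample is
done by converting the analog signal from the microphone to a digital
signal through the A/D converter in the sound card. When a microphone
is operating, sound waves vibrate the magnetic element in the
microphone, causing an electrical current to flow to the sound card
(think of a speaker working in reverse). Basically, the A/D converter
records the value of the electrical voltage at specific intervals.

There are two important factors in this process. The first is the
"sample rate", or how often the voltage values are recorded. The second
is the "bits per sample", or how accurately each value is recorded. A
third item is the number of channels (mono or stereo), but for most ASR
applications mono is sufficient. Most applications use pre-set values
for these parameters, and users should not change them unless the
documentation suggests it. Developers should experiment with different
values to determine what works best with their algorithms.

So what is a good sample rate for ASR? Because speech is relatively low
bandwidth (mostly between 100Hz and 8kHz), 8000 samples/sec (8kHz) is
sufficient for most basic ASR. Strictly, an 8kHz rate only captures
frequencies up to 4kHz, which is why some people prefer 16000
samples/sec (16kHz): it provides more accurate high frequency
information. If you have the processing power, use 16kHz. For most ASR
applications, sampling rates higher than about 22kHz are a waste.

And what is a good value for "bits per sample"? 8 bits per sample
records values between 0 and 255, which means that the position of the
microphone element is recorded as one of 256 positions. 16 bits per
sample divides the element position into 65536 possible values. As with
the sample rate, use 16 bits per sample if you have enough processing
power and memory. For comparison, an audio Compact Disc is encoded at
about 44kHz with 16 bits per sample.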
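The arithmetic behind these figures is simple enough to check in a few
lines of Python. The function names are hypothetical; the 44100
samples/sec stereo figure used for the Compact Disc is the standard CD
format, which the text above rounds to "about 44kHz".

```python
def quantization_levels(bits_per_sample):
    """Number of distinct values one sample can take."""
    return 2 ** bits_per_sample


def bytes_per_second(sample_rate, bits_per_sample, channels=1):
    """Raw (uncompressed) data rate of a linear PCM stream."""
    return sample_rate * bits_per_sample * channels // 8


if __name__ == "__main__":
    print(quantization_levels(8))           # 8 bits per sample
    print(quantization_levels(16))          # 16 bits per sample
    print(bytes_per_second(8000, 8))        # basic ASR, mono
    print(bytes_per_second(16000, 16))      # better ASR, mono
    print(bytes_per_second(44100, 16, 2))   # audio CD, stereo
```

Going from 8kHz/8 bit to 16kHz/16 bit quadruples the amount of data the
recognizer has to process, which is the cost that has to be weighed
against the gain in accuracy.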

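The encoding format used should be simple: linear signed or unsigned.
Using a U-Law/A-Law algorithm or some other compression scheme is
usually not worth it, as it costs computing power and does not gain you
much.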
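One reason linear encodings are convenient is that raw bytes map
directly onto sample values. The Python sketch below decodes the two
simple formats mentioned above. It assumes little-endian byte order
for 16 bit samples (typical on x86 Linux) and the usual convention that
unsigned 8 bit audio is centered on 128; both are assumptions of this
example, and the function names are hypothetical.

```python
import struct


def decode_signed16(raw):
    """Decode little-endian signed 16 bit linear PCM into integers.

    A trailing odd byte, if any, is ignored.
    """
    count = len(raw) // 2
    return list(struct.unpack("<%dh" % count, raw[:count * 2]))


def decode_unsigned8(raw):
    """Decode unsigned 8 bit linear PCM into signed integers.

    Unsigned 8 bit audio is centered on 128, so subtracting 128 gives
    values in the range -128..127.
    """
    return [byte - 128 for byte in raw]


if __name__ == "__main__":
    print(decode_signed16(struct.pack("<3h", 0, -1, 32767)))
    print(decode_unsigned8(bytes([0, 128, 255])))
```

A U-Law or A-Law stream would need an extra expansion step before any
of the filtering described in section 6.1 could be applied.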



7. Publications

If there is a publication that is not on this list and that you think
should be, please send the information to scook@gear21.com
<mailto:scook@gear21.com>.



7.1. Books

 * "Fundamentals of Speech Recognition". L. Rabiner & B. Juang. 1993.
    ISBN: 0130151572.
   
 * "How to Build a Speech Recognition Application". B. Balentine, D.
    Morgan, and W. Meisel. 1999. ISBN: 0967127815.
   
 * "Speech Recognition : Theory and C++ Implementation". C. Becchetti
    and L.P. Ricotti. 1999. ISBN: 0471977306.
   
 * "Applied Speech Technology". A. Syrdal, R. Bennett, S. Greenspan.
    1994. ISBN: 0849394562.
   
 * "Speech Recognition : The Complete Practical Reference Guide". P.
    Foster, T. Schalk. 1993. ISBN: 0936648392.
   
 * "Speech and Language Processing: An Introduction to Natural
    Language Processing, Computational Linguistics and Speech
    Recognition". D. Jurafsky, J. Martin. 2000. ISBN: 0130950696.
   
 * "Discrete-Time Processing of Speech Signals (IEEE Press Classic
    Reissue)". J. Deller, J. Hansen, J. Proakis. 1999. ISBN:
    0780353862.
   
 * "Statistical Methods for Speech Recognition (Language, Speech, and
    Communication)". F. Jelinek. 1999. ISBN: 0262100665.
   
 * "Digital Processing of Speech Signals". L. Rabiner, R. Schafer.
    1978. ISBN: 0132136031.
   
 * "Foundations of Statistical Natural Language Processing". C.
    Manning, H. Schutze. 1999. ISBN: 0262133601.
   
 * "Designing Effective Speech Interfaces". S. Weinschenk, D. T.
    Barker. 2000. ISBN: 0471375454.
   
For a very large online bibliography, check the Institut für Phonetik:
http://www.informatik.uni-frankfurt.de/~ifb/bib_engl.html



7.2. Internet

news:comp.speech
   
    Newsgroup dedicated to computers and speech.
   
      US: http://www.speech.cs.cmu.edu/comp.speech/
       
      UK: http://svr-www.eng.cam.ac.uk/comp.speech/
       
      Aus: http://www.speech.su.oz.au/comp.speech/
       
news:comp.speech.users
   
    Newsgroup dedicated to users of speech software.
   
      http://www.speechtechnology.com/users/comp.speech.users.html
       
news:comp.speech.research
   
    Newsgroup dedicated to developing and researching speech software
    and hardware.
   
news:comp.dsp
   
    Newsgroup dedicated to digital signal processing.
   
news:alt.sci.physics.acoustics
   
    Newsgroup dedicated to the physics of sound.
   
DDLinux Email List
   
    Mailing list for speech recognition on Linux.
   
      Homepage: http://leb.net/ddlinux/
       
      Archives: http://leb.net/pipermail/ddlinux/
       
Linux Software Repository for speech applications
   
    http://sunsite.uio.no/pub/linux/sound/apps/speech/
   
Russ Wilcox's List of Speech Recognition Links
   
    (excellent) http://www.tiac.net/users/rwilcox/speech.html
   
Online Bibliography
   
    Online Bibliography of Phonetics and Speech Technology
    Publications.
    http://www.informatik.uni-frankfurt.de/~ifb/bib_engl.html
   
MIT's Spoken Language Systems Homepage
   
    http://www.sls.lcs.mit.edu/sls/
   
Oregon Graduate Institute
   
    Center for Spoken Language Understanding at the Oregon Graduate
    Institute. An excellent place for developers and researchers.
    http://cslu.cse.ogi.edu/
   
IBM's ViaVoice Linux SDK
   
    http://www-4.ibm.com/software/speech/dev/sdk_linux.html
   
Mississippi State
   
    Homepage for Signal and Information Processing at Mississippi
    State University, with a large amount of useful information for
    developers. http://www.isip.msstate.edu/projects/speech/
   
Speech Technology
   
    ASR software and accessories. http://www.speechtechnology.com
   
Speech Control
   
    Speech controlled computer systems. Microphones, headsets, and
    wireless products for ASR. http://www.speechcontrol.com
   
Microphones.com
   
    Microphones and accessories for ASR. http://www.microphones.com
   
21st Century Eloquence
   
    "Speech Recognition Specialists." http://voicerecognition.com
   
Computing Out Loud
   
    Primarily for Windows users, but has good information.
    http://www.out-loud.com
   
Say I Can.com
   
    "The Speech Recognition Information Source." http://www.sayican.com
   


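8. About the Japanese Translation

The Japanese translation was produced by the Linux Japanese FAQ
Project. Please send comments on the translation to the JF Project
<JF@linux.or.jp>.

Version 2.0j

Translation:
   
    <htakashi@yabumi.com>
   
Proofreading:
   
      <jeanne@mbox.kyoto-inet.or.jp>
       
      <hng@ps.ksky.ne.jp>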
       
