
荒井 隆行

アライ タカユキ  (Arai Takayuki)


Professor, Department of Information and Communication Sciences, Faculty of Science and Technology, Sophia University


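April 2008  Professor, Department of Information and Communication Sciences, Faculty of Science and Technology, Sophia University (to present)
April 2006  Professor, Department of Electrical and Electronics Engineering, Faculty of Science and Technology, Sophia University
April 2000  Associate Professor, Department of Electrical and Electronics Engineering, Faculty of Science and Technology, Sophia University
April 1998  Lecturer, Department of Electrical and Electronics Engineering, Faculty of Science and Technology, Sophia University
April 1994  Research Associate, Department of Electrical and Electronics Engineering, Faculty of Science and Technology, Sophia University
March 1994  Completed the doctoral program, Division of Electrical and Electronics Engineering, Graduate School of Science and Technology, Sophia University
March 1991  Completed the master's program, Division of Electrical and Electronics Engineering, Graduate School of Science and Technology, Sophia University
March 1989  Graduated from the Department of Electrical and Electronics Engineering, Faculty of Science and Technology, Sophia University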

October 2003 to September 2004  Visiting Researcher, Massachusetts Institute of Technology, USA
August 2000, August 2001, August 2002, and October 2003 to September 2004
       Visiting Researcher, Massachusetts Institute of Technology, USA
February 2001  Visiting Researcher, Max Planck Institute for Psycholinguistics, Netherlands
August 2000  Visiting Researcher, Massachusetts Institute of Technology, USA
January 1997 to March 1998 / August 1998 and August 1999
       Visiting Researcher, International Computer Science Institute, a research institute affiliated with the University of California, Berkeley, USA
       Visiting Researcher, Oregon Graduate Institute of Science and Technology, USA

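The series of events involved in speech communication is called the "Speech Chain" and is a fundamental concept in speech science and hearing science. Research on speech communication has continued with a main focus on the scientific aspects related to speech science, hearing science, acoustics, and acoustic phonetics, and on their applications. The scope of the research is also being extended to every aspect of sound. The areas covered include the following broad, interdisciplinary research fields: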




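  • 高橋慶, 安啓一, 程島奈緒, 荒井隆行, 栗栖清浩
    IEICE Technical Report. EA, Engineering Acoustics 107(26) 11-16 April 2007
    Arai et al. [Proc. Meeting of the Acoustical Society of Japan, 2001; Acoust. Sci. Tech., 2002] proposed steady-state suppression processing, which improves speech intelligibility in reverberation, and demonstrated its effectiveness. Meanwhile, 安武 et al. [IEICE Technical Committee on Human Information Processing, 2005] proposed consonant enhancement processing designed for hearing assistance in noisy environments and for real-time operation. Our ultimate goal is to realize steady-state suppression processing in real time. In this paper, noting that suppressing vowels in effect enhances consonants, we examined whether the two prior studies can be treated in a unified way. As a result, we confirmed that steady-state suppression processing can be realized within the same framework as the consonant enhancement processing of 安武 et al.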
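  • 網野加苗, 荒井隆行
    Proceedings of the Meeting of the Acoustical Society of Japan 321-322 March 2007
  • 加島慎平, 飯田朱美, 安啓一, 荒井隆行, 菅原勉
    Proceedings of the Meeting of the Acoustical Society of Japan 271-272 March 2007
  • 向 奈津美, 金寺 登, 北口 直, 荒井 隆行
    Bulletin of Ishikawa National College of Technology 39 51-56 2007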
    Detecting the portions of a film that contain utterances, which is essential for captioning, is at present generally carried out manually by translators. Robust methods are indispensable for automatic voice activity detection (VAD) in films that contain other, irrelevant sound information such as background music. This paper proposes a new feature for automatic VAD. The proposed method utilizes the gradient of the spectrum in the high-frequency band (4-6 kHz) and the standard deviation of the modulation-filtered cepstrum. For the evaluation experiments, we used a portion (about 23 minutes) of an English musical film. The proposed method exhibits a 22.6% reduction in total error rate compared with the conventional method, which utilizes the short-time energy.
    Sophia Symposium on Modern Mathematics and Its Application to Modern Technology 155-159 2007
  • 細川亜希子, 進藤美津子, 平井沢子, 荒井隆行
    The Japan Journal of Logopedics and Phoniatrics 48(1) 39 January 2007
  • 上羽 貞行, 荒井 隆行, 栗栖 清浩, 倉片 憲治, 坂本 真一, 船場 ひさお, 佐藤 洋
    The Journal of the Acoustical Society of Japan 63(12) 723-730 2007
  • Takayuki Arai, Yoshiaki Murakami, Nahoko Hayashi, Nao Hodoshima, Kiyohiro Kurisu
    Acoustical Science and Technology 28(6) 438-441 2007
    The correlation between the intelligibility of speech in reverberation and the amount of overlap-masking (OLM) due to reverberation was investigated. A high correlation existed between the results of a perceptual experiment and the values of the proposed intelligibility measure and SOR, which is defined as the signal-to-OLM ratio. The intelligibility of speech in reverberation was inversely correlated with the amount of overlap-masking. In the steady-state suppression technique, overlap-masking is reduced by estimating and suppressing steady-state portions of speech that have high energy but are less important for speech perception, such as the nuclei of syllables. The advantages of the proposed measures are that they reflect the reverberation characteristics of a room, as contained in its impulse response, as well as the characteristics of the speech signal itself and the effect of any preprocessing.
  • Takayuki Arai, Yuki Nakata, Nao Hodoshima, Kiyohiro Kurisu
    Acoustical Science and Technology 28(4) 282-285 2007
    The effects of steady-state suppression after slowing the speaking rate of a speech signal were investigated. Slowing the speech rate was observed to improve speech intelligibility in a reverberant environment, and speaking slowly helps particularly in a large hall with a long reverberation time. Slowing speech by isolating each syllable would be more effective for improving speech intelligibility. An increase in overlap-masking makes speech difficult to understand clearly. Steady-state suppression is proposed as a preprocess for speech signals to reduce overlap-masking in reverberant environments. Artificial reverberant environments were created by convolving speech samples with impulse responses. Pairwise comparison also showed significant improvements with steady-state suppression.
  • Kanae Amino, Takayuki Arai
    Acoustical Science and Technology 28(2) 128-130 2007
    In this study, we conducted a perceptual speaker identification experiment in order to examine the effects of speaker-listener familiarity and of the stimulus content. We used the same materials as those used in our previous study [6], where familiar listeners identified the speakers. The results showed that familiar listeners performed significantly better than naive listeners; however, the overall effects of the stimulus content were similar between familiar and naive listeners. The nasals /na/ and /nja/ were particularly effective for speaker identification, and the difference in identification scores between the coronal nasals and the labial nasal was again observed in this study. © 2007 The Acoustical Society of Japan.
  • Nao Hodoshima, Yusuke Miyauchi, Keiichi Yasu, Takayuki Arai
    Acoustical Science and Technology 28(1) 53-55 2007
    The effect of steady-state suppression on speech intelligibility for an elderly person under various reverberation conditions was studied. Processed and unprocessed speech materials were reproduced under three reverberant conditions, with reverberation times (RTs) of 0.7, 1.0 and 1.2 s, represented by an impulse response measured in Hamming Hall in Tokyo. The computer-controlled listening test was conducted in a sound-treated room, and the sound level was adjusted to a comfortable level for the participant before the trials began. The degree of improvement in perception produced by steady-state suppression differed between the elderly participant and young normal-hearing participants in each reverberant condition. With RTs of 0.7 and 1.0 s, the participant achieved higher scores for steady-state suppressed signals than for unprocessed signals.
  • Takayuki Arai, Natasha Warner, Steven Greenberg
    Acoustical Science and Technology 28(1) 46-48 2007
    An analysis of pronunciation variations in the Japanese component of the Oregon Graduate Institute Multi-Language Telephone Speech (OGI-TS) Corpus is presented. These variations include reduction or deletion, and the frequencies of occurrence and durations of both vowels and consonants in the corpus are reported. The corpus contains 90 calls, each uttered by a unique adult speaker. Filled pauses, hesitations and other instances of interruption in the speech stream were also transcribed. Non-high vowel devoicing is more common in this corpus than would be anticipated on the basis of the published literature. In Japanese, the main difference between careful and spontaneous speech is in the proportion of vowel devoicing and deletion. The variations in the pronunciation of consonants in Japanese include glottal fricatives, nasalization of vowels before nasals, and other forms of consonant reduction.
  • Takayuki Arai
    Acoustical Science and Technology 28(3) 190-201 2007
    In this paper, we present and discuss an educational system in the fields of acoustics and speech science using a series of physical models of the human vocal tract. Because education in acoustics is relevant for several fields related to speech communication, it hosts students from a variety of educational backgrounds. Moreover, we believe that an education in acoustics is important for students of different ages: college, high school, middle school, and even elementary school students. Because of the varied student populations, we develop an educational system that instructs students intuitively and effectively and consists of the following models: lung models, an artificial larynx, Arai's models (cylinder and plate type models), Umeda and Teranishi's model (a variable-shape model), and head-shaped models. These models effectively demonstrate several principal aspects of speech production, such as phonation, source-filter theory, the relationship between vocal-tract shape/tongue movement and vowel quality, and nasalization of vowels. We have confirmed that combining the models in an effective way produces a complete education in the acoustics of speech production. The examinations and questionnaire surveys conducted before and after using our proposed system revealed that the learners' understanding improves with the use of the system. The system is also effective for voice and articulatory training in speech pathology and language learning. © 2007 The Acoustical Society of Japan.
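  • 岩崎純二, 片岡竜太, 山下洋介, 春日梨恵, 安啓一, 荒井隆行, 新谷悟
    IEICE Technical Report. SP, Speech 106(443) 49-54 December 2006
    Four-dimensional MRI was performed on four healthy subjects (two males and two females) during production of /impee/, and three velopharyngeal closure patterns were identified. The positions and shapes of the soft palate, the velopharyngeal port, and the levator veli palatini muscle (LVP muscle), which is involved in elevating the soft palate, were observed in midsagittal and horizontal sections at rest and during production of the vowel /i/ and the consonant /p/. At rest, the LVP muscle was elliptical in shape with its long axis oriented anteroposteriorly, but when the soft palate was maximally elevated it rotated toward the velopharyngeal port (Type A). With moderate elevation of the soft palate, the long axis of the LVP muscle shifted in parallel in the mesial direction (Type B). (1) In the coronal pattern, Type A movement was seen during production of both /i/ and /p/; (2) in the circular pattern, Type A was seen during /p/ and Type B during /i/; (3) in the circular with Passavant's ridge pattern, Type B was seen during both /i/ and /p/.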
  • Kanako Ueno, Takayuki Arai, Fumiaki Satoh, Akira Nishimura, Koichi Yoshihisa
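    J. Acoust. Soc. Am 120(5, Pt.2) 3116 November 2006
  • 小松雅彦, 荒井隆行
    Proceedings of the General Meeting of the Phonetic Society of Japan 195-200 September 2006
  • 網野加苗, 荒井隆行
    Proceedings of the Meeting of the Acoustical Society of Japan 273-274 September 2006
  • 向奈津美, 北口直, 金寺登, 荒井隆行, 藤樫佑樹, 古賀綾子, 吉井順子, 船田哲男
    Proceedings of the Meeting of the Acoustical Society of Japan 263-264 September 2006
  • 古賀綾子, 藤樫佑樹, 荒井隆行, 金寺登, 吉井順子
    Proceedings of the Meeting of the Acoustical Society of Japan 261-262 September 2006
  • 加島慎平, 飯田朱美, 安啓一, 荒井隆行, 菅原勉
    Proceedings of the Meeting of the Acoustical Society of Japan 251-252 September 2006
  • 荒井 隆行, K. Ohta, K. Yasu
    Proceedings of the DSPS Educators Conference, 2006 55-58 September 2006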
  • Hodoshima N, Arai T, Kusumoto A, Kinoshita K
    The Journal of the Acoustical Society of America 119(6) 4055-4064 June 2006  Peer-reviewed
  • Nao Hodoshima, Takayuki Arai, Akiko Kusumoto, Keisuke Kinoshita
    J. Acoust. Soc. Am 119(6) 4055-4064 June 2006
  • 村上善昭, 程島奈緒, 中田有貴, 林奈帆子, 宮内裕介, 荒井隆行, 栗栖清浩
    Proceedings of the Meeting of the Acoustical Society of Japan 649-650 March 2006
  • 安啓一, 小林敬, 荒井隆行, 八田ゆかり, 南畑伸至, 進藤美津子
    Proceedings of the Meeting of the Acoustical Society of Japan 487-488 March 2006
  • 網野加苗, 菅原勉, 荒井隆行
    Proceedings of the Meeting of the Acoustical Society of Japan 363-364 March 2006
  • 平井沢子, 安啓一, 荒井隆行, 飯高京子
    IEICE Technical Report. SP, Speech 105(686) 17-22 March 2006
  • 安啓一, 荒井隆行, 進藤美津子
    IEICE Technical Report. SP, Speech 105(686) 1-4 March 2006
  • 平井沢子, 安啓一, 荒井隆行, 飯高京子
    The Japan Journal of Logopedics and Phoniatrics 47(1) 75-75 2006
  • Nao Hodoshima, Dawn Behne, Takayuki Arai
    This study investigated whether the steady-state suppression method proposed by Arai et al. (2001, 2002) improved consonant identification for nonnative listeners in reverberation. It also compared the effect of steady-state suppression on consonant identification by native and nonnative listeners in reverberation. We used steady-state suppression as a preprocessing technique that processes speech signals before they are radiated from loudspeakers in order to reduce the amount of overlap-masking. Participants were 24 native English speakers (native listeners) and 24 Japanese speakers (nonnative listeners), all with normal hearing. A diotic Modified Rhyme Test was conducted with and without steady-state suppression for reverberation times of 0.4, 0.7 and 1.1 s and a non-reverberant condition. The results showed that native listeners performed better than nonnative listeners, and that the mean percentage of correct answers for initial consonants was higher than for final consonants. The results also showed that scores for processed and unprocessed speech were comparable for word-initial and word-final consonants. These findings indicate that the parameters of steady-state suppression would need adjustment to accommodate speech materials and reverberant conditions. They also suggest that the difficulties that nonnative listeners have might not be due to the actual acoustic-phonetic information from the signal.
  • ARAI TAKAYUKI, K. Amino, T. Sugawara
    Proc. of the Western Pacific Acoustics Conference (WESPAC) 2006
  • ARAI TAKAYUKI, N. Hodoshima
    Proc. of the Western Pacific Acoustics Conference (WESPAC) 2006
  • Takayuki Arai, Fumiaki Satoh, Akira Nishimura, Kanako Ueno, Koichi Yoshihisa
    Acoustical Science and Technology 27(6) 344-348 2006
    Many demonstrations for education in acoustics have been developed in Japan as well as outside the country. Since 1997, the Technical Committee on Education in Acoustics of the Acoustical Society of Japan has been investigating and discussing education in acoustics in Japan. In this review, some of the educational tools and demonstrations in acoustics are introduced. They are all designed to help us visualize and hear different phenomena and to understand abstract theories in a more intuitive way. The work that has been carried out includes some exciting demonstrations in acoustics by the high-school physics teachers' "Stray Cats Group," some visual and aural demonstrations for architectural acoustics, a technical course called "Technical Listening Training," a WWW-based training system, and physical models of the human vocal tract.
  • Kaoru Ashihara, Akira Nishimura, Takayuki Arai
    Acoustical Science and Technology 27(6) 317-317 2006
  • Takayuki Arai, Keiichi Yasu, Takahito Goto
    Acoustical Science and Technology 27(6) 393-395 2006
    A modern pattern playback, which can be very useful in pedagogical applications, was implemented. Two simple algorithms were proposed for digital pattern playback: the AM method, based on the concept of amplitude modulation (AM), and the FFT method, based on the concept of the fast Fourier transform (FFT). A simple real-time system using the FFT method was implemented by capturing a time slice of the input spectrogram in each frame, computing the waveform of the corresponding glottal cycle at that time frame by the inverse FFT, and producing an acoustic signal based on the glottal waveform. This real-time aspect is very important in a pedagogical situation, as the combination of simultaneous tactile, somatosensory and auditory sensations helps learners to understand the phenomenon more naturally, easily and intuitively.
  • Takayuki Arai
    Acoustical Science and Technology 27(6) 384-388 2006
    A sliding three-tube (S3T) model has been proposed as an implementation of a physical model that varies the constriction position in a three-tube resonator. This three-tube model uses a simple mechanism to produce several different vowels. The model is an idealized system of coupled resonators and can be viewed as a tube having a uniform area function with a single narrow constriction. The S3T model consists of two parts, the outer and inner cylinders. The outer cylinder is a uniform tube with a constant diameter, while the inner cylinder has a much shorter length and a smaller diameter. The S3T model is highly suitable for hands-on activities in acoustics education. The sound source for this model can be varied: an electrolarynx, another type of artificial larynx such as a whistle type, or the driver unit of a horn speaker. The model can be used for many activities, from science workshops for children to demonstrations of quantal theory for graduate students.
  • Takayuki Arai
    Acoustical Science and Technology 27(5) 298-301 2006
    How listeners parse acoustic cues between nasality and breathiness in speech perception was investigated. The main effect of nasalization is the perturbation of the low-frequency spectrum. Nasality and breathiness have several acoustic cues that are strongly correlated with each other. The acoustic parameters correlated with subjective judgments of breathiness include the degree of aspiration noise intruding at frequencies above 1.5 kHz in vowels and the relative strength of the fundamental component. The resulting acoustic signal might contain these cues when a vowel sound is produced with a lowered velum or a widely spread glottis. The perceptual experiments showed that a listener parses these cues for nasality and breathiness. The tendencies observed include the following: perceived nasality increases as the spacing between the nasal pole and zero becomes wider, perceived breathiness increases as aspiration increases, and, with strong aspiration, breathiness is higher as the open quotient increases.
  • Kanae Amino, Tsutomu Sugawara, Takayuki Arai
    Acoustical Science and Technology 27(4) 233-235 2006
    A human speaker identification test was conducted to find out the differences in the effectiveness of various Japanese sounds for identifying speakers. The stimuli used in the experiment were also analyzed, and the sounds were evaluated in terms of spectral distances, in order to explain acoustically the differences among the stimuli in the perception test. It was reported that the recognition rate was improved by considering the individualities of the speakers in the oro-nasal coupling and by using the weighted linear-scale spectral properties. The results suggest that the individualities of the speakers were reflected more in the spectra of the nasal sounds than in those of the oral sounds, and that the listeners perceived these individualities when they identified the speakers.
  • Takayuki Arai
    Acoustical Science and Technology 27(2) 111-113 2006
    The lung model and head-shaped models with a visible vocal tract are described as effective educational tools in acoustics. The lung model, which uses bellows and a whistle-type artificial larynx, demonstrates the human respiratory system. The head-shaped model visualizes the position of the vocal tract in the head and demonstrates modal phonation. The content is helpful for understanding speech production, musicology, speech pathology, and language learning. The descriptions of Umeda and Teranishi's model, Arai's cylinder-type and plate-type models, the fixed model, the manipulative tongue model, and the head-shaped model with nasal cavities support the experiments and give a clear view of speech production, the speech mechanism, and voice modulation.
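  • 大木衣里子, 原恵子, 飯高京子, 進藤美津子, 荒井隆行
    Japanese Journal of Communication Disorders 22(3) 204 December 2005
  • 荒井 隆行, 後藤崇公, 安啓一
    Proceedings of the 7th DSPS Educators Conference 91-94 September 2005
  • 荒井 隆行, 竹内京子
    Proceedings of the General Meeting of the Phonetic Society of Japan 179-184 September 2005
  • 荒井 隆行
    Proceedings of the General Meeting of the Phonetic Society of Japan 3 September 2005
  • 中田有貴, 村上善昭, 林奈帆子, 宮内裕介, 程島奈緒, 荒井隆行, 栗栖清浩
    Proceedings of the Meeting of the Acoustical Society of Japan 693-694 September 2005
  • 程島奈緒, 荒井隆行
    Proceedings of the Meeting of the Acoustical Society of Japan 607-608 September 2005
  • 安啓一, 荒井隆行, 小林敬, 進藤美津子
    Proceedings of the Meeting of the Acoustical Society of Japan 517-518 September 2005
  • 網野加苗, 菅原勉, 荒井隆行
    Proceedings of the Meeting of the Acoustical Society of Japan 431-432 September 2005
  • 荒井隆行, 安啓一, 後藤崇公
    Proceedings of the Meeting of the Acoustical Society of Japan 429-430 September 2005
  • 藤樫佑樹, 古賀綾子, 荒井隆行, 金寺登, 吉井順子
    Proceedings of the Meeting of the Acoustical Society of Japan 33-34 September 2005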














