Research Achievements

Takayuki Arai (荒井 隆行)

Basic Information

Affiliation
Professor, Department of Information and Communication Sciences, Faculty of Science and Technology, Sophia University
Degrees
B.E. (Sophia University)
M.E. (Sophia University)
Ph.D. in Engineering (Sophia University)

Contact
arai@sophia.ac.jp
Researcher number
80266072
J-GLOBAL ID
200901064275514612
researchmap member ID
1000260131

<Domestic>
April 2008  Professor, Department of Information and Communication Sciences, Faculty of Science and Technology, Sophia University (present position)
April 2006  Professor, Department of Electrical and Electronics Engineering, Faculty of Science and Technology, Sophia University
April 2000  Associate Professor, Department of Electrical and Electronics Engineering, Faculty of Science and Technology, Sophia University
April 1998  Lecturer, Department of Electrical and Electronics Engineering, Faculty of Science and Technology, Sophia University
April 1994  Research Associate, Department of Electrical and Electronics Engineering, Faculty of Science and Technology, Sophia University
March 1994  Completed the doctoral program in Electrical and Electronics Engineering, Graduate School of Science and Technology, Sophia University
March 1991  Completed the master's program in Electrical and Electronics Engineering, Graduate School of Science and Technology, Sophia University
March 1989  Graduated from the Department of Electrical and Electronics Engineering, Faculty of Science and Technology, Sophia University

<Overseas>
August 2000, August 2001, August 2002, and October 2003 - September 2004
       Visiting Researcher, Massachusetts Institute of Technology, USA
February 2001  Visiting Researcher, Max Planck Institute for Psycholinguistics, the Netherlands
January 1997 - March 1998, plus August 1998 and August 1999
       Visiting Researcher, International Computer Science Institute,
       a research institute affiliated with the University of California, Berkeley, USA
September 1992 - August 1993 and June 1995 - December 1996
        Visiting Researcher, Oregon Graduate Institute of Science and Technology, USA

The chain of events involved in speech communication is known as the "Speech Chain," a fundamental concept in speech and hearing science. My research has focused mainly on the scientific aspects of speech communication and their applications, covering speech and hearing science, acoustics, and acoustic phonetics, and has since expanded to virtually every aspect of sound. It spans the following broad, interdisciplinary range of fields:
・Acoustics and acoustics education (e.g., physical models of the vocal tract)
・Linguistics centered on acoustic phonetics (phonetics and phonology) and its educational applications (applied linguistics)
・Speech science including speech production, hearing science including speech perception, and cognitive science of sound and speech
・Speech perception and intelligibility in real environments; speech signal processing and speech enhancement
・Assistive technology for speech and support for people with disabilities; acoustic analysis of disordered speech; speech production and perception by hearing-impaired and elderly people
・Development of speech processing algorithms, including real-time signal processing; development of sound-related systems and applications
・Speaker characteristics of speech
・Other sound-related research in general

(Research Themes)
Acoustics and acoustics education (including vocal-tract models)
Linguistics centered on acoustic phonetics (phonetics and phonology) and its educational applications (applied linguistics)
Speech science including speech production, hearing science including speech perception, and cognitive science of sound and speech
Speech perception and intelligibility in real environments; speech signal processing and speech enhancement
Assistive technology for speech and support for people with disabilities; acoustic analysis of disordered speech; speech production and perception by hearing-impaired and elderly people
Development of speech processing algorithms, including real-time signal processing; development of sound-related systems and applications
Speaker characteristics of speech

(Desired Themes for Joint and Commissioned Research)
Sound information processing
Spoken language information processing
Auditory information processing


Papers

 608
  • 荒井隆行
    日本コミュニケーション障害学会学術講演会予稿集 30-35 July 2013  Invited
  • 辻美咲, 荒井隆行, 安啓一
    日本音響学会誌 69(4) 179-183 April 2013
    In environments with long reverberation times, information transmission can become difficult, so there is a need to reduce the effects of reverberation and convey information correctly. To improve listening in reverberation, this study proposes a preprocessing method that applies consonant emphasis and vowel suppression. Word intelligibility tests were conducted using stimuli made by convolving the processed speech with reverberation, and the effect of the processing was examined. When the maximum amplitude of each consonant segment was set to 1.0 and that of each vowel segment to 0.4-1.0, no significant difference was found; in contrast, intelligibility dropped significantly when the maximum amplitude of each vowel segment was set to 0.2.
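The consonant-emphasis and vowel-suppression preprocessing summarized above can be sketched as segment-wise gain control. This is a minimal illustration, not the authors' implementation: the segment boundaries and C/V labels are assumed to be given, and `vowel_gain` stands in for the 0.4-1.0 maximum vowel amplitude varied in the experiment.

```python
import numpy as np

def emphasize_consonants(signal, segments, vowel_gain=0.4):
    """Rescale each labeled segment so consonant peaks reach 1.0 and
    vowel peaks reach `vowel_gain` (hypothetical sketch; `segments`
    is a list of (start, end, 'C' or 'V') sample-index tuples)."""
    out = np.asarray(signal, dtype=float).copy()
    for start, end, kind in segments:
        target = 1.0 if kind == 'C' else vowel_gain
        peak = np.max(np.abs(out[start:end]))
        if peak > 0.0:
            out[start:end] *= target / peak
    return out
```

Convolving such processed speech with a room impulse response would then yield stimuli comparable in spirit to those used in the word intelligibility tests.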
  • 荒井隆行
    日本音響学会春季研究発表会講演論文集 1507-1510 March 2013  Invited
  • 中村進, 栗栖清浩, 安啓一, 荒井隆行
    日本音響学会春季研究発表会講演論文集 1231-1232 March 2013
  • 栗栖清浩, 中村進, 安啓一, 荒井隆行
    日本音響学会春季研究発表会講演論文集 1229-1230 March 2013
  • 渡丸嘉菜子, 荒井隆行
    日本音響学会春季研究発表会講演論文集 553-556 March 2013
  • 鮮于媚, 加藤宏明, 田嶋圭一, 荒井隆行
    日本音響学会春季研究発表会講演論文集 549-552 March 2013
  • 井下田貴子, 鮮于媚, 荒井隆行
    日本音響学会春季研究発表会講演論文集 441-444 March 2013
  • 柳澤絵美, 荒井隆行
    日本音響学会春季研究発表会講演論文集 413-414 March 2013
  • 荒井隆行
    日本音響学会春季研究発表会講演論文集 349-352 March 2013
  • Keiichi Yasu, Takayuki Arai, Kei Kobayashi, Mitsuko Shindo
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 3151-3155 2013
    In previous studies, we conducted several experiments, including identification tests for young and elderly listeners using /shi/ /chi/ (CV) and /ishi/ /ichi/ (VCV) continua. For the CV stimuli, confusion of /shi/ as /chi/ increased when the frication had a long rise time, and /chi/ was confused with /shi/ when the frication had a short rise time. This was true for the group with the following auditory property degradation: 1) elevation of absolute threshold, 2) presence of loudness recruitment, and 3) deficit of auditory temporal resolution. When auditory property degradation was observed, the weighting of acoustic cues shifted to frication duration rather than the gradient of the amplitude of frication. The latter was calculated by dividing frication amplitude by rise time. In the VCV stimuli, confusion of /ichi/ as /ishi/ occurred for a long silent interval between the first V and C with auditory property degradation, and the weighting of acoustic cues shifted from the silent interval to frication duration. In the present study, we unified these findings into a single framework and found that degradation of auditory properties causes listeners to prefer duration of frication as a cue for identifying fricatives and affricates.
  • Takayuki Arai
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 2444-2448 2013
    Many studies have pointed out that the /r/ sounds in Japanese tend to be difficult for native Japanese-speaking children to acquire. To verify this, we first investigated Japanese /r/ sounds uttered by two-year-old twins as a case study. The acoustic analysis of the recordings, which included several words with various /r/ sounds, revealed that certain /r/ sounds are difficult to produce and are often produced with speech errors. We also analyzed a set of utterances of Japanese /r/ spoken in a variety of phones pronounced by an adult male speaker. Then, for comparison, we synthesized Japanese /r/ sounds using four parameters. We conducted two perceptual experiments: one for the natural speech by the male speaker of Japanese, and another for the synthesized speech sounds based on the four parameters. The results showed that variation in pronunciation in adults was widely distributed. We discussed the reasons that it takes time for children to acquire /r/ sounds, and we concluded that it is possibly due to the combination of two factors: 1) some /r/ sounds themselves are difficult to produce, and 2) there is a wide distribution of pronunciation variation in adult speakers.
  • Takayuki Arai
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 2018-2022 2013
    Certain sounds are difficult for children to produce, even if the sounds are in their native language. For example, Japanese /r/ can be difficult for Japanese children to learn. Second language learners can also have difficulty acquiring certain sounds. For example, Japanese speakers learning English often have difficulty with English /r/ and /l/. To address this problem, we have developed two new physical models of the vocal tract: one for flap sounds (Model A) and another for liquid sounds (Model B). Each of them has a flapping tongue, and for Model B, the length of the tongue is variable. When the tongue is short, we can produce alveolar/retroflex approximants, and when the tongue is long we can produce lateral approximants. We recorded several sets of sounds produced by these models, analyzed the speech data, and used them for perceptual experiments. From the acoustic analysis and the perceptual experiments, we confirmed that the sounds produced by Model A were heard as Japanese /r/, and the sounds produced by Model B were heard as English /r/ and /l/. Furthermore, the models are helpful for practicing pronunciation because learners can see the tongue, alter tongue position manually, and hear the output sounds.
  • Marino Kasuya, Takayuki Arai
    Proceedings of Meetings on Acoustics 19(060290) 1-8 2013
    This study investigates an aspect of speech rhythm in German spoken by Japanese native speakers of different proficiency levels. Previous studies on the production of vowel reduction have indicated that this is an area of difficulty for non-native speakers. One study, working on the assumption that second language (L2) speech production is affected by first language (L1), suggested that Japanese native speakers tend to fail at producing the required vowel reductions in unstressed syllables. The present study further investigated this issue by dividing Japanese native speakers into two groups: advanced and elementary learners. The aim of the present study was to investigate acoustic properties of vowel quality (first and second formants) and quantity (durational ratio) of unstressed syllables in German suffixes on the basis of German proficiency levels and the influence of L1. A main effect of proficiency level was obtained: the acoustic analysis showed significant differences in the first two formants and in the durational ratio, which seem to be the factors that caused the difference among the levels. This suggests that the L2 learning process may accompany the acquisition of L2 sounds even when rhythmic structures differ between L1 and L2. © 2013 Acoustical Society of America.
  • Takayuki Arai
    Proceedings of Meetings on Acoustics 19(025017) 1-9 2013  Invited
    There is a huge volume of written textbooks available in virtually every modern field, including acoustic phonetics. However, in areas dealing with acoustics, learners often face problems and limitations when they deal with only written material and no audio or visual information. As one response to this problem, we have developed several sets of physical models of the human vocal tract and have shown that they are extremely useful for intuitive understanding. In addition, we also developed a tool called "Digital Pattern Playback." Another solution is an online version featuring demonstrations. We are currently collecting materials, mainly in the form of sounds, for educational purposes in acoustics and phonetics and are releasing them as "Acoustic-Phonetics Demonstrations" through our Web site. These demonstrations are designed for students in linguistics, phonetics and phonology, speech pathology, audiology, psychoacoustics, speech engineering, and others. However, potential users are not limited to these groups, as we feel that a wide range of learners can obtain tremendous benefits from the demonstrations, including those who are studying foreign languages or patients undergoing speech articulation therapy. © 2013 Acoustical Society of America.
  • Takayuki Arai
    Proceedings of Meetings on Acoustics 19(025012) 1-8 2013
    In our previous work, we developed several physical models of the human vocal tract and reported that they are intuitive and helpful for students studying acoustics and speech science. Models with a bent vocal tract can achieve relatively realistic tongue movements. These bent-type models had either a flexible tongue or a sliding tongue. In the former case, the tongue was made of a flexible gel-type material so that we could form arbitrary tongue shapes. However, this flexibility meant that training is needed to achieve target sounds. In the latter case, the tongue was made of an acrylic resin, and only a limited number of vowel sounds can be achieved because so few sliding parts are available to change the tongue shape. Therefore, in this study, we redesigned the mechanical bent-type models so that they now consist of blocks. By placing the blocks at the proper positions, the block-type model can produce five intelligible Japanese vowels. We also designed a single bent-type model with sliding blocks that can produce several vowel sounds. © 2013 Acoustical Society of America.
  • Kiyohiro Kurisu, Susumu Nakamura, Keiichi Yasu, Takayuki Arai
    Acoustical Science and Technology 34(5) 354-355 2013
    Speech Transmission Index (STI) is an index of speech clarity and is widely utilized as an evaluation scale for public address systems. The numerical value of the STI, however, is mostly dominated and determined by the number and characteristics of loudspeakers as well as the physical conditions of the room's interior surfaces. The SOR method utilizes a speech pattern as a test signal and measures the power ratio within the speech frequency band. In contrast, C50 is not always specialized for speech evaluation because it comes from an impulse response containing the characteristics of unneeded frequency bands, whereas SOR is calculated from an actual speech pattern recorded at a listening point, which contains all the signal modifications along the transmission path.
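For reference, the C50 measure mentioned above is the standard early-to-late energy ratio of a room impulse response split at 50 ms. A minimal sketch of that definition, not tied to the paper's measurement setup:

```python
import numpy as np

def clarity_c50(ir, fs):
    """C50 clarity index in dB: ratio of impulse-response energy in the
    first 50 ms (early reflections, useful for speech) to the remaining
    energy (late reverberation, perceived as masking)."""
    split = int(0.050 * fs)                 # 50 ms boundary in samples
    early = np.sum(ir[:split] ** 2)
    late = np.sum(ir[split:] ** 2)
    return 10.0 * np.log10(early / late)
```

A higher C50 indicates that more of the response energy arrives early, which generally correlates with clearer speech.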
  • Takayuki Arai
    Acoustical Science and Technology 34(2) 142-146 2013
    We first compared a speech signal with two reverberations, normal reverberation and its time-reversed version, that have the same modulation transfer function. Results showed that intelligibility of speech with the time-reversed reverberation was significantly less than that with the normal reverberation. We then compared the results of human speech recognition (HSR) with those of automatic speech recognition (ASR) to see whether a similar tendency could be observed in both cases. Results showed a similar asymmetry in ASR, but we found that HSR was more tolerant even as the reverberation became longer. Finally, we discussed factors of asymmetric temporal properties in speech production and perception that current speech recognizers do not have. © 2013 The Acoustical Society of Japan.
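The claim that a reverberation tail and its time-reversed version share the same modulation transfer function can be checked numerically. The sketch below uses the standard squared-envelope (Schroeder) formulation of the MTF and a synthetic exponentially decaying noise tail in place of a measured impulse response; these choices are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def mtf(ir):
    """MTF magnitude from an impulse response: normalized |FFT| of the
    squared IR (its energy envelope).  Time reversal changes only the
    phase of this spectrum, so the magnitude is identical."""
    env = ir ** 2
    return np.abs(np.fft.rfft(env)) / np.sum(env)

rng = np.random.default_rng(0)
decay = np.exp(-np.arange(4096) / 800.0)          # exponential energy decay
ir = rng.standard_normal(4096) * decay            # decaying-noise "tail"
m_forward = mtf(ir)
m_reversed = mtf(ir[::-1])                        # time-reversed version
```

The two curves coincide exactly, even though the time-reversed tail sounds very different, which is the asymmetry the intelligibility results expose.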
  • TOMARU Kanako, ARAI Takayuki
    聴覚研究会資料 = Proceedings of the auditory research meeting 42(7, H2012-107) 591-596 October 2012
  • 安啓一, 荒井隆行, 小林敬, 進藤美津子
    日本音響学会誌 68(10) 501-512 October 2012
    Identification experiments were conducted with elderly listeners using /shi/-/chi/ (CV) and /ishi/-/ichi/ (VCV) continua in which mainly the consonant duration or the duration of the silent interval between the consonant and the preceding vowel was varied. As with young listeners, perception shifted from /chi/ to /shi/ as the consonant lengthened in CV, and from /ishi/ to /ichi/ as the silent interval lengthened in VCV. Focusing on 1) elevated thresholds for frication, 2) presence of loudness recruitment, and 3) reduced temporal resolution, the following was observed: with 1), misperception of /chi/ as /shi/ appeared for short consonants and of /shi/ as /chi/ for long consonants in CV, and the tendency became stronger when 2) was also present. In VCV, misperception of /ichi/ as /ishi/ was most pronounced when all of 1)-3) were present.
  • 荒井隆行
    日本音響学会秋季研究発表会講演論文集 1651-1654 September 2012
  • 栗栖清浩, 中村進, 安啓一, 荒井隆行
    日本音響学会秋季研究発表会講演論文集 1251-1252 September 2012
  • 岩波達也, 荒井隆行, 安啓一
    日本音響学会秋季研究発表会講演論文集 1247-1250 September 2012
  • 三戸武大, 荒井隆行, 安啓一
    日本音響学会秋季研究発表会講演論文集 1243-1246 September 2012
  • 川島佑亮, 荒井隆行, 安啓一
    日本音響学会秋季研究発表会講演論文集 831-834 September 2012
  • 安啓一, 荒井隆行, 小林敬, 進藤美津子
    日本音響学会秋季研究発表会講演論文集 503-506 September 2012
  • 渡丸嘉菜子, 荒井隆行
    日本音響学会秋季研究発表会講演論文集 445-448 September 2012
  • 井下田貴子, 鮮于媚, 荒井隆行
    日本音響学会秋季研究発表会講演論文集 369-372 September 2012
  • 粕谷麻里乃, 荒井隆行
    日本音響学会秋季研究発表会講演論文集 365-368 September 2012
  • H. Masuda, T. Arai
    Proc. Autumn Meet. Acoust. Soc. Jpn. 361-364 September 2012
  • K. Tomaru, T. Arai
    Proc. of the General Meeting of the Phonetic Society of Japan 79-84 September 2012
  • N. Hodoshima, T. Arai, K. Kurisu
    Proc. of INTERSPEECH 1464-1467 September 2012
  • T. Arai
    Proc. of INTERSPEECH 2190-2193 September 2012
  • N. Hodoshima, T. Arai
    Journal of the Acoustical Society of America 131(4) 3316-3316 April 2012
  • 久野マリ子, 荒井隆行
    日本音響学会春季研究発表会講演論文集 1541-1544 March 2012  Invited
  • 荒井隆行
    日本音響学会春季研究発表会講演論文集 1467-1470 March 2012
  • 中村進, 栗栖清浩, 安啓一, 荒井隆行
    日本音響学会春季研究発表会講演論文集 1227-1228 March 2012
  • 栗栖清浩, 中村進, 安啓一, 荒井隆行
    日本音響学会春季研究発表会講演論文集 1225-1226 March 2012
  • 渡丸嘉菜子, 荒井隆行
    日本音響学会春季研究発表会講演論文集 519-522 March 2012
  • 井下田貴子, 鮮于媚, 荒井隆行
    日本音響学会春季研究発表会講演論文集 487-490 March 2012
  • H. Masuda, T. Arai
    Proc. Spring Meet. Acoust. Soc. Jpn. 477-480 March 2012
  • ペクキムホーチ, 荒井隆行, 金寺登
    日本音響学会春季研究発表会講演論文集 141-144 March 2012
  • Takayuki Arai
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 131(3) 2444-2454 March 2012
    Several vocal-tract models were reviewed, with special focus given to the sliding vocal-tract model [T. Arai, Acoust. Sci. Technol. 27(6), 384-388 (2006)]. All of the models have been shown to be excellent tools for teaching acoustics and speech science to elementary through university level students. The sliding three-tube model is based on Fant's three-tube model [G. Fant, Acoustic Theory of Speech Production (Mouton, The Hague, The Netherlands, 2006)] and consists of a long tube with a slider simulating tongue constriction. In this article, the design of the sliding vocal-tract model was reviewed. Then a science workshop was discussed where children were asked to make their own sliding vocal-tract models using simple materials. It was also discussed how the sliding vocal-tract model compares to our other vocal-tract models, emphasizing how the model can be used to instruct students at higher levels, such as undergraduate and graduate education in acoustics and speech science. Through this discussion the vocal-tract models were shown to be a powerful tool for education in acoustics and speech science for all ages of students. (C) 2012 Acoustical Society of America. [DOI: 10.1121/1.3677245]
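As background to such straight-tube vocal-tract models, the resonances of a uniform tube closed at one end (glottis) and open at the other (lips) follow the quarter-wavelength series. A quick sketch, assuming a speed of sound of 350 m/s (an illustrative value, not taken from the paper):

```python
def tube_formants(length_m, n=3, c=350.0):
    """Resonance frequencies F_k = (2k - 1) * c / (4 * L) of a uniform
    tube closed at one end and open at the other: the textbook model
    of a neutral-vowel vocal tract."""
    return [(2 * k - 1) * c / (4.0 * length_m) for k in range(1, n + 1)]

# A 17.5 cm tract gives formants near 500, 1500, and 2500 Hz.
print(tube_formants(0.175))
```

Constrictions that perturb this uniform tube, like the slider in the sliding three-tube model, shift these resonances toward the formants of different vowels.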
  • Nao Hodoshima, Takayuki Arai, Kiyohiro Kurisu
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 1462-1465 2012  Refereed
    Speech intelligibility is in general lower for older adults than for young adults in reverberant environments such as train stations or airports. We aim to make speech announcements intelligible in public spaces. Speech spoken in noise, i.e., noise-induced speech, is usually more intelligible to young people than speech spoken in a quiet environment when heard in noise, a phenomenon called the Lombard effect. The current study applied this effect to the input of a sound reinforcement system in public spaces. The results of listening tests conducted with 24 older adults showed that noise/reverberation-induced speech was more intelligible than speech spoken in a quiet environment when heard in reverberant environments (reverberation times of 1.4 s and 2.4 s). The results also showed that the effect of noise/reverberation-induced speech was observed even when the recording and listening conditions differed; for example, when different reverberation times were used between the two conditions, and when noise-induced speech was heard in reverberation. The results suggest that using noise/reverberation-induced speech as the input of a sound reinforcement system might yield higher intelligibility in public spaces.
  • Takayuki Arai, Nao Hodoshima, Keiichi Yasu
    IEEE Transactions on Audio, Speech & Language Processing 20(2) 709-709 2012  Refereed
  • Takayuki Arai, Kanae Amino, Mee Sonu, Keiichi Yasu, Takako Igeta, Kanako Tomaru, Marino Kasuya
    Proc. of Workshop on Child, Computer and Interaction 104-107 2012
  • Takayuki Arai
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 2769-2772 2012
    We developed a digital version of Pattern Playback to convert a spectrographic representation of speech back into a speech signal. Pattern Playback was originally developed by Cooper and his colleagues from Haskins Laboratories in the late 1940s. We used our Digital Pattern Playback (DPP) for instruction in digital signal processing and speech science. The original DPP used two different algorithms: amplitude modulation and fast Fourier transform. The new DPP uses additive synthesis of sinusoidal harmonics, which is easier for undergraduate college students to understand. We also designed a scientific exhibition with DPP at a science museum for children and adults. DPP is educational for a wide variety of people, from children to technical students.
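The additive-synthesis principle described for the new DPP can be sketched in a few lines: each harmonic of a fundamental gets one sinusoidal oscillator whose amplitude follows the spectrographic pattern frame by frame. The function and values below are illustrative, not the actual DPP implementation.

```python
import numpy as np

def additive_playback(mag, f0, frame_dur, fs=16000):
    """Resynthesize a signal from a coarse spectrogram `mag[frame, k]`
    (level of harmonic k+1 of `f0` in each frame) by summing sinusoidal
    harmonics, the principle behind the new Digital Pattern Playback."""
    n_samples = int(frame_dur * fs * len(mag))
    t = np.arange(n_samples) / fs
    frame_idx = np.minimum((t / frame_dur).astype(int), len(mag) - 1)
    out = np.zeros(n_samples)
    for k in range(mag.shape[1]):             # one oscillator per harmonic
        out += mag[frame_idx, k] * np.sin(2 * np.pi * (k + 1) * f0 * t)
    return out
```

Hand-drawing `mag` and listening to the result is exactly the kind of exercise that makes the spectrogram-to-sound mapping intuitive for students.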
  • Yohei Matsukaze, Takayuki Arai, Toshimasa Suzuki, Keiichi Yasu
    Acoustical Science and Technology 33(6) 370-371 2012
    This study reports a preprocessing technique to improve speech intelligibility in reverberation: zeros were padded into the center of each steady-state portion, and the intelligibility of the processed speech in reverberation was examined. We also investigated whether the processed speech still sounded natural, since the zero-padding technique reduces the effects of overlap-masking more with long zeros but loses naturalness as the zeros become longer. The speech samples used in this study were based on 14 Japanese monosyllables, and each target syllable was inserted into a carrier sentence. The experiment was conducted in a sound-treated room, and the stimuli were presented through headphones. The 14 syllables were visually presented on a computer display after each stimulus, and listeners selected what they heard. An analysis of variance (ANOVA) was carried out on the unnaturalness ratings using the statistical software SPSS.
  • T. Arai
    the Phonetician 2012-I-II(104-105) 39-50 2012
  • Kimhuoch Pek, Takayuki Arai, Noboru Kanedera
    Acoustical Science and Technology 33(1) 33-44 2012
    Voice activity detection (VAD) in noisy environments is a very important preprocessing scheme in speech communication technology, a field which includes speech recognition, speech coding, speech enhancement and captioning video contents. We have developed a VAD method for noisy environments based on the modulation spectrum. In Experiment 1, we investigate the optimal ranges of speech and modulation frequencies for the proposed algorithm by using the simulated data in the CENSREC-1-C corpus. Results show that when we combine an upper limit frequency between 1,000 and 2,000 Hz with a lower limit frequency of less than 300 Hz as speech frequency bands, error rates are lower than with other bands. Furthermore, when we use the frequency components of the modulation spectrum between 3-9, 3-11, 3-14, 3-18, 4-9, 4-11, 4-14, 4-18, 5-7, 5-9, 5-11, or 5- 14 Hz, the proposed method performs VAD well. In Experiment 2, we use one of the best parameter settings from Experiment 1 and evaluate the real environment data in the CENSREC-1-C corpus by comparing our method with other conventional methods. Improvements were observed from the VAD results for each SNR condition and noise type. © 2012 The Acoustical Society of Japan.
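In the spirit of the modulation-spectrum VAD described above, a toy detector can score a band-energy envelope by how much of its modulation energy falls in a speech-typical range. The band edges and the synthetic envelopes below are illustrative, not the paper's settings or data.

```python
import numpy as np

def modulation_band_ratio(envelope, env_rate, lo=4.0, hi=14.0):
    """Fraction of non-DC modulation energy of an energy envelope that
    lies between `lo` and `hi` Hz.  Speech envelopes concentrate around
    the syllable rate (~4 Hz), so a high ratio suggests speech."""
    spec = np.abs(np.fft.rfft(envelope - np.mean(envelope))) ** 2
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / env_rate)
    band = spec[(freqs >= lo) & (freqs <= hi)].sum()
    total = spec[1:].sum()                   # exclude the DC component
    return band / total if total > 0 else 0.0

env_rate = 100.0                             # envelope samples per second
t = np.arange(200) / env_rate                # a 2-second envelope
speechlike = 1.0 + 0.5 * np.sin(2 * np.pi * 5.0 * t)   # 5 Hz modulation
steady = np.ones_like(t)                     # unmodulated, noise-like
```

Thresholding this ratio per analysis window would give a crude speech/non-speech decision; the paper's method additionally restricts the analysis to selected speech frequency bands before computing the modulation spectrum.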

MISC

 71

Lectures and Oral Presentations

 227

Works

 11

Research Projects (Joint Research and Competitive Funding)

 36

Academic Contribution Activities

 1

Social Contribution Activities

 1

Other

 55
  • April 2006 - June 2008
    In a course on giving presentations in English, students' presentations are video-recorded so that they can later watch themselves and evaluate their own delivery objectively. Students also give a second presentation on the same content, which encourages them to make improvements.
  • 2003 - June 2008
    Served as a member of a committee on acoustics education, organizing education sessions (for example, the education session at the joint meeting of the Acoustical Societies of America and Japan held in December 2006).
  • 2003 - June 2008
    Served as a member of a committee on acoustics education, organizing education sessions (for example, the education session at the International Congress on Acoustics held in April 2004). In particular, appointed committee chair in 2005 and has been active in that role since (for example, holding a science class at the National Museum in October 2006).
  • April 2002 - June 2008
    Since joining the university, has compiled and published an annual "Progress Report" on the laboratory's educational and research activities. This has also proved effective in raising the awareness of the laboratory's students.
  • April 2002 - June 2008
    Believing it important to be accustomed to English on a daily basis, regularly holds laboratory meetings in English. In addition, since the 2006 academic year, the progress reports given by each research group have also been required to be in English.