Curriculum Vitae

Arai Takayuki

  (荒井 隆行)

Profile Information

Affiliation
Professor, Faculty of Science and Technology, Department of Information and Communication Sciences, Sophia University
Degree
Bachelor of Engineering (Sophia University)
Master of Engineering (Sophia University)
Doctor of Engineering (Sophia University)

Contact information
arai [at] sophia.ac.jp
Researcher number
80266072
J-GLOBAL ID
200901064275514612
researchmap Member ID
1000260131

Research and professional experience:

2008-present Professor at the Department of Information and Communication Sciences,
Sophia University
2006-2008 Professor at the Department of Electrical
and Electronics Engineering, Sophia University
2003-2004 Visiting Scientist at the Research Lab. of Electronics,
Massachusetts Institute of Technology (Cambridge, MA, USA)
2000-2006 Associate Professor at the Department of Electrical
and Electronics Engineering, Sophia University
1998-2000 Assistant Professor at the Department of Electrical
and Electronics Engineering, Sophia University
1997-1998 Research Fellow at the International Computer Science Institute
/ University of California at Berkeley
(Berkeley, California, USA)
1995-1996 Visiting Scientist at the Department of Electrical Engineering,
Oregon Graduate Institute of Science and Technology
(Portland, Oregon, USA)
1994-1995 Research Associate at the Department of Electrical and
Electronics Engineering, Sophia University
working with Professor Yoshida
1992-1993 Visiting Scientist at the Department of Computer Science
and Engineering, Oregon Graduate Institute of Science and Technology
(Portland, Oregon, USA)

Short-term Visiting Scientist:

2000, August / 2001, August / 2002, August
Massachusetts Institute of Technology (Cambridge, Massachusetts, USA)
2001, March
Max Planck Institute for Psycholinguistics (Nijmegen, the Netherlands)

The series of events involved in speech communication is called the “speech chain,” a basic concept in the speech and hearing sciences. Our research focuses on speech communication. The fields of this research are wide-ranging, and our interests include the following interdisciplinary areas:
- education in acoustics (e.g., physical models of human vocal tract),
- acoustic phonetics,
- speech and hearing sciences,
- speech production,
- speech analysis and speech synthesis,
- speech signal processing (e.g., speech enhancement),
- speech / language recognition and spoken language processing,
- speech perception and psychoacoustics,
- acoustics for speech disorders,
- speech processing for the hearing impaired,
- speaker characteristics in speech, and
- real-time signal processing using DSP processors.

(Subject of research)
General Acoustics and Education in Acoustics (including vocal-tract models)
Acoustic Phonetics, Applied Linguistics
Speech Science (including speech production), Hearing Science (including speech perception), Cognitive Science
Speech Intelligibility, Speech Processing, Speech Enhancement
Assistive Technology related to Acoustics, Speech and Acoustics for Everybody
Speech Processing, Applications related to Acoustics
Speaker Characteristics of Speech

(Proposed theme of joint or funded research)
acoustic signal processing
speech signal processing
auditory signal processing


Research History (2)

Papers (602)
  • S Sakaguchi, T Arai, Y Murahara
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2 917-920, 2000  
    In this paper we investigate how polarity inversion of speech signals affects human perception, and we apply this technique to data hiding. In most languages, glottal airflow during phonation is unidirectional, giving the speech waveform a constant polarity. On the other hand, the human auditory system cannot discriminate between speech signals with positive and negative polarity. Based on these facts, we developed an algorithm to hide data in speech signals. We assigned one bit to each syllable of speech and inverted the polarity of the signal at every syllable according to the assigned bit. We performed a test using 20 sentences from the TIMIT corpus to determine both whether a human could distinguish between the original and polarity-inverted signals and whether we could automatically restore the embedded binary data. We found that we were able to successfully hide data and restore it automatically.
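The embed-and-recover idea described above can be sketched as follows. This is a rough illustration, not the paper's implementation: the function names, the precomputed syllable boundaries, and the skewness-based detector are my assumptions.

```python
import numpy as np

def embed_bits(signal, syllable_bounds, bits):
    """Hide one bit per syllable by flipping the waveform polarity of
    that syllable (bit 1 -> inverted, bit 0 -> unchanged)."""
    out = signal.copy()
    for (start, end), bit in zip(syllable_bounds, bits):
        if bit:
            out[start:end] = -out[start:end]
    return out

def extract_bits(signal, syllable_bounds):
    """Recover the bits by exploiting the constant polarity of voiced
    speech: the third moment (skewness) of an unmodified syllable is
    positive here, and inverting the polarity flips its sign."""
    return [0 if np.mean(signal[s:e] ** 3) >= 0 else 1
            for s, e in syllable_bounds]
```

A real system would also need the automatic syllable segmentation that the paper's "one bit per syllable" scheme presupposes.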
  • A Kusumoto, T Arai, T Kitamura, M Takahashi, Y Murahara
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2 853-856, 2000  
    In this paper we report on a method for reducing the degradation of speech intelligibility caused by severe reverberation in public halls. Hall reverberation makes speech more difficult to understand, particularly for the hearing-impaired. Our method processes the speech signal between a microphone and the loudspeaker that radiates the speech into the room. Because there is a strong correlation between the modulation spectrum and the intelligibility of speech, we filtered the speech in the modulation frequency domain. Using several modulation filters, we conducted perceptual experiments in a church with hearing-impaired subjects and asked for their preferences. The experiments indicate that enhancing modulation frequencies between 2 and 8 Hz improves intelligibility in reverberant environments; the four hearing-impaired subjects rated the processed speech easier to hear than the unprocessed speech.
  • KUSUMOTO A, ARAI T, KITAMURA T, TAKAHASHI M, MURAHARA Y
    1999(2) 389-390, Sep, 1999  
  • KANEDERA Noboru, TAKANO Yukiko, ARAI Takayuki, TAKAHASHI Mahoro
    1999(2) 361-362, Sep, 1999  
  • FUCHIWAKI Y, ARAI T, ANAMI S, NAKAJIMA T, MURAHARA Y
    1999(2) 301-302, Sep, 1999  
  • ARAI TAKAYUKI, Rosaria Silipo, Steven Greenberg
    Eurospeech : European Conference on Speech Communication and Technology, 6 2687-2690, Sep, 1999  
  • ARAI TAKAYUKI, Setsuko Imatomi, Yuko Mimura, Masako Kato
    Eurospeech : European Conference on Speech Communication and Technology, 3 1075-1078, Sep, 1999  
  • ARAI TAKAYUKI, K. Mori, N. Toba, T. Harada, M. Komatsu, M. Aoyagi, Y. Murahara
    Eurospeech : European Conference on Speech Communication and Technology, 1 391-394, Sep, 1999  
  • Takayuki Arai, Natasha Warner
    Proceedings of the 14th international congress of phonetic sciences., 2 1055-1058, Aug, 1999  
  • ARAI TAKAYUKI
    Proc. of the XIV International Congress of Phonetic Sciences, 2 857-860, Aug, 1999  
  • ARAI TAKAYUKI
    Proc. of the XIV International Congress of Phonetic Sciences, 1 615-618, Aug, 1999  
  • 今富摂子, 荒井隆行, 三村優子, 加藤正子, 大久保文雄, 保阪善昭
    日本口蓋裂学会誌, 24(2) 209-209, Jun, 1999  
  • N Kanedera, T Arai, H Hermansky, M Pavel
    SPEECH COMMUNICATION, 28(1) 43-55, May, 1999  
    We measured the accuracy of speech recognition as a function of band-pass filtering of the time trajectories of spectral envelopes. We examined (i) several types of recognizers such as dynamic time warping (DTW) and hidden Markov model (HMM), and (ii) several types of features, such as filter bank output, mel-frequency cepstral coefficients (MFCC), and perceptual linear predictive (PLP) coefficients. We used the resulting recognition data to determine the relative importance of information in different modulation spectral components of speech for automatic speech recognition. We concluded that: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz; (2) in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy. (C) 1999 Elsevier Science B.V. All rights reserved.
  • T Arai, M Pavel, H Hermansky, C Avendano
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 105(5) 2783-2791, May, 1999  
    The intelligibility of syllables whose cepstral trajectories were temporally filtered was measured. The speech signals were transformed to their LPC cepstral coefficients, and these coefficients were passed through different filters. These filtered trajectories were recombined with the residuals and the speech signal reconstructed. The intelligibility of the reconstructed speech segments was then measured in two perceptual experiments for Japanese syllables. The effect of various low-pass, high-pass, and bandpass filtering is reported, and the results summarized using a theoretical approach based on the independence of the contributions in different modulation bands. The overall results suggest that speech intelligibility is not severely impaired as long as the filtered spectral components have a rate of change between 1 and 16 Hz. (C) 1999 Acoustical Society of America. [S0001-4966(99)01705-1].
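The band-pass operation on feature trajectories used in the studies above can be sketched as a DFT-domain modulation filter. This is a simplified stand-in, not the filters actually used: the function name and the rectangular passband are my assumptions, though the 1-16 Hz default reflects the band the papers report as carrying most of the useful information.

```python
import numpy as np

def modulation_bandpass(trajectory, frame_rate, lo=1.0, hi=16.0):
    """Band-pass one feature trajectory (e.g., a single cepstral
    coefficient over time) in the modulation-frequency domain by
    zeroing DFT bins outside [lo, hi] Hz."""
    spec = np.fft.rfft(trajectory)
    freqs = np.fft.rfftfreq(len(trajectory), d=1.0 / frame_rate)
    passband = (freqs >= lo) & (freqs <= hi)
    return np.fft.irfft(spec * passband, n=len(trajectory))
```

In a full pipeline, each cepstral trajectory would be filtered this way and the speech then resynthesized from the modified coefficients and the residual.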
  • Japanese journal of medical electronics and biological engineering : JJME, 37 230-230, Apr, 1999  
  • ARAI TAKAYUKI, Steven Greenberg, Rosaria Silipo
    Proc. of the International Conf. on Spoken Language Processing, 6 2803-2806, Nov, 1998  
  • Kanedera Noboru, Arai Takayuki, Funada Tetsuo, Yamada Youji
    IEICE technical report. Speech, 98(178) 45-52, Jul, 1998  
  • ARAI TAKAYUKI
    16th international congress on acoustics, and 135th meeting, Acoustical Society of America : the sound of the future, 4 2677-2678, Jun, 1998  
  • N Kanedera, H Hermansky, T Arai
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 2 613-616, 1998  
    We report on the effect of band-pass filtering of the time trajectories of spectral envelopes on speech recognition. Several types of filter (linear-phase FIR, DCT, and DFT) are studied. Results indicate the relative importance of different components of the modulation spectrum of speech for ASR. General conclusions are: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz, (2) it is important to preserve the phase information in modulation frequency domain, (3) The features which include components at around 4 Hz in modulation spectrum outperform the conventional delta features, (4) The features which represent the several modulation frequency bands with appropriate center frequency and band width increase recognition performance.
  • T Arai, S Greenberg
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 2 933-936, 1998  
    The spectrum of spoken sentences was partitioned into quarter-octave channels and the onset of each channel shifted in time relative to the others so as to desynchronize spectral information across the frequency axis. Human listeners are remarkably tolerant of cross-channel spectral asynchrony induced in this fashion. Speech intelligibility remains relatively unimpaired until the average asynchrony spans three or more phonetic segments. Such perceptual robustness is correlated with the magnitude of the low-frequency (3-6 Hz) modulation spectrum and thus highlights the importance of syllabic segmentation and analysis for robust processing of spoken language. High-frequency channels (>1.5 kHz) play a particularly important role when the spectral asynchrony is sufficiently large as to significantly reduce the power in the low-frequency modulation spectrum (analogous to acoustic reverberation) and may thereby account for the deterioration of speech intelligibility among the hearing impaired under conditions of acoustic interference (such as background noise and reverberation) characteristic of the real world.
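A minimal sketch of the desynchronization manipulation follows. Rectangular DFT bands stand in for the paper's quarter-octave filter bank, and the function name, band edges, and shift scheme are illustrative assumptions.

```python
import numpy as np

def desynchronize(signal, band_edges_hz, shifts, fs):
    """Split the signal into frequency channels and delay each channel
    by a different number of samples, desynchronizing spectral
    information across the frequency axis."""
    n = len(signal)
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n + max(shifts))
    for (lo, hi), shift in zip(band_edges_hz, shifts):
        # Isolate one channel, then add it back at its shifted onset.
        band = np.fft.irfft(spec * ((freqs >= lo) & (freqs < hi)), n=n)
        out[shift:shift + n] += band
    return out
```

With all shifts equal to zero the channels sum back to the original signal, which makes the decomposition easy to sanity-check.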
  • Kanedera Noboru, Hermansky Hynek, Arai Takayuki, Funada Tetsuo
    IEICE technical report. Speech, 97(441) 15-22, Dec, 1997  
    We report on the effect of band-pass filtering of the time trajectories of spectral envelopes on speech recognition. Several types of recognizers, several types of features, and several types of filters are studied. Results indicate the relative importance of different components of the modulation spectrum of speech for ASR. General conclusions are: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz, (2) it is important to preserve the phase information in modulation frequency domain, (3) The features which include components at around 4 Hz in modulation spectrum outperform the conventional delta features, (4) The features which represent the several modulation frequency bands with appropriate center frequency and band width increase recognition performance.
  • ARAI TAKAYUKI, OKAZAKI Keiko, IMATOMI Setsuko, Yoshida Yuichi
    Journal of the Acoustical Society of Japan (E), 18(6) 297-304, Nov, 1997  
    Palatalized articulation (PA) is frequently observed in speech uttered by postoperative cleft palate patients. Provided the acoustical and perceptual cues of PA can be found, speech therapists will be able to use these cues to diagnose PA non-invasively and objectively. We tested human perception of certain synthetic sounds to verify the cues of the PA of /s/ in Japanese. To synthesize the fricatives, we modified the center frequency and the bandwidth of a complex-conjugate pole pair of an all-pole filter obtained from the linear predictive analysis of the PA of /s/. First, we shifted the center frequency from 1,000 to 3,000 Hz while the relative bandwidth, or Q factor, was fixed at 10. Subsequently, we shifted the Q factor from 1 to 10 while the center frequency was fixed at 1,800 Hz. The results of a perceptual experiment involving nine speech therapists showed that fricatives having a peak between 1,600 and 2,400 Hz tend to be identified as the PA of /s/, and fricatives having a peak at 1,800 Hz with a Q factor greater than 5 tend to be identified as the PA of /s/. The two-tube model also showed that a peak around 2 kHz characterizes the PA of /s/.
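The pole-shifted synthesis can be sketched as a two-pole resonator excited by white noise. This is a minimal stand-in for the all-pole synthesis above: the resonator design from center frequency and Q, the seeding, and all parameter values are illustrative assumptions.

```python
import math
import numpy as np

def resonator_noise(fc, q, fs, n, seed=0):
    """Excite a second-order all-pole resonator (center frequency fc in
    Hz, quality factor q = fc / bandwidth) with white noise."""
    bw = fc / q
    r = math.exp(-math.pi * bw / fs)                  # pole radius
    a1 = -2.0 * r * math.cos(2.0 * math.pi * fc / fs)
    a2 = r * r
    x = np.random.default_rng(seed).standard_normal(n)
    y = np.zeros(n)
    for i in range(n):                                # y[i] = x[i] - a1*y[i-1] - a2*y[i-2]
        y[i] = x[i]
        if i >= 1:
            y[i] -= a1 * y[i - 1]
        if i >= 2:
            y[i] -= a2 * y[i - 2]
    return y
```

Sweeping fc (with q fixed) or q (with fc fixed) reproduces the two stimulus continua described in the abstract.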
  • T Arai, Y Yoshida
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 45(10) 2593-2595, Oct, 1997  
    Our procedure of real-zero conversion uses a spectrum-reversal technique to convert the information of a bandlimited signal to real zeros. We conducted a simple reconstruction experiment and showed that our proposed method is essentially equivalent to the conventional technique of sine-wave crossings.
  • ARAI TAKAYUKI, Noboru Kanedera, Hynek Hermansky, Misha Pavel
    Eurospeech : European Conference on Speech Communication and Technology, 3 1079-1082, Sep, 1997  
  • ARAI TAKAYUKI, Steven Greenberg
    Eurospeech : European Conference on Speech Communication and Technology, 2 1011-1014, Sep, 1997  
  • Arai Takayuki, Greenberg Steven
    IEICE technical report. Speech, 97(114) 25-32, Jun, 1997  
  • T Arai, M Pavel, H Hermansky, C Avendano
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 4 2490-2493, 1996  
    The effect of filtering the time trajectories of spectral envelopes on speech intelligibility was investigated. Since the LPC cepstrum forms the basis of many automatic speech recognition systems, we filtered the time trajectories of the LPC cepstrum of speech sounds, and the modified speech was reconstructed after the filtering. For processing, we applied low-pass, high-pass and band-pass filters. The accuracy results of the perceptual experiments for Japanese syllables show that speech intelligibility is not severely impaired as long as the filtered spectral components have 1) a rate of change faster than 1 Hz when high-pass filtered, 2) a rate of change slower than 24 Hz when low-pass filtered, and 3) a rate of change between 1 and 16 Hz when band-pass filtered.
  • ARAI TAKAYUKI, Keiko Okazaki, Setsuko Imatomi, Yuichi Yoshida
    Eurospeech : European Conference on Speech Communication and Technology, 3 1725-1728, Sep 3, 1995  
  • T ARAI
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, E78D(6) 705-711, Jun, 1995  
    In this paper approaches to language identification based on the sequential information of phonemes are described. These approaches assume that each language can be identified from its own phoneme structure, or phonotactics. To extract this phoneme structure, we use phoneme classifiers and grammars for each language. The phoneme classifier for each language is implemented as a multi-layer perceptron trained on quasi-phonetic hand-labeled transcriptions. After training the phoneme classifiers, the grammars for each language are calculated as a set of transition probabilities for each phoneme pair. Because of the interest in automatic language identification for worldwide voice communication, we decided to use telephone speech for this study. The data for this study were drawn from the OGI (Oregon Graduate Institute)-TS (telephone speech) corpus, a standard corpus for this type of research. To investigate the basic issues of this approach, two languages, Japanese and English, were selected. The language classification algorithms are based on Viterbi search constrained by a bigram grammar and by minimum and maximum durations. Using a phoneme classifier trained only on English phonemes, we achieved 81.1% accuracy. We achieved 79.3% accuracy using a phoneme classifier trained on Japanese phonemes. Using both the English and the Japanese phoneme classifiers together, we obtained our best result: 83.3%. Our results were comparable to those obtained by other methods such as that based on the hidden Markov model.
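The bigram-grammar scoring at the heart of this phonotactic approach can be sketched as follows. The toy training strings, add-one smoothing, and all names are illustrative assumptions; the actual system used MLP phoneme classifiers and duration-constrained Viterbi decoding rather than scoring given phoneme strings directly.

```python
import math

def train_bigram(sequences):
    """Estimate phoneme bigram transition statistics from labeled
    phoneme strings of one language."""
    counts, totals, vocab = {}, {}, set()
    for seq in sequences:
        vocab.update(seq)
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] = counts.get((a, b), 0) + 1
            totals[a] = totals.get(a, 0) + 1
    return counts, totals, vocab

def log_prob(seq, model):
    """Add-one-smoothed log probability of a phoneme sequence."""
    counts, totals, vocab = model
    v = max(len(vocab), 1)
    return sum(math.log((counts.get((a, b), 0) + 1) /
                        (totals.get(a, 0) + v))
               for a, b in zip(seq, seq[1:]))

def identify(seq, models):
    """Pick the language whose phonotactics best explain the sequence."""
    return max(models, key=lambda lang: log_prob(seq, models[lang]))
```

Each language's grammar is trained on its own transcriptions, and identification reduces to an argmax over per-language sequence likelihoods.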
  • 荒井隆行, 岡崎恵子, 今富摂子
    医用電子と生体工学. 特別号, 日本ME学会大会論文集 : 日本ME学会雑誌, 33 446-446, May, 1995  
  • YOSHINAGA Masayuki, ARAI Takayuki, YOSHIDA Yuichi
    1995(1) 77-78, Mar, 1995  
  • 荒井隆行, 岡崎恵子, 今富摂子
    電子情報通信学会技術研究報告. SP, 音声, SP94-100 15-20, Mar, 1995  
  • 荒井隆行, 岡崎恵子, 今富摂子
    音声言語医学, 36(1) 144-145, Jan, 1995  
  • 平井沢子, 加藤正子, 岡崎恵子, 荒井隆行
    音声言語医学, 36(1) 143-144, Jan, 1995  
  • Takayuki Arai, Keiko Okazaki, Setuko Imatomi
    Japan Journal of Logopedics and Phoniatrics, 36(3) 350-354, 1995  
    Using several models for synthesizing speech, we tested human perception of certain synthetic sounds to verify already published characteristics of palatalized articulation (PA). In the monosyllable [sɯ] uttered with normal articulation, part of the fricative sound [s] was replaced by synthetic noise. The following three models were used to synthesize [s]. The first model is a bandpass filter which has a pass band in a specific frequency range; the cutoff frequencies were shifted in intervals from low to high while the bandwidth was fixed. The second model is an all-pole model with second-order linear predictive (LP) analysis, implemented using partial correlation (PARCOR) coefficients; the frequency of the poles was shifted in intervals from low to high. The third model is an all-pole model using higher-order LP analysis of a typical PA of [s]. Each filter was excited by white noise to synthesize the fricatives. Nine speech therapists served as listeners in the perceptual experiment; each was requested to indicate what they heard, and their replies were categorized as “[s],” “[ʃ],” “the PA of [s],” or “other.” From the results we concluded: 1) the first model is not appropriate for synthesizing the PA of [s]; 2) fricatives which have a peak in the range of 2-3 kHz tend to be identified as the PA of [s] when synthesized by the second model; and 3) fricatives synthesized by the third model using sixth- or higher-order LP tend to be identified as the PA of [s]. © 1995, The Japan Society of Logopedics and Phoniatrics. All rights reserved.
  • 荒井 隆行
    日本音響学会研究発表会講演論文集, 219-220, Oct, 1994  
  • 荒井 隆行, 大附克年, 白井克彦
    日本音響学会研究発表会講演論文集, 211-212, Oct, 1994  
  • 荒井 隆行, 吉永真之, 吉田裕一
    日本音響学会研究発表会講演論文集, 21-22, Oct, 1994  
  • 平井沢子, 岡崎恵子, 荒井隆行
    聴能言語学研究, 11(2) 96-96, Sep, 1994  
  • Hirai Sawako, Okazaki Keiko, ARAI TAKAYUKI
    The Japan Journal of Logopedics and Phoniatrics, 35(2) 199-206, Apr, 1994  
  • 荒井 隆行, R.Cole, E.Barnard
    日本音響学会研究発表会講演論文集, 169-170, Mar, 1994  
  • 平井沢子, 荒井隆行, 岡崎恵子
    音声言語医学, 35(1) 115-116, Jan, 1994  
  • KM BERKLING, T ARAI, E BARNARD
    ICASSP-94 - PROCEEDINGS, VOL 1, 1 289-292, 1994  
  • ARAI TAKAYUKI, Yeshwant Muthusamy, Kay Berkling, Ronald Cole, Etienne Barnard
    Eurospeech : European Conference on Speech Communication and Technology, 2 1307-1310, Sep, 1993  
  • ARAI TAKAYUKI, Y. Yoshida
    Proc. of the Third International Symposium on Signal Processing and its Applications, 1 283-286, Aug, 1992  
  • ARAI TAKAYUKI, Yoshida Yuichi
    The Journal of the Acoustical Society of Japan, 48(7) 474-482, Jul, 1992  
  • 荒井 隆行, 福井聡, 吉田裕一
    電子情報通信学会春季大会講演論文集, 258, Mar, 1992  
  • 荒井 隆行, 神尾広幸, 吉田裕一
    日本音響学会研究発表会講演論文集, 163-164, Oct, 1991  
  • 荒井 隆行, 吉田裕一
    電子情報通信学会技術研究報告. SP, 音声, SP90-78 7-13, Jan, 1991  
  • 荒井 隆行, 吉田裕一
    日本音響学会研究発表会講演論文集, 199-200, Sep, 1990  

Misc. (72)

Works (11)

Research Projects (37)

Academic Activities (1)

Social Activities (1)

Other (55)
  • Apr, 2006 - Jun, 2008
    In a course on giving presentations in English, each student's presentation is video-recorded so that the student can later watch it and evaluate his or her own performance objectively. Students are also asked to give a second presentation on the same content, encouraging them to make improvements.
  • 2003 - Jun, 2008
    Served as a member of a committee on education in acoustics and organized education sessions (for example, the education session at the joint meeting of the Acoustical Societies of America and Japan held in December 2006).
  • 2003 - Jun, 2008
    Served as a member of a committee on education in acoustics and organized education sessions (for example, the education session at the International Congress on Acoustics held in April 2004). In particular, has served as chair of the committee since 2005 and has been active in that capacity (for example, holding a science classroom at the National Museum in October 2006).
  • Apr, 2002 - Jun, 2008
    Since arriving at the university in April 2002, has compiled and published a report called the “Progress Report” on the laboratory's education and research activities. This has also proved effective in raising the awareness of the students in the laboratory.
  • Apr, 2002 - Jun, 2008
    Believing it important to be accustomed to English on a regular basis, the laboratory regularly holds some of its routine meetings in English. In addition, since the 2006 academic year, the progress reports given at each research group's meetings have also been required to be in English.