Publications
Yu Hayashizaki, Takashi Nose, Sumiharu Kobayashi, Satoru Fukayama, Akinori Ito, "PUNSER: Large-Scale Pre-trained and Unified Model for Practical Speech Emotion Recognition," in Proceedings of the 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Oct. 2025.
Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, Shinji Watanabe, "OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025, Oct. 2025.
Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, Shinji Watanabe, "The CMU-AIST submission for the ICME 2025 Audio Encoder Challenge," in Proceedings of 2025 IEEE International Conference on Multimedia and Expo Workshops, Jun. 2025.
Kentaro Onda, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu, "Benchmarking Prosody Encoding in Discrete Speech Tokens," in Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2025.
Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu, "Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora," in Proceedings of Interspeech 2025, Aug. 2025.
Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu, "Discrete Tokens Exhibit Interlanguage Speech Intelligibility Benefit: an Analytical Study Towards Accent-robust ASR Only with Native Speech Data," in Proceedings of Interspeech 2025, Aug. 2025.
Rinka Nobukawa, Makito Kitamura, Tomohiko Nakamura, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Drum-to-vocal percussion sound conversion and its evaluation methodology," in Proceedings of the 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Oct. 2025.
Yuki Ito, Tomohiko Nakamura, Shoichi Koyama, Shuichi Sakamoto, and Hiroshi Saruwatari, "Spatial upsampling of head-related transfer function using neural network conditioned on source position and frequency," IEEE Open Journal of Signal Processing, Sep. 2025.
Go Nishikawa*, Wataru Nakata*, Yuki Saito, Kanami Imamura, Hiroshi Saruwatari, and Tomohiko Nakamura, "Multi-sampling-frequency naturalness MOS prediction using self-supervised learning model with sampling-frequency-independent layer," in Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2025. (*: equal contribution)
Hitoshi Suda, Junya Koguchi, Shunsuke Yoshida, Tomohiko Nakamura, Satoru Fukayama, and Jun Ogata, "IdolSongsJp corpus: A multi-singer song corpus in the style of Japanese idol groups," in Proceedings of the 26th International Society for Music Information Retrieval (ISMIR) Conference, Sep. 2025.