Acoustical Science and Technology
Online ISSN : 1347-5177
Print ISSN : 1346-3969
ISSN-L : 0369-4232
Volume 46, Issue 2
PAPERS
  • Yoshiko Arimoto, Dan Oishi, Minato Okubo
    2025 Volume 46 Issue 2 Pages 125-135
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: November 06, 2024
    JOURNAL OPEN ACCESS

    To ensure the reliability of evaluations obtained through crowdsourcing services, this study demonstrated methods of selecting qualified evaluators and reliable ratings, using emotional ratings for nonverbal vocalizations obtained via a crowdsourcing service. To evaluate the efficiency of these methods, emotional ratings were also obtained through a listening experiment in an in-person laboratory setting. Three filtering criteria were demonstrated: (a) excluding evaluators who rated more than 45% of the assigned samples with a single value, (b) excluding evaluators who took less than 7 seconds to rate each assigned sample, and (c) excluding emotion rating instances associated with a low self-reported confidence rating. The results showed that the crowdsourcing listening test exhibited tendencies similar to those of the in-person test, with high correlation coefficients of 0.873 for arousal, 0.739 for pleasantness, and 0.704 for dominance when evaluators who took less than 7 seconds to evaluate a speech sample were eliminated. However, the differences in the correlation coefficients between the filtered and non-filtered scores were only 0.001–0.007. Moreover, the results revealed that self-reported confidence scores can eliminate unreliable evaluation ratings, although the correlation improved only marginally.
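    The three filtering criteria described above can be sketched in a few lines of Python. This is an illustrative sketch only: the records, their layout, and the confidence cutoff (ratings below 3) are invented for the example, and criterion (b) is read as a mean response time under 7 s per sample.

```python
from collections import Counter
from statistics import mean

# Illustrative records: (rater, rating, response_time_s, confidence).
# All values are made up for this sketch, not taken from the paper.
records = [
    ("a", 3, 9, 4), ("a", 3, 10, 5), ("a", 3, 8, 4), ("a", 3, 12, 3),
    ("b", 1, 5, 2), ("b", 4, 6, 5), ("b", 2, 4, 4), ("b", 5, 5, 3),
    ("c", 2, 8, 4), ("c", 3, 9, 3), ("c", 4, 11, 5), ("c", 1, 10, 4),
]

groups = {}
for row in records:
    groups.setdefault(row[0], []).append(row)

# (a) raters whose single most frequent rating covers > 45% of their samples
flat = {r for r, rows in groups.items()
        if Counter(x[1] for x in rows).most_common(1)[0][1] / len(rows) > 0.45}

# (b) raters whose mean response time is under 7 s per sample
#     (one reading of the "less than 7 seconds" criterion)
fast = {r for r, rows in groups.items() if mean(x[2] for x in rows) < 7}

# (c) drop individual ratings with self-reported confidence below a cutoff
kept = [row for row in records
        if row[0] not in flat | fast and row[3] >= 3]

print(sorted({row[0] for row in kept}))  # only rater "c" survives all filters
```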

    Download PDF (674K)
  • Makoto Morinaga, Shigenori Yokoshima, Tomohiro Kobayashi, Sakae Yokoya ...
    2025 Volume 46 Issue 2 Pages 136-145
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: October 29, 2024
    JOURNAL OPEN ACCESS

    The oppressive or vibratory sensation caused by low-frequency sound is a widely known sensation inherent to that type of sound. Previous studies using one-third octave band noise as stimuli identified the frequency region in which the oppressive or vibratory sensation is felt before other sensations such as loudness and noisiness (here called the peculiar region). However, it has been suggested that level fluctuations of one-third octave band noise affect the oppressive or vibratory sensation. Furthermore, few studies have investigated the threshold of these sensations. In the present study, we conducted laboratory experiments using low-frequency pure tones to investigate the peculiar region from 10 to 160 Hz as well as the sensation threshold. The peculiar region in which the oppressive or vibratory sensation became dominant was generally consistent with the findings of previous studies, although differences were found at relatively high frequencies such as 80 and 160 Hz. In addition, the median threshold value was lower than the lowest level of the peculiar region. The threshold differed greatly among participants, and the higher the frequency, the more pronounced the difference. Multiple regression analysis suggested that these individual differences might be related to noise sensitivity.

    Download PDF (950K)
  • Hien Ohnaka, Ryoichi Miyazaki
    2025 Volume 46 Issue 2 Pages 146-156
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: November 18, 2024
    JOURNAL OPEN ACCESS

    This paper proposes an unsupervised DNN-based speech enhancement approach founded on deep priors (DPs). Here, DP signifies that DNNs are more inclined to produce clean speech signals than noise. Conventional DP-based methods typically train on a noisy speech signal using a random noise feature as input, stopping training at the point where only a clean speech signal has been generated. However, such approaches encounter challenges in determining the optimal stopping time, experience performance degradation due to environmental background noise, and suffer from a trade-off between distortion of the clean speech signal and noise reduction performance. To address these challenges, we utilize two DNNs: one to generate a clean speech signal and the other to generate noise. The combined output of these networks closely approximates the noisy speech signal, with a loss term based on spectral kurtosis used to separate the noisy speech signal into a clean speech signal and noise. The key advantage of this method is that it circumvents the trade-off and early stopping problems, because the signal is decomposed over a sufficient number of steps. Through evaluation experiments, we demonstrate that the proposed method outperforms conventional methods for both white Gaussian and environmental noise while effectively mitigating early stopping problems.
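    The intuition behind a kurtosis-based loss term can be illustrated numerically: a sparse, burst-like signal (a crude stand-in for speech) is heavy-tailed and has large excess kurtosis, while Gaussian noise has excess kurtosis near zero, which is what lets such a loss steer the two networks toward different components. The sketch below uses plain time-domain kurtosis on invented signals rather than the paper's spectral variant, and is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16000

# Gaussian noise: amplitude distribution is Gaussian, excess kurtosis near 0
noise = rng.standard_normal(n)

# Crude "speech-like" signal: sparse high-amplitude bursts, heavy-tailed
burst = np.zeros(n)
idx = rng.choice(n, size=n // 50, replace=False)
burst[idx] = rng.standard_normal(idx.size) * 5.0

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (0 for an ideal Gaussian)."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

print(excess_kurtosis(noise))  # near 0
print(excess_kurtosis(burst))  # strongly positive
```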

    Download PDF (1555K)
  • Takayuki Hidaka, Akira Omoto, Noriko Nishihara
    2025 Volume 46 Issue 2 Pages 157-166
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: December 12, 2024
    JOURNAL OPEN ACCESS

    This paper studies, based on a psychoacoustic test, whether musical experts and non-experts differ in their subjective judgments of the preferred reverberation time and clarity of concert halls. The test signals were piano and violin solos convolved with binaural room impulse responses measured at 34 positions in 18 symphonic halls. The experts were outstanding musicians, music managers, recording engineers, and acousticians, all of whom had listening experience in many of the halls. The non-experts were students with more extended musical training than ordinary students. The preferred reverberation times at mid-frequencies (average of 500 Hz and 1,000 Hz) were 1.2 to 2.0 s for piano and 1.8 to 2.4 s for violin among the experts, and 0.9 to 2.1 s and 1.6 to 2.7 s among the non-experts. The latter ranges are 50% and 83% broader for piano and violin, respectively. Clarity showed a similar tendency. This result indicates that the subjective judgments of musical experts are more reliable than those of non-experts when designing actual concert halls.
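    The quoted 50% and 83% figures follow directly from the widths of the reverberation-time ranges given above; a quick arithmetic check:

```python
# Range widths (s) of preferred mid-frequency reverberation time, from the abstract
expert_piano, nonexpert_piano = 2.0 - 1.2, 2.1 - 0.9    # 0.8 s vs 1.2 s
expert_violin, nonexpert_violin = 2.4 - 1.8, 2.7 - 1.6  # 0.6 s vs 1.1 s

# Percentage by which the non-expert range exceeds the expert range
print(round((nonexpert_piano / expert_piano - 1) * 100))    # 50
print(round((nonexpert_violin / expert_violin - 1) * 100))  # 83
```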

    Download PDF (725K)
TECHNICAL REPORT
  • Tatsuya Kitamura, Jin Oyama, Jing Sun, Ryoko Hayashi
    2025 Volume 46 Issue 2 Pages 167-172
    Published: March 01, 2025
    Released on J-STAGE: March 01, 2025
    Advance online publication: November 01, 2024
    JOURNAL OPEN ACCESS

    This study aimed to develop an indicator for assessing articulatory motion during fast, repetitive syllable production, focusing on fluency, periodicity, and consistency. The method utilizes a kymograph derived from ultrasound imaging of tongue movements in the midsagittal plane. The kymograph was generated by juxtaposing pixels along an observation line passing through the point of greatest tongue movement. Periodic patterns in the kymograph indicate controlled, consistent tongue and mandibular movements, whereas nonperiodic patterns suggest speech disturbances. The method employs a power spectral image obtained through a two-dimensional discrete Fourier transform of the kymograph. The resulting power spectrum represents the periodic components in the horizontal direction of the kymograph, with prominent peaks indicating consistent patterns. To validate the method, the authors analyzed ultrasound movies of healthy Japanese speakers (both fluent speakers and those who experienced a sense of speech clumsiness) producing repetitive syllables (/aka/, /aga/, /ata/, and /ada/). The results demonstrated the effectiveness of the indicator in distinguishing between periodic and nonperiodic tongue motions. The approach also shows promise for application to real-time MRI movies, potentially opening new avenues for the in-depth analysis of motor speech function and contributing to the assessment and quantification of articulatory motion.
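    The core of the analysis can be sketched with a synthetic example: a pattern that repeats along the time (horizontal) axis of a 2-D image produces a prominent peak on the horizontal-frequency axis of its 2-D DFT power spectrum, at the bin matching the number of repetitions. The image below is a tiled sinusoid, not ultrasound data, and the peak-picking is a simplification of the paper's indicator.

```python
import numpy as np

# Synthetic "kymograph": rows = positions along the observation line,
# columns = time frames; every pixel row carries the same periodic motion.
t = np.arange(256)
periodic = np.sin(2 * np.pi * 8 * t / 256)  # 8 cycles across 256 frames
kymo = np.tile(periodic, (64, 1))           # shape (64 pixels, 256 frames)

# 2-D DFT power spectrum; time-axis periodicity appears along the
# horizontal-frequency axis (here, the zero-vertical-frequency row).
spec = np.abs(np.fft.fft2(kymo)) ** 2
horiz = spec[0, :128]                        # positive horizontal frequencies
peak_bin = int(np.argmax(horiz[1:])) + 1     # skip the DC bin
print(peak_bin)  # 8 — the repetition count shows up as the dominant peak
```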

    Download PDF (609K)
ACOUSTICAL LETTERS