2023-11-28 21:55:07
The already impressive Whisper, OpenAI's speech recognition model that faithfully transcribes audio into text, just got a little more impressive thanks to a significant technical optimization. whisper.cpp, its popular C/C++ port, recently gained full GPU support on Apple Silicon. The result is a drastic improvement in performance.
The developer of MacWhisper, which has just adopted whisper.cpp 1.5, reports processing times cut by a factor of two to three. We tested it on a MacBook Air M1 with a 16 min 30 s episode of our podcast Coming out of sleep, using the Medium model (slow, but with excellent recognition) and automatic language detection.
Podcast processing time in two different versions of MacWhisper
With version 5.7 of MacWhisper, which relies on the CPU (the software uses 400% of the CPU) and the Neural Engine, the full transcription took 7 min 47 s. The same operation with version 6.0 of MacWhisper, which takes advantage of the GPU (the CPU is barely used at all), takes only 3 min 28 s. Processing time is indeed cut by more than half, a big difference that may encourage the use of a larger model (better at recognition, but slower to run) than the one we used until now.
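For readers who want to sanity-check these figures, here is a quick back-of-the-envelope calculation using only the timings quoted above (the variable names are ours, purely for illustration):

```python
# Timings reported in this article, converted to seconds.
cpu_seconds = 7 * 60 + 47     # MacWhisper 5.7, CPU + Neural Engine: 7 min 47 s
gpu_seconds = 3 * 60 + 28     # MacWhisper 6.0, GPU: 3 min 28 s
audio_seconds = 16 * 60 + 30  # Podcast episode length: 16 min 30 s

# Speedup of the GPU version over the CPU version.
speedup = cpu_seconds / gpu_seconds
print(f"Speedup: {speedup:.2f}x")  # ≈ 2.25x, i.e. time cut by more than half

# Real-time factor: how much faster than playback each version transcribes.
print(f"CPU real-time factor: {audio_seconds / cpu_seconds:.2f}x")  # ≈ 2.12x
print(f"GPU real-time factor: {audio_seconds / gpu_seconds:.2f}x")  # ≈ 4.76x
```

The GPU version thus transcribes this episode almost five times faster than real time, which leaves headroom for a bigger model.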
Incidentally, the effort to integrate MacWhisper into macOS continues, with the ability to control audio playback from the keyboard and the app's presence in the media menu of the menu bar.
Hello Transcribe, another application notable for its iPhone/iPad compatibility in addition to the Mac, was also recently updated to whisper.cpp 1.5. Its developer reports a performance improvement of 400% with the Large model on a Mac with M1 Max and 100% with the Medium model on an iPhone 14 Pro.