Request: Apple Silicon / Mac Metal or MLX support for JoyCaption #73

Phlypper · 2026-06-09T15:30:41Z

Hi JoyCaption developers,
I’m using JoyCaption locally on an Apple Silicon MacBook Pro with an M4 Max GPU. My use case is accessibility: I am building a local video-to-audio-description workflow for blind users, where JoyCaption captions sampled video frames and the app turns those captions into spoken narration.
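To make the frame-sampling step of this workflow concrete, here is a minimal sketch. The function names `sample_frame_indices` and `timestamp_label` and the two-second interval are hypothetical and only for illustration; they are not part of JoyCaption or of the app described above.

```python
def sample_frame_indices(total_frames: int, fps: float, every_seconds: float) -> list[int]:
    """Return frame indices spaced roughly `every_seconds` apart (hypothetical helper)."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    # Convert the time interval to a whole number of frames, at least 1.
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def timestamp_label(frame_index: int, fps: float) -> str:
    """Format a frame index as mm:ss so a caption can be paired with narration timing."""
    seconds = int(frame_index / fps)
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


if __name__ == "__main__":
    # A 10-second clip at 30 fps, sampled every 2 seconds.
    for idx in sample_frame_indices(300, 30.0, 2.0):
        print(timestamp_label(idx, 30.0), "frame", idx)
```

Each sampled frame would then be passed to the captioning model, and the caption spoken at the matching timestamp.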
The current Hugging Face / LLaVA-style JoyCaption model can run on Mac in some setups, but Apple Silicon support is still difficult compared with CUDA. PyTorch MPS / Metal compatibility, memory use, dtype handling, and speed can be challenging, especially with newer or larger JoyCaption versions.
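As an illustration of the device and dtype handling mentioned above, here is a minimal sketch of a backend picker. The dtype chosen for each backend is an assumption for illustration, not a tested recommendation for JoyCaption, and the helper names `choose_device_and_dtype` and `probe_torch` are hypothetical.

```python
def choose_device_and_dtype(mps_available: bool, cuda_available: bool) -> tuple[str, str]:
    """Pick a device name and dtype name from backend availability flags.

    The dtype per backend is an assumption for illustration only.
    """
    if cuda_available:
        return "cuda", "bfloat16"
    if mps_available:
        # Assumption: float16 is the safer half-precision choice on MPS.
        return "mps", "float16"
    return "cpu", "float32"


def probe_torch() -> tuple[str, str]:
    """Query PyTorch for available backends; fall back to CPU if it is not installed."""
    try:
        import torch
    except ImportError:
        return choose_device_and_dtype(False, False)
    return choose_device_and_dtype(
        torch.backends.mps.is_available(),
        torch.cuda.is_available(),
    )


if __name__ == "__main__":
    device, dtype_name = probe_torch()
    print(f"device={device} dtype={dtype_name}")
```

On an Apple Silicon Mac with a recent PyTorch build, `probe_torch()` would be expected to pick the `mps` branch; whether a given JoyCaption version then loads and runs correctly in that dtype is exactly the kind of thing that tested Mac support would settle.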
I wanted to ask whether you would consider one of the following for future JoyCaption releases:
- Better-tested Apple Silicon support through PyTorch MPS / Metal
- An MLX-compatible version for Apple Silicon
- A Core ML export or conversion guide
- A smaller or quantized Mac-friendly model
- Clear Mac setup notes for Transformers users
This would be very helpful for accessibility-focused local apps. Many blind users and Mac users would benefit from a strong local image/video captioning model that does not require NVIDIA CUDA hardware.
My current system:
- Apple Silicon MacBook Pro, M4 Max
- macOS
- PyTorch MPS / Metal
- Hugging Face Transformers
- Local JoyCaption model folder
Thank you for building JoyCaption and making it available to the community.

Replies: 0 comments

Select a reply

Uh oh!

Request: Apple Silicon / Mac Metal or MLX support for JoyCaption #73

Uh oh!

Phlypper Jun 9, 2026

Replies: 0 comments

Phlypper
Jun 9, 2026