Cloud AI APIs are great until they are not. Rate limits, usage costs, privacy concerns, and network latency all add up. For quick tasks like code review, drafting, or testing prompts, a local model that runs entirely on your hardware has real advantages: zero API costs, no data leaving your machine, and consistent availability.
Google’s Gemma 4 is interesting for local use because of its mixture-of-experts architecture: the 26B-parameter model activates only 4B parameters per forward pass, so it runs well on hardware that could never handle a dense 26B model. On my 14” MacBook Pro M4 Pro with 48 GB of unified memory, it fits comfortably and generates at 51 tokens per second, though in my experience it slows down significantly when used inside Claude Code.
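If you want to script against a local model like this, here’s a minimal sketch assuming an Ollama-style HTTP server on its default port (localhost:11434). The model tag `gemma-local` is a placeholder for whatever your runtime actually calls the model, not an official name.

```python
# Minimal sketch: query a locally served model over an Ollama-style
# HTTP API. Assumes a server listening on the default port 11434;
# the model tag "gemma-local" is a placeholder, not an official name.
import json
import urllib.request

def generate(prompt: str, model: str = "gemma-local") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON object back instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Review this function for bugs: def add(a, b): return a - b"))
```

Everything stays on your machine: no API key, no usage meter, and it works the same on a plane as it does at your desk.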
Everything I do outside my day job happens on a 2023 MacBook Air M3 with 16 GB of RAM.
For now I don’t need more power; it rarely slows down on me. But the more I build LLMs into my workflows, the more I wonder how much autonomy I’d actually have if I lost internet access one day.