Osaurus

Osaurus — Free Download. Local LLM server native to Apple Silicon
Osaurus is a local large language model (LLM) server, designed exclusively for Apple Silicon chips. It uses Apple's MLX framework to achieve maximum performance on M-series Macs. It provides an API endpoint compatible with OpenAI, allowing integration with various desktop AI assistant applications. The app includes a SwiftUI interface for managing models and a built-in HTTP server built with SwiftNIO.
5.0(1 ratings)

Download Osaurus (Official links)
File size: 11.8 MB
The latest version of Osaurus is: 0.11.2
Operating system: MacOS
Languages: English
Price: $0.00 USD

  • Native MLX Runtime. Osaurus runs on top of MLX, Apple's machine learning framework optimized for its processors. This approach directly leverages the Neural Engine and Metal GPU acceleration, resulting in faster inference and efficient memory usage compared to Python or Electron-based solutions.
  • Exclusive to Apple Silicon. The software is developed and tested specifically for the architecture of M1, M2, M3 chips and beyond. It does not support Intel Macs or other platforms, allowing for deep system and hardware optimization.
  • OpenAI API Compatibility. The server implements the /v1/models and /v1/chat/completions endpoints, in both streaming and non-streaming modes. This allows clients designed for the OpenAI API, such as various coding assistant applications, to work with Osaurus without modifications.
  • Function Calling (Tool Calling). Supports the OpenAI style for defining tools and their selection (tool_choice). It parses tool calls (tool_calls) and manages streaming deltas in response streams, facilitating the integration of autonomous agent capabilities.
  • Intelligent Chat Templates. Employs the Jinja chat template provided by the model, respecting the Beginning-of-Sequence (BOS) and End-of-Sequence (EOS) tokens. Includes an automatic fallback system for models that do not define a template, ensuring correctly formatted prompts are generated.
  • KV Cache Reuse (Sessions). Using a session_id parameter, the server can retain the key-value cache between conversation turns. This reduces latency in multi-turn dialogues, as it is not necessary to reprocess the entire history with each new interaction.
  • Low-Latency Token Streaming. Uses Server-Sent Events (SSE) to send generated tokens to the client in real-time, as they are produced. This technique provides an incremental writing experience without perceptible waits.
  • Integrated Model Manager. The interface allows browsing, downloading, and managing models directly from the mlx community repositories on Hugging Face. Downloaded models are automatically configured for immediate use with the server.
  • System Resource Monitor. Displays real-time CPU and RAM usage within the application interface. This visualization allows the user to observe the impact of the loaded model and server activity.
  • Self-Contained Application. Combines a SwiftNIO HTTP server and a SwiftUI user interface into a single bundle. It does not require external runtime environments like Python, simplifying installation and reducing the disk footprint.
  • Minimalist Menu Bar Interface. The application resides primarily as an icon in the macOS menu bar, providing quick control to start/stop the server and access settings. This design keeps the desktop uncluttered.
  • Measurably Superior Performance. Internal benchmarks indicate that Osaurus can be approximately 20% faster in inference than other local solutions like Ollama, when running equivalent models on the same Apple Silicon hardware.

The company Dinoki began developing Osaurus in 2024. The project arose from observing the potential of Apple Silicon chips for local AI and the absence of native, optimized LLM servers for this platform. The developers, specialized in Swift and macOS systems, built the application primarily using the Swift programming language, leveraging the SwiftUI frameworks for the interface and SwiftNIO for the network server. The decision to use Swift and MLX, instead of common cross-platform stacks, was made to gain maximum control over performance and integration with Apple's system APIs.


Alternatives to Osaurus: