Empowering Windows: Mu Language Model Integration

Key Points:

  • Microsoft has introduced a new on-device small language model called Mu, which is designed to operate efficiently and deliver high performance while running locally.
  • Mu is a 330M-parameter encoder-decoder language model optimized for small-scale deployment on Neural Processing Units (NPUs) and edge devices, capable of handling input contexts of tens of thousands of tokens while generating over a hundred output tokens per second.
  • Mu has been fine-tuned to power a new Windows agent in Settings on Copilot+ PCs, which understands natural language and seamlessly changes the relevant settings, with every change undoable.

Microsoft has announced Mu, a new on-device small language model designed to run efficiently and deliver high performance locally. Mu is a 330M-parameter encoder-decoder language model optimized for small-scale deployment on Neural Processing Units (NPUs) and edge devices. The model handles input contexts of tens of thousands of tokens and generates over a hundred output tokens per second, making it ideal for on-device, real-time applications.

The development of Mu was informed by lessons learned from enabling Phi Silica to run on NPUs, particularly around tuning models for optimal performance and efficiency. Mu’s encoder-decoder architecture reuses the input’s latent representation, which greatly reduces computation and memory overhead. This results in lower latency and higher throughput on specialized hardware, such as the Qualcomm Hexagon NPU.
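This latent-reuse idea can be sketched in a few lines of Python. The function names and stand-in computations below are purely illustrative assumptions, not Mu's actual code: the point is that the encoder runs once per request, and every decode step reuses the cached result instead of re-processing the prompt, which is where the latency and memory savings come from.

```python
# Illustrative sketch of encoder-decoder latent reuse (NOT Mu's real code).
# A decoder-only model re-attends to the full prompt at every generation
# step; an encoder-decoder encodes the prompt once and caches the result.

def encode(prompt_tokens):
    # Runs exactly once per request: produces a fixed latent representation.
    return [hash(t) % 997 for t in prompt_tokens]  # stand-in for real encoding

def decode_step(latent, generated):
    # Each step attends to the cached latent plus previously generated
    # tokens; the prompt itself is never re-processed.
    return (sum(latent) + len(generated)) % 50257  # stand-in for a real step

def generate(prompt_tokens, max_new_tokens=3):
    latent = encode(prompt_tokens)  # computed once, reused below
    out = []
    for _ in range(max_new_tokens):
        out.append(decode_step(latent, out))
    return out

out = generate(["make", "my", "text", "bigger"])
```

Because `encode` is outside the generation loop, its cost is paid once regardless of how many tokens are produced, whereas a decoder-only model pays a prompt-proportional cost at every step.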

To optimize Mu’s performance, Microsoft employed various techniques, including weight sharing, dual LayerNorm, Rotary Positional Embeddings, and Grouped-Query Attention. These techniques allow Mu to squeeze more performance from a smaller model, while also reducing memory footprint and compute requirements.
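Of these techniques, Grouped-Query Attention is the easiest to quantify: several query heads share a single key/value head, which shrinks the key/value cache by the grouping factor. A rough back-of-the-envelope sketch, using illustrative head counts rather than Mu's actual configuration:

```python
# Hedged sketch of the memory saving from Grouped-Query Attention (GQA).
# Head counts and sequence length below are illustrative assumptions,
# not Mu's actual hyperparameters.

def kv_cache_bytes(n_kv_heads, head_dim, seq_len, bytes_per_el=2):
    # Factor of 2 covers both the key cache and the value cache.
    return 2 * n_kv_heads * head_dim * seq_len * bytes_per_el

# Standard multi-head attention: every query head has its own KV head.
mha = kv_cache_bytes(n_kv_heads=12, head_dim=64, seq_len=4096)

# GQA: 4 query heads share each KV head, so only 3 KV heads are cached.
gqa = kv_cache_bytes(n_kv_heads=3, head_dim=64, seq_len=4096)

print(mha // gqa)  # prints 4: cache shrinks by the grouping factor
```

The saving matters most on long inputs, since the KV cache grows linearly with sequence length and competes with the weights for the NPU's limited memory.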

Mu was trained on Azure Machine Learning using A100 GPUs, and was fine-tuned on several tasks, including SQuAD, CodeXGLUE, and the Windows Settings agent. The results show that Mu performs nearly on par with a similarly fine-tuned Phi-3.5-mini, despite being roughly one-tenth the size.

To enable Mu to run efficiently on-device, Microsoft applied advanced model quantization techniques tailored to NPUs on Copilot+ PCs. This involved converting the model weights and activations from floating point to integer representations, which preserved model accuracy while drastically reducing memory footprint and compute requirements.
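A minimal sketch of that float-to-integer conversion, assuming a simple per-tensor 8-bit scheme (the article does not specify Microsoft's exact quantization method, so treat this as a generic illustration):

```python
# Generic post-training weight quantization sketch: map float weights to
# 8-bit integers with a per-tensor scale, then dequantize at compute time.
# This is NOT Microsoft's NPU-specific scheme, just the basic idea.

def quantize(weights):
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]  # each value fits in int8
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.5, -1.27, 0.03]
q, s = quantize(w)        # integers in [-127, 127]
w_hat = dequantize(q, s)  # close to w, at a quarter of float32 storage
```

Integer weights both shrink the memory footprint (int8 vs. float32 is a 4x reduction) and map onto the integer arithmetic units that NPUs execute most efficiently.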

Mu has been fine-tuned to power a new Windows agent in Settings on Copilot+ PCs, which understands natural language and seamlessly changes the relevant settings, with every change undoable. The agent is integrated into the existing search box and is designed to deliver ultra-low latency across the numerous possible settings. The fine-tuned Mu model achieved response times of under 500 milliseconds, meeting Microsoft’s goals for a responsive and reliable agent in Settings.

The introduction of Mu and the new Windows agent in Settings marks a significant breakthrough in the development of on-device language models. As Microsoft continues to refine the experience for the agent in Settings, the company welcomes feedback from users in the Windows Insiders program. The success of this project is a testament to the collaboration and support of various teams, including the Applied Science Group and partner teams in WAIIA and WinData.



