This app is designed as a learning aid for anyone who wants to become familiar with the speech capabilities of generative AI apps and agents. Its user interface is modeled on the Microsoft Foundry portal, but it does not use any Azure cloud services.
The app uses two language models:
- Microsoft Phi-3 mini (GPU mode) - The primary model, run in-browser by the WebLLM engine. It requires a modern browser that supports the WebGPU API, plus an integrated or dedicated GPU.
- SmolLM2-360M (CPU mode) - A fallback model that runs on CPU using WebAssembly. This loads automatically if WebGPU is not available, though responses may be slower than GPU mode.
You can manually switch between models using the model selector in the configuration panel.
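The automatic GPU/CPU fallback described above comes down to detecting whether the browser exposes the WebGPU API (`navigator.gpu`). A minimal sketch of that decision is below; the function name and the navigator-like parameter are illustrative (the parameter lets the logic run outside a browser), not the app's actual code, and a production check would also call `navigator.gpu.requestAdapter()` to confirm a usable adapter exists.

```typescript
// Sketch of WebGPU feature detection for choosing an inference mode.
// A navigator-like object is injected so the logic is testable outside
// a browser; in the app this would be the real `navigator`.
type InferenceMode = "gpu" | "cpu";

interface NavigatorLike {
  gpu?: unknown; // present when the browser exposes the WebGPU API
}

function selectInferenceMode(nav: NavigatorLike): InferenceMode {
  // WebGPU is exposed as navigator.gpu; if it is absent,
  // fall back to the CPU (WebAssembly) model.
  return nav.gpu !== undefined ? "gpu" : "cpu";
}

// In the browser: const mode = selectInferenceMode(navigator);
```

Note that `navigator.gpu` being present is necessary but not sufficient: on some systems (see Known issues below) the adapter request can still fail, which is why a runtime fallback to the CPU model is useful regardless.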
Known issues
- The initial download of either model may take a few minutes, particularly on low-bandwidth connections. Subsequent loads should be quicker because the models are cached in the browser.
- Some GPU-equipped computers (particularly those with ARM-based processors) do not support WebGPU unless the Unsafe WebGPU Support browser flag is enabled. If the Phi-3 model fails to load, try enabling this flag at edge://flags in Microsoft Edge or chrome://flags in Google Chrome, and disable it again when you're finished. The SmolLM2 (CPU) model works as a fallback on these systems.
- Microsoft Edge on ARM-based computers does not support the Web Speech API for speech recognition (speech to text) and returns a network error when attempting to capture input from the microphone. Speech synthesis (text to speech) should still work.
- SmolLM2 (CPU mode) responses may be noticeably slower than Phi-3 (GPU mode), especially on older computers with slower processors. This is normal for CPU-based inference.
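The speech-recognition failure noted above surfaces through the Web Speech API's error event, whose `error` codes ("network", "not-allowed", "no-speech", and so on) are defined by the spec. A hedged sketch of turning those codes into user-facing guidance follows; the function name and message strings are illustrative, not the app's actual wording.

```typescript
// Sketch of mapping Web Speech API recognition error codes to
// user-facing messages. The codes are defined by the Web Speech API
// spec; the messages here are illustrative only.
function describeRecognitionError(code: string): string {
  switch (code) {
    case "network":
      // Reported by Microsoft Edge on ARM even with a working
      // connection, since speech recognition is unsupported there.
      return "Speech recognition is unavailable in this browser; try typing your prompt instead.";
    case "not-allowed":
      return "Microphone access was denied; check the browser's permission settings.";
    case "no-speech":
      return "No speech was detected; please try again.";
    default:
      return `Speech recognition failed (${code}).`;
  }
}

// In the browser this would be wired up roughly as:
// recognition.onerror = (e) => showMessage(describeRecognitionError(e.error));
```

Treating the "network" code as "unsupported here, fall back to typed input" rather than retrying matches the ARM behavior described above, where retries cannot succeed.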