Then back to a flip phone you go. Apple does the same with Siri; heck, they pioneered it. Phones simply aren't powerful enough to process speech without some help from backend services.
They send the speech to the server so they can improve the service without having to update the client. They can also run analytics on all the saved recordings and see what people are actually asking.