Gemini’s got Spotify integration. What else can it do?
Like other AIs, Gemini is mostly used as a machine that answers questions, generates pictures, writes code, and solves problems. However, it would seem that Google had slightly different use cases in mind when it gave this gift to the public. The recent integration with Spotify confirms that assumption: Gemini, at least in the incarnation dwelling in Android-powered mobile devices, is a digital assistant by design.
Yes, everything is still very fresh, and several years from now we’ll look back at today’s LLMs and smile the way we do at an infant’s first steps. Yet, even now, the capabilities of Google’s Gemini AI are quite impressive. Let’s see what this artificial intelligence can do for you when it puts on the hat of an assistant.
Gemini AI as an assistant: popular functions
Gemini's integration with various Google services and other applications enables it to perform numerous tasks. Apart from the Spotify playback control mentioned above, the AI can do the following.
Summarization. Gemini is known to be quite good at extracting the gist of videos – especially those published to YouTube – and texts and serving it as a summary. The assistant can quickly scan lengthy documents, analyze PDF attachments in Gmail, and generate structured output from complex data sets. At the Google I/O event, the company claimed that the LLM can work with up to 1,500 pages of material at once, which is massive.
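To make that long-context claim a bit more tangible from a developer’s point of view, here is a minimal sketch of how such a summarization request might look against the Gemini API. It assumes the Python google-generativeai SDK and the gemini-1.5-pro model name; the API key and file name are placeholders, and the on-device assistant does all of this for you automatically.

```python
# A minimal sketch of long-document summarization via the Gemini API,
# assuming the Python "google-generativeai" SDK and the "gemini-1.5-pro"
# model; the API key and file name below are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# With a context window this large, hundreds of pages of text can be
# passed to the model in a single request.
with open("annual_report.txt", encoding="utf-8") as f:
    document = f.read()

response = model.generate_content(
    "Summarize the following document in five bullet points:\n\n" + document
)
print(response.text)
```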
Task management. Gemini integrates with Google Keep and Tasks and lets you quickly create (shopping) lists, reminders, scheduled events, etc. While at it, you can ask Gemini to find some information elsewhere (a recipe, for example) and add it to the entry being created, and the AI will do just that without forcing you to switch apps.
Answers to questions. Of course, this feature should be mentioned, too. What deserves special note, though, is Gemini’s ability to process free-flowing speech and its awareness of context, which increases the chances of you getting exactly the response you need.
Native multimodal and long-context capabilities. These may be a bit narrower in terms of applicability, but Google is proud enough of them to have authored a dedicated blog post filled with examples (find it here). According to this post, Gemini can describe pictures, extract only the needed information from source materials, and serve it in the requested shape. Plus, it has several other tricks up its sleeve, such as image recognition, web page content extraction, and JSON output.
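For the curious, here is a second minimal sketch showing the multimodal and JSON tricks from a developer’s perspective. It again assumes the Python google-generativeai SDK; the image file, prompt, and field names are purely illustrative.

```python
# A minimal sketch of image description with structured JSON output,
# assuming the Python "google-generativeai" SDK; "receipt.jpg" and the
# requested fields are hypothetical examples.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")

# Requesting a JSON MIME type makes the reply easy to parse programmatically.
model = genai.GenerativeModel(
    "gemini-1.5-pro",
    generation_config={"response_mime_type": "application/json"},
)

image = Image.open("receipt.jpg")
response = model.generate_content(
    [
        'Describe this receipt and return JSON with the fields "store", "date", and "total".',
        image,
    ]
)
print(response.text)  # e.g. {"store": "...", "date": "...", "total": "..."}
```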
Gemini AI development possibilities
While Google hasn’t published a development roadmap for Gemini, some bits and pieces mentioned in its blogs and at events suggest the directions the AI will likely be developed in.
Further integration: Google aims to deepen Gemini's integration with more Google services such as Google Home, Phone, and Messages. This will likely allow for more seamless interactions across various platforms.
Enhanced understanding of context: Gemini 1.5 Pro has a context window of up to 1 million tokens, which means it will be able to handle more complex queries and maintain context over longer interactions.
AI-driven music authoring: it looks like Google wants a piece of this pie, too, and some features in the next iterations of Gemini may enable you to generate music and sounds based on prompts.
Astra: remember JARVIS from Iron Man? Project Astra is Google’s take on the concept. It is still mostly under wraps, but it looks like it could be the real deal, if they can pull it off.
Stay tuned for more news from the exciting world of AI!