We Tried Google’s Gemini AI Chatbot and Found It to Be More Capable but Still Prone to Hallucinations

Google has come a long way with its generative artificial intelligence (AI) offerings. One year ago, when the tech giant first unveiled its AI assistant, Bard, it turned into a fiasco after the chatbot made a factual error while answering a question about the James Webb Space Telescope. Since then, the tech giant has improved the chatbot’s responses, added a feedback mechanism to check the sources behind its answers, and more. But the biggest upgrade came in December 2023, when the company switched the large language model (LLM) powering the chatbot from Pathways Language Model 2 (PaLM 2) to Gemini.

The company called Gemini AI its most powerful language model to date. It also added AI image generation to the chatbot, making it multimodal, and even renamed the chatbot itself Gemini. But just how big a leap is it for the AI chatbot? Can it now compete with Microsoft Copilot, which is based on GPT-4 and offers similar capabilities? And what about instances of AI hallucination (a phenomenon where AI presents false or non-existent information as fact)? We decided to find out.

Gemini can currently be accessed in several ways. Gemini Advanced is a paid subscription that comes with the Google One AI Premium plan, which costs Rs. 1,950 per month. There is also a Google Gemini app for Android; however, it is not yet available in India. The Google Pixel 8 Pro also ships with the Gemini Nano model. For our testing, we decided to use Google’s Gemini Pro-powered web portal, which is available in more than 230 countries and territories and is free to use.

Google Gemini’s generative capabilities

The website’s user interface remains the same, but the name has changed from Bard to Gemini. Once you are signed in with your Google account, the AI greets you by name and asks, “How can I help you today?” Below that are a few helpful prompt suggestions highlighting the different tasks it can perform.

First, we asked it to write an email to test its basic generative skills. Considering it is layoff season, we asked it to “draft a sympathetic email to an employee from a manager informing them that they have been laid off.” To benchmark its performance, we asked Copilot to do the same. We found Gemini’s response rather bland and robotic, trying to fit in too much information right at the beginning. The first paragraph said, “It is with great regret that I write to inform you about a difficult decision we’ve had to make at [Company Name]. Due to [reason], we are required to make reductions in our workforce, and your position will be eliminated, effective [date].”

In comparison, Copilot’s response felt more sympathetic and humane. It wrote, “I hope this email finds you well. I regret to inform you that due to recent organizational changes, we have had to make some difficult decisions. Unfortunately, your position at [Company Name] has been affected, and I am deeply saddened to let you know that you have been laid off.”

This should not come as a surprise, since Google’s chatbot is known for being a bit too bland when generating formal content, and that has not changed even with the Gemini model. When it comes to informal content, however, Gemini starts to show what it can do.

We asked both chatbots to “write a letter informing my mom about me getting laid off” and to “write a short note to my mother explaining that I was laid off. Make sure to convey my feelings of sadness and grief.”

Copilot took the prompt literally and wrote, “The suddenness of it all has left me feeling a mix of emotions: sadness, grief, and a touch of uncertainty about what lies ahead.” Gemini, however, understood the underlying message and did a much better job of expressing the emotions. It wrote, “I know you always worried about job security, and I wish I could tell you this was different. It wasn’t anything I did wrong, they’re [reason]. I’m trying to stay positive, but it’s hard not to feel discouraged. I don’t want you to worry too much, though. I’m not panicking, and I’m already starting to think about my next steps.”

This was far better composed, and one could be forgiven for thinking a real human wrote it. We conducted a few more tests, and each time Gemini’s responses were much more balanced than those of the GPT-4-powered Copilot.

Google Gemini’s accuracy

The next test checked the accuracy of the AI model. Many people use AI to get answers to their queries, and we wanted to do the same. It handled static, general knowledge-based questions well. Even when we sent a few grey-zone questions, it stuck to the facts and did not add any extra information. When the questions were controversial, it refused to answer them, which is what you would expect from a responsible AI chatbot.

The problem came when we asked it a popular trick question that has stumped the older generation of the chatbot in the past. When we entered the prompt “countries in Africa that start with the letter K”, it incorrectly responded, “There are no countries in Africa that start with the letter “K” as of today, February 12, 2024. While situations can change over time, this fact has been consistent for a while.” Kenya, of course, starts with K. For the unaware, the issue occurs because the data the LLM was trained on contained this exact prompt paired with the same wrong response. Interestingly, the same glitch is present on Copilot and ChatGPT, and despite widespread reports, it still has not been fixed on any of these chatbots.

However, the problem of AI hallucination did not end there. We also came across several other erroneous responses from Gemini. When we asked, “Based on reviews, tell me the pros and cons of iPhone 15 Pro”, it responded, “The iPhone 15 Pro hasn’t been officially announced yet”. In reality, the Apple smartphone launched in September last year. In comparison, Copilot fared better with technical questions.

Google Gemini in assistive tasks

Another skill most AI chatbots boast of is their assistive features. They can brainstorm an idea, create a trip itinerary, compare your options, and even converse with you. We started by asking Gemini to plan a five-day budget trip to Goa, including things people can do there. Since the author was recently in Goa, this was easier for us to verify. While Gemini did a decent job of highlighting all the popular places, the answer lacked detail and was not much different from what any travel website offers. One upside is that the chatbot is unlikely to suggest anything incorrect.

On the other hand, I was impressed by Copilot’s exhaustive response, which included hidden gems and even the names of local dishes one should try. We repeated the test with different variations, but the result remained consistent.

Next, we asked, “I live in India. Should I buy a subscription to Amazon Prime Video or Netflix?” The response was thorough and covered various parameters, including content depth, pricing, features, and benefits. While it did not directly recommend one over the other, it listed the reasons a user might pick either option. Copilot’s answer was much the same.

Finally, we spent some time chatting with Gemini. This test spanned a few hours, and we judged the chatbot on its ability to be engaging, entertaining, informative, and contextual. Gemini performed quite well on all of these parameters. It can tell you a joke, share lesser-known facts, give you a piece of advice, and even play word and picture-based games with you. We also tested its memory, and it could remember the conversation even after an hour of texting. The only thing it cannot do is reply to messages with a single line the way a human friend would.

Google Gemini’s image generation capability

In our testing, we came across a number of interesting things about Gemini AI’s image generation capabilities. For instance, all generated images have a fixed resolution of 1536×1536 pixels, which cannot be changed. The chatbot also refuses any request to generate images of real-life people, which will likely lower the risk of deepfakes (AI-generated images of people and objects that look real).

Coming to quality, Gemini did a faithful job of sticking to the prompt when generating images. It can generate random images in a particular style, such as postmodern, realistic, and iconographic. The chatbot can also generate images in the style of popular artists from history. However, there are many restrictions, and you will likely find Gemini refusing your request if you ask for something too specific. Compared with Copilot, I found that Gemini generated images faster, stayed truer to the prompts, and appeared to offer a wider range of styles to tap into. Still, it cannot compete with dedicated image generation models such as DALL-E and Midjourney.

Google Gemini: Bottom line

Overall, we found Gemini AI to be quite competent in most categories. As someone who has occasionally used the AI chatbot ever since it became available, I can confidently say that the Gemini Pro model has made it better at understanding natural language and at grasping the context of queries. The free version of the chatbot is a reliable companion if you need it to generate ideas, write a casual note, plan a trip, or even create basic images. However, it should not be used as a research tool or for formal writing, as these are the two areas where it struggles the most.

By comparison, Copilot is better at formal writing and itinerary generation, and on par when it comes to holding conversations (albeit with a shorter memory) and making comparisons. Gemini takes the crown in image generation, informal content generation, and engaging the user. Considering this is just the first iteration of the Gemini LLM, as opposed to the fourth iteration of GPT, we are curious to see how the tech giant further improves its AI assistant.
