File:Gemini modality combination – cooking helper scenario.png

From Wikimedia Commons, the free media repository

Original file (1,539 × 1,090 pixels, file size: 658 KB, MIME type: image/png)

Captions

From the study "Gemini: A Family of Highly Capable Multimodal Models"

Summary

Description
English: "Audio-visual qualitative example showcasing the ability of Gemini models to process interleaved sequences of text, vision, and audio, as well as reason across modalities. This example inputs interleaved images and audio from the user in a cooking scenario. The user prompts the model for instructions to make an omelet and to inspect whether it is fully cooked."
Date
Source https://arxiv.org/abs/2312.11805
Author Authors of the preprint: Gemini Team Google: Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, et al.

Licensing

This file is licensed under the Creative Commons Attribution 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

File history

Date/Time | Dimensions | User | Comment
current, 22:14, 4 March 2024 | 1,539 × 1,090 (658 KB) | Prototyperspective | Uploaded a work by Authors of the preprint: Gemini Team Google: Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, et al. from https://arxiv.org/abs/2312.11805 with UploadWizard
