Images and words: mechanics of automated captioning with neural networks


Image captioning is the task of generating a textual description of an image, combining Natural Language Processing and Computer Vision. As in the famous "finger pointing to the moon" parable, automated image captioning requires the ability to discern what is really going on in a scene and to generate a fluent description of the action taking place. In this talk we present the mechanics underlying object detection and language generation using Convolutional and Recurrent Neural Networks.
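To make the encoder-decoder idea concrete, here is a minimal NumPy sketch (illustrative only, not the talk's actual code): a CNN would normally map the image to a feature vector; a random projection stands in for it here, and a vanilla RNN cell then emits one word id per step by greedy decoding. All names and sizes below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, FEAT, HID, EMB = 12, 16, 8, 8  # toy sizes, chosen arbitrarily

# "CNN" stand-in: image features -> initial hidden state
# (a real encoder would be a stack of convolutional layers)
W_img = rng.normal(size=(FEAT, HID))

# RNN decoder parameters (plain Elman cell)
W_emb = rng.normal(size=(VOCAB, EMB))  # word embeddings
W_xh = rng.normal(size=(EMB, HID))
W_hh = rng.normal(size=(HID, HID))
W_hy = rng.normal(size=(HID, VOCAB))

def caption(image_features, max_len=5, bos=0):
    """Greedily decode a sequence of word ids from image features."""
    h = np.tanh(image_features @ W_img)   # condition the decoder on the image
    token, out = bos, []
    for _ in range(max_len):
        x = W_emb[token]                  # embed the previous token
        h = np.tanh(x @ W_xh + h @ W_hh)  # RNN state update
        token = int(np.argmax(h @ W_hy))  # pick the most likely next word
        out.append(token)
    return out

fake_image = rng.normal(size=FEAT)  # placeholder for real CNN features
print(caption(fake_image))          # a list of 5 word ids
```

In a trained system the same loop runs with learned weights and a real vocabulary, and greedy argmax is often replaced by beam search for better captions.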

Language: English

Level: Intermediate

Alberto Massidda

Production Engineer - Meta

Computer engineer since 2008, specialized in mission-critical, high-traffic, highly available Linux architectures and infrastructures (before the cloud era), with relevant experience in the development and management of web services. Infrastructure Lead, SRE, AI researcher, university Teaching Assistant, and open-source developer, he has worked at Translated, N26, and Meta, among others. Alberto has a varied background that ranges from DevOps to machine learning, and from corporate banking to the fast-changing startup world.
