Can an enterprise use software to figure out what a digital image or a video is "about"? (About, as I am using the word, means looking at a snapshot of a farm and recognizing the pigs, the cows and the chickens.)
Visualize your office building monitored by surveillance cameras. Instead of a human security guard watching for an intrusion, software "watches" the digital video and makes a decision about a specific individual attempting to enter the building. The image recognition system plucks a person's face from the real-time video stream, matches it to a database and determines whether he or she is a vice president or a stranger without access permission. The system "recognizes" the executive and unlocks the door.
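At its core, the matching step described above reduces to comparing a face "signature" extracted from the video frame against stored signatures of known employees. Here is a minimal sketch of that comparison, assuming the signatures (embedding vectors) have already been produced by some face-analysis model; the names, vectors and threshold below are invented purely for illustration and are not any vendor's actual method.

```python
import numpy as np

# Hypothetical, pre-computed face "signatures" (embedding vectors) for known staff.
# A real deployment would obtain these from a face-embedding model; the values
# here are placeholders for illustration only.
known_faces = {
    "vice_president": np.array([0.12, 0.87, 0.33, 0.45]),
    "facilities_manager": np.array([0.58, 0.21, 0.77, 0.05]),
}

def cosine_similarity(a, b):
    """Similarity between two signatures; 1.0 means the vectors point the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def decide_access(captured, threshold=0.95):
    """Match a captured signature against the database and unlock or deny."""
    best_name, best_score = None, -1.0
    for name, signature in known_faces.items():
        score = cosine_similarity(captured, signature)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return f"unlock door for {best_name} (similarity {best_score:.2f})"
    return "deny entry: no match above threshold"

# A frame grab whose signature closely resembles the vice president's.
print(decide_access(np.array([0.13, 0.86, 0.34, 0.44])))
```

The hard part, of course, is everything the sketch assumes away: finding the face in a noisy video stream and producing a signature reliable enough that the threshold decision can be trusted.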
For many years, security professionals have funded, tested and tweaked commercial systems to make image recognition of faces a reliable reality, not a science fiction fantasy. Alas, software sufficiently "smart" to figure out the identity of an individual or to determine the "aboutness" of a digital image is a pot of gold long sought after but not yet found.
Google and celebrity facial recognition
But advances are being made. In May 2011, Google's patent application "Automatically Mining Person Models of Celebrities for Visual Search Applications" set off a flurry of commentary on blogs and in mainstream publications like Forbes. Patent application US20110116690 was circulating even as Google's chairman, Eric Schmidt, was explaining that image recognition was "too creepy." (See "Facial Recognition: Google Chairman Warns US Govt," May 20, 2011, at http://goo.gl/DPOuj.)
When I want some insight into next-generation search technology, I navigate to Google Research's Publications by Googlers at http://research.google.com/pubs/papers.html. Although not a comprehensive archive, the technical papers provide a useful glimpse into search technologies from some of the world's most sophisticated engineers and scientists. In the category Audio, Video and Image Processing, there were more than 100 technical papers, last I checked.
One research report suggested that Google's experts were testing a taxonomy with more than 1,000 categories. The idea was to use "smart software" to figure out what a video is "about." To me, the Google method echoes Autonomy's (autonomy.com) approach, and the report indicated that Google's algorithms can categorize video without metadata at an acceptable level of accuracy.
A 2009 article indicated that Google is working to figure out the "what" in imagery. And yet another report suggested that Google has powerful image functionality that remains, for now, on the sidelines. Is that due to a decision dictated by financial, legal or technical factors? There is scant information about Google's plans for its image recognition technology. What is clear is that Google has invested time and effort in figuring out the content of static images and digital video. When Google does move, the impact on the market could be significant due to its near monopolistic control of search and retrieval.
Current examples of what's available
My view is that Google's consumer image search is useful, probably as good as, if not better than, comparable systems from Bing, TinEye and Flickr.
I prefer the image search function of Exalead (a division of Dassault Systemes), which returns relevant images without the malware-attracting iFrames used by Google. What few of my colleagues in enterprise search know is that Exalead's system has for several years offered image search features that are only now becoming available on Google. For example, it automatically recognizes an image suitable for desktop wallpaper and displays a hot link to it. Exalead's portrait or landscape option has been available for a long time, and the company has also pushed ahead in video search.
Autonomy also offers image and video search systems. Other vendors include OpenText's Nstein unit, which uses technology from Imprezzeo. Nstein employs content-based image retrieval and facial recognition, and its system has been tailored to the needs of publishers. The user inputs or identifies a sample image, and the system displays matches. With some clicking, the result set can be narrowed to the image the user requires. Nstein provides a software development kit for the system.
A firm called IQ Engines offers an image recognition system that performs "computer vision search." You upload an image to the system. After a minute of processing, the system either displays matches or reports that the image was not in the database.
Kooaba is a visual recognition startup. The company offers a photo management system for licensees and an iPhone application. The user takes a picture of an object and uploads it to Kooaba. The system then "finds" similar images.
A key point is that these systems rely on metadata such as the date, time, file type and user-generated description of an image. Algorithms create a "fingerprint" for color, shapes and other discernible characteristics. If an image appears in a PowerPoint file, the name of the PowerPoint "author" may be attached to the digital object. These systems are not figuring out whether the image shows a prize-winning heifer or a Volkswagen Jetta.
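To illustrate what a color "fingerprint" can and cannot do, here is a small sketch, assuming nothing more than the Pillow imaging library. The solid-color test images stand in for real photographs, and a production system would store the fingerprint alongside the metadata fields mentioned above.

```python
from PIL import Image

def color_fingerprint(img, size=32):
    """Reduce an image to a coarse, normalized RGB histogram: a crude color fingerprint."""
    small = img.convert("RGB").resize((size, size))
    hist = small.histogram()               # 768 bins: 256 each for R, G and B
    total = float(sum(hist))
    return [count / total for count in hist]

def similarity(fp_a, fp_b):
    """Histogram intersection: 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(fp_a, fp_b))

# Two synthetic images stand in for photographs; real systems also attach
# metadata (date, time, file type, user description) to each fingerprint.
brown_field = Image.new("RGB", (100, 100), (139, 90, 43))
blue_car = Image.new("RGB", (100, 100), (30, 60, 160))

print(similarity(color_fingerprint(brown_field), color_fingerprint(brown_field)))  # ~1.0
print(similarity(color_fingerprint(brown_field), color_fingerprint(blue_car)))     # near 0
```

The fingerprint captures the color distribution, nothing more; it cannot tell you whether the brown image is a heifer or a muddy parking lot, which is precisely the limitation the vendors above work around with metadata.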
Image recognition applications
Confusion about image search, image recognition and image systems is flourishing. One reason is the failure to distinguish among the different applications of image recognition.
Certain types of image processing work well, are well understood and have a measurable impact. A good example is the machine vision sector of image recognition. Cognex is one of the leaders in machine vision. The company's products make it possible to process barcodes for inventory control. Its technology can "look at" a stream of manufactured components and "see" those with defects. You may want to check out Orpix Computer Vision, Pattern Recognition Company and Microscan, among others.
Cognex, despite the soft economy, reported record revenue in its first quarter of 2011. The firm seems likely to push beyond $300 million in revenues. One indication of the strength of this company is its cash position: in May 2011, the firm had a war chest of more than $300 million in cash and investments. At a time when traditional enterprise search vendors are struggling to stay afloat or tapping investors for additional cash, Cognex is flying high.
There are some important differences between the image recognition needs in markets served by Cognex and the needs of marketing, sales and business development people. A Cognex machine vision solution can be focused on a well-defined domain, often with specific attributes or "tells." A defective chip, for example, may exhibit a different refractive index or a discernible color variation. The technology to recognize a defect in a production-line setting is extremely sophisticated, and the return on investment can be calculated. Even at competitive labor rates, machine vision can pay for itself through speed, accuracy and lower cost than manual methods.
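In a constrained production-line setting, "recognition" can be as simple as measuring how far each part's color strays from a known-good reference. The following sketch is my own illustration of that idea, not Cognex's method; the synthetic frames, reference values and tolerance are assumptions made for the example.

```python
import numpy as np

def flag_defects(frames, reference, tolerance=12.0):
    """Flag parts whose mean color strays too far from a known-good reference.

    frames    : array of shape (n_parts, height, width, 3), one image per part
    reference : mean RGB of a known-good part, shape (3,)
    A crude stand-in for the color-variation checks described above.
    """
    means = frames.reshape(frames.shape[0], -1, 3).mean(axis=1)   # mean RGB per part
    deviation = np.linalg.norm(means - reference, axis=1)         # distance from the good part
    return deviation > tolerance                                  # True = suspect part

# Synthetic line: two good parts and one with a visible color shift.
good = np.full((40, 40, 3), 128.0)
bad = good.copy()
bad[:, :, 0] += 40                                                # reddish tint
print(flag_defects(np.stack([good, good, bad]), reference=np.array([128.0, 128.0, 128.0])))
# -> [False False  True]
```

The narrow domain is what makes the economics work: the system answers one question ("does this part look like the good part?") millions of times, rather than the open-ended "what is this picture about?"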
In marketing and sales, however, the person putting together a slide presentation needs an image of a product (relatively easy to find if there is metadata attached to the available pictures), or an image to convey an intangible quality such as vigor (relatively hard even if someone has indexed an in-house image collection). Several vendors offer image management systems based on metadata supplied by the camera or by a human indexer. One can use the InMagic (inmagic.com) system as an image retrieval system. Clever system administrators can make a traditional database like Oracle (oracle.com) or SQL Server (microsoft.com/sqlserver) provide access to images.
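The trick with the database route is that the relational system stores metadata and file paths rather than attempting to understand pixels. A minimal sketch follows, using SQLite as a stand-in for Oracle or SQL Server; the table layout, file names and keywords are hypothetical.

```python
import sqlite3

# The database holds metadata and pointers to image files, not the pixels themselves.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE images (
        path        TEXT,
        product     TEXT,
        keywords    TEXT,      -- indexer-supplied terms such as 'vigor'
        captured_on TEXT
    )
""")
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?, ?)",
    [
        ("/assets/widget_front.tif", "Widget X", "product, studio shot", "2011-03-01"),
        ("/assets/team_sprint.jpg", None, "vigor, energy, people", "2010-11-15"),
    ],
)

# Retrieval then becomes a metadata query, not image understanding.
for path, keywords in conn.execute(
    "SELECT path, keywords FROM images WHERE keywords LIKE ?", ("%vigor%",)
):
    print(path, "->", keywords)
```

The quality of the answer depends entirely on the quality of the indexing, which is why the human indexer has not been put out of work.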
But for larger collections of digital images (what used to be called 35-mm slide collections), one needs a specialized digital asset management (DAM) system from a vendor such as Adobe, Canto or Microsoft iView, among others. Those systems offer version management and support for different file types such as Adobe Photoshop, PDF, TIFF and vector drawing files. The systems include access controls, essential if an organization is doing work for certain government agencies. They focus on reducing bottlenecks in workflows.
Even with fancy systems, the amount of time required to find a specific image or a specific segment of digital video is indeterminate. Exalead's video search system does allow the user to start viewing a video at the point at which the query matches its content.
And what about video?
Video can pose some additional challenges. Digital video is an unwieldy beast with an appetite for storage and a generous side dish of bandwidth. One company that has received accolades from industry groups and analysts is Altus, whose flagship product is vSearch. The company offers on-demand rich media solutions for a range of enterprise applications. The system can be used for knowledge sharing within an organization, as a sales-enablement service, as an educational service or as a way to deliver video from a conference with multiple simultaneous presentations.
Altus has positioned itself as providing a service that "transforms enterprise video into a valuable asset for any organization. vSearch creates a cloud-based learning environment that combines enterprise video with PowerPoint slide synchronization and scrolling transcripts into an accessible video content archive that is searchable down to the spoken word or specific point of interest. Content can be viewed as streaming media or on-demand presentations from any computer, tablet or smart phone-allowing instant access to knowledge anytime or anywhere." The Altus approach is to deliver video search as software as a service (SaaS).
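"Searchable down to the spoken word" generally means the audio track has been converted into a timestamped transcript, so a text query can resolve to an offset in the video. Here is a minimal sketch of that lookup under my own assumptions; the segment text, timestamps and query are invented for illustration and do not describe Altus' actual implementation.

```python
# Timestamped transcript segments: (start time in seconds, recognized speech).
transcript = [
    (0.0, "welcome to the quarterly product briefing"),
    (42.5, "our new image recognition pipeline reduces indexing cost"),
    (118.0, "questions from the sales team about licensing"),
]

def find_offsets(query, segments):
    """Return the start times (in seconds) of segments containing the query."""
    q = query.lower()
    return [start for start, text in segments if q in text.lower()]

print(find_offsets("image recognition", transcript))   # -> [42.5]
```

A player can then seek to 42.5 seconds, which is what "jumping straight to the matching point" amounts to in practice; the heavy lifting is the speech-to-text step that produces the transcript in the first place.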
Still, the question that interests me is, "Are these systems from sophisticated technology companies able to look at an image or a frame of video and 'figure out' what the picture represents?" The sci-fi version of image recognition is out of reach. The meaning of a picture depends on a context that, at this time, requires a human to discern. For now, humans still have a role to play in finding just the right image for any given situation. For a few years yet, we are not about to see the end of that good old-fashioned function: indexing rich media.