Click here to download your free PDF of this Enterprise Search Center exclusive article.
Search technology has been around for more than four decades. But only in the past ten years, as the World Wide Web has become an integral part of the technology landscape, has it occupied a prominent place in our work and personal lives. And only in the past four years has search finally become a hot and lucrative area of technology development. Why the delay?
First, search technologies require computing power to sort through massive amounts of text. We are finally at that point, even with our desktop machines.
Second, good search requires language understanding. Language is often complex and ambiguous, so it's no trivial problem to figure out what a document is about. Once computing power was no longer the barrier, simple string matching could be replaced by elegant language analysis and complex matching algorithms. That change is transforming not only today's search engines, but also other software applications.
Third, there was little demand for this degree of elegance until organizations needed to find their "stuff." When companies discovered that lost information put them at risk for noncompliance, costly lawsuits, product flaws, or poor decisions, then information access became a priority. Fear can induce new interest in technologies that were once quiet backwaters.
Fourth, the web is vast, and search is its entry point. The web has made search mainstream.
Fifth, and most important, our lives have changed during the past decade. Our work and personal lives have become intertwined. We check our email on our PDAs during our kids' soccer games, interrupt work to cheer a goal, then search the web for local fast-food restaurants after the game—and download a map to get there. At work, we send flowers to our mother for her birthday in the midst of trying to meet a deadline. We use the same devices and expect to use the same tools for all our tasks. Why not? This means that we need tools to support both work and personal tasks, and that they must look the same, even if the requirements for security, search and discovery, and communication are different underneath.
So now we have demand, we have technologies, and we have computing power. Can we get the kind of information access that we need? Not quite yet. But the elements for driving development of better information platforms are finally in place. What is lacking is deep understanding of information interactions, and how to automate them effectively. The first barrier is how to divine meaning in both a query and a document. Understanding language is not so easy. Those of you who have read "Amelia Bedelia" books to your kids know that "waiting for the bread to rise" is quite different from "getting a rise" out of someone. The fact is that language is complex and ambiguous. It is also the foundation for most human interactions. In cyberspace, it is the way most of us interact with a computer.
We need to translate the following steps in a normal human information exchange into a human-computer interaction:
1. Question: I ask you a question: "Can you tell me where Winter is?"
2. Remove ambiguity: Knowing that it is July and that I am driving a car and look lost, you ask, "Do you mean Winter Street or Winter Drive?" You never even think that I might be asking about winter the season.
3. Refine and narrow: "Winter Street, in Weston."
4. Answer, in the context of where we are at the moment: "Three blocks up and on the left."
Now, if we try to translate this simple four-step interaction to querying a search engine, how far do we get? In today's, or actually yesterday's, search technology (what we might call basic, or commodity, search), we are no farther than step 1: we ask a question, and get back some matches with no further interaction, no removal of ambiguity, and little chance to refine the question. In other words, there is none of the human give and take that we expect when we ask a human a question. Instead, we are reduced to making guesses about how the search engine might have matched our query. This guessing game is challenging, but not very rewarding. And since a search is rarely performed as an end in itself, it wastes time and also retrieves less-than-optimum answers.
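To make the four-step exchange concrete, here is a minimal Python sketch of what a search interface would have to do beyond step 1. It is an illustration under stated assumptions, not a description of any real search engine: the function names, the `PLACES` table, and the directions to Winter Drive are all hypothetical.

```python
# Hypothetical toy data: the entries and the second set of directions
# are invented for illustration only.
PLACES = {
    "Winter Street, Weston": "Three blocks up and on the left.",
    "Winter Drive, Weston": "Half a mile back, on the right.",
}
SEASON = "winter (the season)"


def interpret(query, context):
    """Step 1: list the plausible readings of the question, using context."""
    if "winter" not in query.lower():
        return []
    readings = [name for name in PLACES if "winter" in name.lower()]
    # A driver who looks lost is asking about a place, so the season
    # is only kept as a reading when the asker is not driving.
    if not context.get("driving"):
        readings.append(SEASON)
    return readings


def clarifying_question(readings):
    """Step 2: ask a question only if more than one reading survives."""
    if len(readings) > 1:
        return "Do you mean " + " or ".join(readings) + "?"
    return None


def refine(readings, reply):
    """Step 3: keep only the readings that match the user's reply."""
    return [r for r in readings if reply.lower() in r.lower()]


def answer(readings):
    """Step 4: answer once exactly one reading is left."""
    if len(readings) == 1:
        return PLACES.get(readings[0])
    return None
```

Called with `{"driving": True}`, `interpret` drops the season and returns the two streets, `clarifying_question` asks which one is meant, `refine(readings, "Winter Street")` narrows the list to one entry, and `answer` returns its directions. A basic search box stops after the first call.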
Search has become the starting point for many tasks because it is the only way to get at the information that we need. Humans are adaptable, and they will use whatever tools are available, even if they are imperfect. But better, more human-like interaction is crucial as computers become even more pervasive. Finding information is part of many business processes and personal tasks. Buying a product, getting directions or phone numbers, conducting research, and monitoring events all require some sort of search. If we think of search as a beginning, rather than an end in itself, then a basic search box with no further interaction is not enough. Search technologies are the only ones that try to divine the meaning of words. They have the potential to make computers conversational and interactive. To get to that point, though, we will have to add technologies that enable computers to:
1. Understand the meanings of words, phrases, sentences, and documents.
2. Interact intelligently: ask questions to remove ambiguity, without being perceived as stupid.
3. Find the useful clues that determine context, as a person would.
4. Unite all the information to which we need access so that we don't have to repeat the process in multiple applications or on multiple sites.
5. Add understanding for non-language information—images, sounds, and possibly gestures.
In other words, the next incarnation of search has to do more than dump possibly relevant documents—or worse, pointers to documents—onto the desktop.
To improve search, we must mimic how people exchange information. Because people ask terrible questions, we must find a way to use other clues—their gestures, tone of voice, location, and what we know about them—in order to flesh out the intention behind their query. That's the next great challenge for search and discovery tools.
We are well on the way to adding language understanding, the first requirement. Most enterprise search engines have added technologies like categorization; identification of names of people, places, and things (entity extraction); and even sentiment extraction, which determines whether the tone of a document is positive or negative about a specific topic or thing. In the chart, note how, on top of basic search, we have layered technologies for understanding and mining language, images, and data. Each successive technology layer adds features that enable more advanced finding and exploration of information.
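As a rough illustration of what such a language layer adds on top of basic search, the Python sketch below tags a document with entities and an overall tone. It assumes the simplest possible techniques, a dictionary lookup for entities and word counting for sentiment, and it scores the whole document rather than sentiment about one specific topic. The word lists, the gazetteer, and the function names are hypothetical; production systems rely on far richer linguistic analysis.

```python
import re

# Hypothetical lexicons and gazetteer, invented for illustration.
POSITIVE = {"excellent", "great", "reliable"}
NEGATIVE = {"slow", "terrible", "broken"}
GAZETTEER = {"Acme": "organization", "Weston": "place"}


def extract_entities(text):
    """Dictionary-based entity extraction: find known names as whole words."""
    found = []
    for name, kind in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, kind))
    return sorted(found)


def sentiment(text):
    """Lexicon-based sentiment: positive word count minus negative word count."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def enrich(text):
    """Attach extracted metadata that a search index could store with the text."""
    return {"entities": extract_entities(text), "sentiment": sentiment(text)}
```

The dictionary returned by `enrich` is the kind of metadata a search engine can index alongside the raw text, so that a user can filter by place or organization, or look only at negative documents.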
Outlook for Search and Retrieval
Not so long ago, content was king. In 2006, context will reign. This change comes from the realization that we will never get people to ask the right questions. Instead, we must look for other clues to what they are seeking. In addition, for text-mining applications such as product early warning, compliance, legal discovery, or sentiment analysis, the surrounding content provides a context for data, or for an otherwise ambiguous statement. Context is also gathered in new access applications from the framework in which a question is asked or from the type of task that is in process. So, geolocation added to search lets a wireless company return contextually relevant lists of movies or restaurants. A user's role added to search indicates the types of information needed. Recent search history or personal information provides additional clues.
Certainly, context is the reason that the advertising revenue for web search has grown so quickly. Context is used in web search in order to return ads that are relevant to the query terms in the search. Expect that new information-access applications will wring out every implicit clue that they can to improve search results.
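The geolocation case can be sketched in a few lines of Python. This is a minimal example, assuming flat x/y coordinates and a tiny in-memory list; the listings, coordinates, and the `search` function are hypothetical and stand in for whatever index and location service a real application would use.

```python
from math import hypot

# Hypothetical listings with made-up coordinates on a flat grid.
LISTINGS = [
    {"name": "Downtown Cinema", "kind": "movies", "x": 0.0, "y": 0.0},
    {"name": "Mall Cinema", "kind": "movies", "x": 5.0, "y": 5.0},
    {"name": "Corner Diner", "kind": "restaurants", "x": 1.0, "y": 1.0},
]


def search(kind, location=None):
    """Return matching listing names; if a location is known, nearest first."""
    matches = [item for item in LISTINGS if item["kind"] == kind]
    if location is not None:
        # The user's position is the implicit clue: sort by straight-line distance.
        matches.sort(
            key=lambda item: hypot(item["x"] - location[0], item["y"] - location[1])
        )
    return [item["name"] for item in matches]
```

Without a location, `search("movies")` returns the cinemas in stored order. With `location=(4.0, 4.0)`, the same query puts Mall Cinema first. The query text is identical in both calls; only the context changes the result.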
Some of the other trends we note in the search and retrieval market include:
• Meaning-based computing. IDC believes that eventually these technologies—text mining, text analytics, categorization, speech analysis, and translation—will be embedded in most people-facing applications in order to improve human-computer interaction. This represents a sizable opportunity for vendors in this market.
• Browsing, in addition to searching. Presenting browsable results promotes discovery of the unexpected, and it makes search results instantly understandable. Master data management projects should also consider using categorization technologies to bootstrap mapping among diverse schemas.
• Question answering instead of search to provide automatic technical and customer support. Technical support sites are improved when question answering returns answers instead of just a list of documents that contain the right answer.
• Sentiment extraction for monitoring consumer opinion of products, services and people.
• Spam reduction.
• Email monitoring for compliance and/or for email management and mining. Email is a prime source for finding product ideas or for discovering telltale indicators of corporate misbehavior.
• Compliance monitoring of all text and speech regulations.
• Convergence of database and content technologies. Newer search architectures began offering "BI Lite" starting in 2006. Master data management, a trend still largely confined to the database side of the information divide, will gradually extend to the content side. Content technologies have a long history of information normalization: taxonomies, categorization, and controlled vocabularies all predate computers. Using content technologies to categorize and normalize schemas is a logical step that will unite structured and unstructured information and streamline the master data management process, which is still largely manual.
In the next five years, search will become even more pervasive. Questions and answers are part of most human tasks. These technologies will be embedded in cell phones, cars, gas pumps, home entertainment centers, call centers, and transit systems. Good search today requires more than keyword matching. Now, and in the future, search and discovery tools will be expected to help people locate, explore, search, compare, analyze, cluster, categorize, differentiate, aggregate, synthesize, and sort information of all sorts. That will require more than a search engine—it demands a true information discovery platform.
SUSAN FELDMAN Vice President, Content Technologies Research, IDC.