|
RESOURCES FOR EVALUATING ENTERPRISE SEARCH TECHNOLOGIES |
January 07, 2009 |
|
Table of Contents |
|
Search Is Dead--Now What? |
Google-Watching Service Launched by ArnoldIT.com |
StoredIQ Introduces Expanded eDiscovery Solution |
Exalead Joins HP Business Information and Intelligence Program |
InFocus: Autonomy Corp., PLC |
End-to-end content management |
Micro-messaging goes live |
Google Releases SearchWiki |
NewsGator Releases Related Content Services |
Morningstar Acquires 10-K Wizard |
LinkedIn Releases New Search Platform |
Expert System Introduces COGITO |
Zemanta Opens Semantic API and Announces Commercial Package |
EBSCO Publishing and Lexi-Comp Form Partnership |
EMC Opens Data Warehouse and BI Competency Center |
Concept Searching Announces Discovery Software |
IBM Data Governance Council Leads XBRL Initiative |
|
Search Is Dead--Now What? |
Billions of queries are launched each day: to Google, Yahoo!, third-party systems operating behind an organization’s firewall, and desktop search systems available from Coveo, Exalead, IBM, ISYS, Siderean, and maybe four score other firms. There’s also open source software from Lucene … well, the list goes on and on. Entrepreneurs suck hundreds of millions of dollars from venture capital firms to fuel new search competitors. Some are branded with names intended to evoke search and to suggest notions of wisdom (Thetus) or to trigger a sense of furious activity (Attensity); others’ branding is merely puzzling (Kanisa, Powerset).
Autonomy, Endeca, and Fast Search & Transfer are purpose-built to tame the enterprise information flood. Dismissing these specialists are the "big three" (IBM, Microsoft, and Oracle), which offer equally comprehensive systems topped off with workflow, analytics, visualization, and enough programming options to delight the most jaded CIO. Standing there like Harvey the invisible rabbit is Google. Although others in the enterprise search market dismiss Google as a floppy parvenu, claiming its approach to enterprise search is too simplistic for enterprise duty, Google continues to make sales and, by sheer gravitational pull, to alter the landscape of enterprise search. Google is poised to bypass search and jump to data spaces and new approaches to relevance such as certainty scores.
With the growing need for tools to manage information floods, how can search be dead? There are some reasons that are not always addressed in a straightforward way by vendors and pundits. For example, consider the following:
- Most users of enterprise search systems are dissatisfied with laundry lists of results. The answer is probably in the list, but the employees lack the time or patience to open and inspect the documents in it.
- Financial officers are dismayed at the direct and indirect costs enterprise search engines generate.
- Most organizations have multiple enterprise search systems. Fortune 500 companies are the worst offenders, usually operating five or more systems.
- Most search vendors promise anything to get the sale, then trust that their engineers can deliver the goods.
- Keyword search is often a turnoff. Users say that hitting on the right combination of keywords to get the information is too difficult.
- As many as two-thirds of enterprise search system users report that other means must be used to find needed information.
These facts were unearthed in the research for the first three editions of The Enterprise Search Report, which I wrote from 2003 to 2006, and for my monograph Beyond Search: What to Do When Your Search System Doesn’t Work (Gilbane Group, Spring 2008). Other analysts, from the Butler Group to Ovum in the U.K., have identified similar issues.
Navigating Search
Search is dead in the sense that keyword search does not deliver the simplicity, results, and cost savings that licensees and users demand. What’s next? Systems have to move beyond search. The technologies needed to improve findability, whether that means getting an answer or locating a specific document, are cutting-edge and have only become possible in the last few years.
We’re entering treacherous waters now. Established system vendors and newcomers promise silver bullets that will kill the werewolves plaguing enterprise search. Taxonomies resonate in some vendors’ marketing spiels. Others focus on natural language processing. Some stress intelligent systems that discover the meaning of documents using latent semantic indexing. Eager entrepreneur-inventors describe hybrid systems that combine adaptive Bayesian networks, knowledge bases, and human interaction as if the value of subject matter experts were the answer to search problems. Space does not allow me to define, describe, and discriminate among these buzzwords or explain such concepts as federation, clustering, entity extraction, visualization, disambiguation, etc.
In 2007, we conducted a search review as part of a search procurement for America’s leading scientific and technical educational organization. Our key finding was that users want search systems providing single-point access to information inside the organization, web information, and commercial information. The interface, according to our analysis, needs a "search box" and point-and-click navigation options such as categories, hotlinks to related information, and one-click access to the most recently added relevant information. Note that the interface is not a portal; it is an interface for information access. Siderean offers this type of portal-like approach to findability using its proprietary technology.
Such findings echo what we have learned in our other search-related projects. Keyword search is not enough. Advanced technologies are irrelevant to system users. Users want the search system to go beyond search. They expect a search system to do the following:
- Offer a web page that gives users specific suggestions and options with hotlinks to topics, categories, and key subjects.
- Include a search box, but provide the user with point-and-click options and a way to get started on the quest for the needed information without requiring the user to frame a keyword query.
- Allow the user to drill down or jump across topics. The technology should make it easy to explore the available information and, equally important, to backtrack or find a previously displayed piece of information.
The vendors are playing one-upmanship. Customers want solutions. If our data are correct, enterprise search is dead unless the search system vendors shift from buzzwords to delivering systems that meet some basic criteria. For example, search systems must be more affordable, be easier to install and maintain, and be more of a finished product and less of a customized "hot rod."
Thumbnail View
In a short write-up about search, there is considerable risk in highlighting specific companies. However, in the last year, I have tested several systems that point to the future. Let me provide brief thumbnails of these, with the reader’s understanding that many choices are available for search and advanced text processing.
Siderean (www.siderean.com): This company uses a range of technologies that allow the client company to process text, integrate with enterprise software, and provide point-and-click access to content. Oracle Corp., one of Siderean’s customers, uses the company’s technology to bring point-and-click exploration and search to content from a wide range of sources.
ISYS USA (www.isysusa.com): ISYS provides a solid enterprise search product. The company’s search interface allows users to pinpoint needed information. The results display offers a relevance-ranked list of results plus hotlinks to named people and organizations. In 2007, ISYS added support for rich media content. The system can automatically recognize proper names, employee names, and domain terms. ISYS’s system delivers unified search across all documents, structured and unstructured.
Google (www.google.com/enterprise): Google’s Appliance, available from Dell Computer, is a search "toaster." The system is easily extended with the OneBox API. The Appliance can, in the hands of an enthusiastic developer, be configured to deliver federated search across structured and unstructured content. "While Google has sold over 10,000 search appliances on the premise of a search toaster, the fact remains the majority of those customers have not configured, nor understand, the advanced functionality of which it is capable. This should be a harrowing thought to other search vendors as there is a big knowledge gap that should get filled in the short term along with the pace of innovation at Google," says Erik Arnold of Search Solutions, Inc., a search consulting company.
In conclusion, enterprise search relying on keyword queries is dead—or at least will be on life support soon. Systems must sweep technology under the carpet and present users with point-and-click interfaces that include a keyword search option. The buzzwords and jargon do more to confuse the potential licensee than help explain why one system is to be preferred. Today’s search system leaders must take action to deliver results that make users happy.
About the Author
STEPHEN ARNOLD (www.arnoldit.com) has spent more than 30 years accessing and developing online technologies, and he is the author of seven books about information technology. He is the original author of "The Enterprise Search Report"; his most recent work includes Google Version 2.0: The Calculating Predator and its predecessor The Google Legacy from Infonortics, Ltd. |
|
Google-Watching Service Launched by ArnoldIT.com |
Stephen Arnold has launched Overflight, "An ArnoldIT.com Intelligence Service," a free RSS aggregation service that collects the headlines from Google’s 74 weblogs. The most recent headlines are grouped using the same categories that Google favors. The methodology grew over a period of time and now includes a number of separate functions combined into one Overflight service. (www.arnoldit.com/overflight) |
|
StoredIQ Introduces Expanded eDiscovery Solution |
StoredIQ Inc., a provider of Intelligent Information Management and eDiscovery technologies, announced that it has expanded its eDiscovery solution to include data stored on backup tapes and other legacy archived media. The StoredIQ Solution for Legacy Archived Media leverages eMag Solutions’ technology. StoredIQ’s appliance automates legal eDiscovery processes and provides visibility and control over information that is largely unstructured and unmanaged. Companies can deploy the appliance in-house to search, preserve, collect, and process data from many live data sources across the enterprise and across remote locations. StoredIQ offers support for data sources, including file and email servers, collaborative software applications, document management repositories, and personal desktops. (www.storediq.com, www.emagsolutions.com) |
|
Exalead Joins HP Business Information and Intelligence Program |
Exalead, a provider of information access software for business and the web, has joined the HP Business Information and Intelligence Program. The HP Business Information and Intelligence Partner Program is a specialized extension of the HP Developer & Solution Partner Program (DSPP). This specialized partner program is designed for Business Information and Intelligence ISVs who are developing solutions on HP Integrity, HP Neoview, or HP BladeSystems. Exalead CloudView is a family of solutions that automatically collect, structure, and contextualize high volumes of unstructured and structured content from a number of data sources scattered across the enterprise. (www.Exalead.com) |
|
InFocus: Autonomy Corp., PLC |
"We were way ahead of the rest of the market, in terms of looking at this broader area of unstructured data," boasts Nicole Eagan, CMO for Autonomy Corp., speaking of the company’s place in the enterprise search marketplace. The company got its start in 1996, when founder and CEO Michael Lynch, Ph.D., Fellow of the Royal Academy of Engineering, took what he learned about how humans process information while studying at Cambridge University and decided to apply it to computers. So was born what Autonomy calls "meaning-based computing."
To move beyond structured databases and "old-fashioned" computing, Autonomy "came at the market quite differently," Eagan says. "We saw there could be good information in any unstructured data." With this in mind, Eagan says, Autonomy wondered, "What if we just let people index, search, and process it all to get value out of it?"
By "applying very advanced algorithms" to information, Eagan says Autonomy makes "understanding the meaning of human-friendly information" a reality for computers. Autonomy’s approach centers on what it calls "meaning-based computing"—the ability to form an understanding of all information, whether it be structured, semistructured, or unstructured, and recognize the relationships that exist within it.
Focusing on unstructured data meant the U.K.-based Autonomy solution was well-suited to "working with governments and intelligence agencies and what turned into homeland defense," according to Eagan. However, Autonomy saw more applications for its approach. "We took that type of technology and applied it to the corporate world."
It is a commonly referenced statistic that about 80% of a company’s information is unstructured. Autonomy tackled this challenge from the start, with its early focus on turning information that computers could not understand into searchable and more readily usable data. For instance, when you call your bank and hear a warning that your call may be recorded for quality control purposes, it may in fact be scanned by Autonomy’s Intelligent Data Operating Layer (IDOL) Server. The IDOL Server allows users to index data; find what they are looking for; cluster like concepts together automatically; visualize data; group areas of interest together; and profile information—all on a single platform.
Using the call center example, IDOL can search the recorded calls and group them based on any number of concepts, including topic or sentiment. Other companies provide solutions that can search video, voice, or emails, yet Autonomy provides a unified approach that recognizes voice, data, email, instant messaging, text, and video.
In 2005 Autonomy acquired the U.S.-based Verity, a move that added new customers and an American outpost to the mix. Its customer base now includes more than 17,000 global companies, featuring a slew of household names such as BBC, Bloomberg, Boeing, Citigroup, Coca-Cola, DaimlerChrysler AG, Deutsche Bank, Ericsson, Ford, GlaxoSmithKline, NASA, Nestle, the New York Stock Exchange, Reuters, Shell, and some formidable agencies such as the U.S. Department of Energy, the U.S. Department of Homeland Security, and the U.S. Securities and Exchange Commission. More than 350 companies OEM Autonomy technology, including Adobe, Citrix, EDS, HP, Novell, Oracle, Sybase, and TIBCO, and the company has more than 400 VARs and Systems Integrators.
According to Eagan, Autonomy has a market capitalization of $4 billion. It is the second largest pure software company in Europe and was named the Best Performing Software Company in Europe in 2007.
Focusing on Global 2000 companies with 70% of its business inside of the U.S., Autonomy recognized an opportunity in a change in court rules that altered the game for American enterprise. After the changes to the Federal Rules of Civil Procedure (FRCP) regarding electronic discovery went into effect in December 2006, having an effective, efficient means of searching digital content within a company was no longer just a good business idea; it was the law. The rule change bolstered Autonomy’s position because its ability to search across the spectrum of data types helps its customers to better meet discovery requirements that now include things such as images, calendar files, spreadsheets, audio files, websites, and computer programs.
With offices already scattered across the globe, Autonomy has plans for continued growth. As search continues to evolve, there should be no limitations as to where Autonomy and companies like it can go, breaking down barriers between types of data and the way that data is analyzed. Just like in the early days, Autonomy continues to believe its approach will deliver results. As Eagan says, "We think the whole future of computing is going to be based around understanding meaning."
Fun Fact: Autonomy holds "Xbox Tournaments" for employees across geographical boundaries and time zones, in which they play Halo, a game packed with combat. (www.autonomy.com)
|
|
End-to-end content management |
Biopharmaceutical company UCB is using a content supply chain management platform to gain a more detailed view of content—for instance, what is being accessed, by whom and for what reason. The Brussels-based company has launched Infotrieve's Content SCM, a Web-based solution for automatic document sourcing and delivery, with rights management capabilities, copyright compliance auditing and a complete view of content usage enterprisewide. Andrew Clark, group leader of UCB Library Services, says, "Within UCB, teams from Global Library Services, Medical Affairs and IT collectively reviewed and analyzed several products to help us innovate and integrate the information services available to our employees, and Content SCM offered a complete end-to-end solution to meet our global self-service, copyright compliance and administrative needs." According to a news release from Infotrieve, its technology is designed from the point of view of the content user and the administrators responsible for managing, purchasing and ensuring copyright compliance of that content. The company says users can research content using integrated search and discovery platforms with advanced citation referencing capabilities, select and order specific articles, advise on how the content is being used, see the cost of the article and then place an order. The solution also checks copyright compliance rights. |
|
Micro-messaging goes live |
Traction Software has introduced Live Blog micro-messaging for TeamPage 4.0. The new offering enables users to write brief notes from wherever they are and share them instantly over Traction's TeamPage enterprise wiki platform. Traction claims TeamPage 4.0 is the first Enterprise 2.0 suite to incorporate micro-messaging technology that's been made popular by Web services such as Twitter and Pownce. Traction's new Live Blog technology allows enterprise users to:
- share brief updates to stay in touch securely with customers and colleagues any time and from anywhere;
- keep in touch with product, sales and support teams while getting work done;
- connect internal teams and development partners, suppliers and customers working on projects that span the globe;
- easily raise and discuss issues that need quick attention;
- use separate Live Blogs to work with individual clients (like a law firm) where privacy is a necessity; and
- leverage TeamPage's security, search, comment, social tagging and notification features to power Live Blogs and flag issues for follow-up action.
Traction explains that a Live Blog is an automatically updating browser window that can be shown directly on a user's desktop or mobile device such as an iPhone. Users type a brief note in the Live Blog window, and everyone with access to that Live Blog sees a new highlighted note in seconds. Live Blog notes are stored in TeamPage's hypertext journal so they can easily be tagged for follow-up; discovered in a TeamPage search; or forwarded as an automatically generated instant message, e-mail or RSS notification. |
|
Google Releases SearchWiki |
Google has released the newest addition to Google Web Search, SearchWiki. With this new set of features, users can add, delete, re-sort, or comment on search results for any query to create a customized set of search results. Users can reorder results so that the site they prefer always appears first, delete a link from the search results that seems out of place, and add a URL so their favorite site always shows up for that search. These changes affect only that user's search results pages and do not impact the ranking for other users when they search on Google. (www.google.com) For more information, see ITI's Newsbreak: http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=51802
|
|
NewsGator Releases Related Content Services |
NewsGator has released a set of Related Content services. The New Related Stories and Related Topics widgets post, next to every story, contextually relevant headlines and/or topic key words linking to additional stories. The Related Stories widget offers readers headlines of interest. When they click on a headline, a fly-out window appears with the opening paragraphs of the story. Another click reveals the full article. The Related Topics widget automatically presents contextually related topics to readers as key words. Readers click on the key word of interest and are redirected to a new page of stories relevant to the key word. Publishers who wish to supplement their own content with related stories from third parties can draw on the NewsGator RSS database, a repository of free content. Publishers can control their widgets’ look and feel, search action, and content sources through Editor’s Desk, a NewsGator-hosted platform. (www.newsgator.com) |
|
Morningstar Acquires 10-K Wizard |
Morningstar, Inc., a provider of independent investment research, announced it has acquired 10-K Wizard, a provider of SEC EDGAR (Electronic Data Gathering, Analysis, and Retrieval) filing research and alert services, for $12.5 million. Available via a subscription service or custom data feed, 10-K Wizard provides global company profiles that contain hyperlinks to annual reports and peer companies as well as stock quotes, news, and charts. The full-text search capabilities allow users to research and track corporate information by ticker symbol, company name, industry type, SIC code, financial form type, or specific industry keywords contained within the filings themselves. (www.morningstar.com/homepage/default.aspx)
|
|
LinkedIn Releases New Search Platform |
LinkedIn, an online network for professionals, announced the release of its new search platform. Users can refine search results by entering data in more than a dozen different fields that range from "name" and "company" to "school" and "language". Users do not need to switch tabs if they want to see results with professionals from outside their network. The new search will retrieve the most relevant professionals from the entire LinkedIn community. New features include: "In Common", a new field in search results that shows users which connections and groups they share with the selected user; the ability to save searches and receive reminders by email; two views as part of the search results redesign, basic and expanded; and a type-ahead widget that recommends connections as users type in any people search box. (www.linkedin.com) |
|
Expert System Introduces COGITO |
Expert System, a provider of semantic software that searches, discovers, classifies, and interprets text information, announced the launch of COGITO Answers, a semantic search platform incorporating a Natural Language Interface (NLI) designed for enterprise search and customer self-help. With COGITO Answers, enterprise employees can access unstructured information found in databases, manuals, intranets, and other corporate resources to complete their daily tasks. The solution takes into account a company's unique language (e.g., acronyms and other lingo). COGITO Answers uses its semantic engine to analyze sentences and identify the meaning of the text rather than matching keywords. (www.expertsystem.net) |
|
Zemanta Opens Semantic API and Announces Commercial Package |
Zemanta announced the release of its Semantic API and front-side SDK. A commercial package includes advanced features and an SLA. The full power of Zemanta's tools for assisted content creation, discovery, and sharing can now be leveraged through the API. Publishers, application developers, and owners of content databases can use it to discover more metadata about content they already own, boost existing content with links to relevant online resources, and create smart tools and apps for end users or professionals. (www.zemanta.com) |
|
EBSCO Publishing and Lexi-Comp Form Partnership |
EBSCO Publishing and Lexi-Comp have formed a linking partnership, which allows institutions subscribing to DynaMed, or other EBSCO Publishing (EBSCO) medical point-of-care resources to access real-time information from their Lexi-Comp databases. The arrangement allows hospitals and other medical institutions subscribing to DynaMed and Lexi-Comp to launch a search in DynaMed and link to their Lexi-Comp resources using the contextually relevant search information initiated in DynaMed. Contextual real-time linking from DynaMed to resources, such as Lexi-Comp ONLINE, allows linking to current and clinically-relevant drug information including Lexi-Comp’s drug reference, drug interaction, and hospital formulary drug list tools directly from DynaMed. Users are able to search in DynaMed and continue to search in a Lexi-Comp resource without having to begin a new search. Users may also launch new searches in Lexi-Comp or access the main Lexi-Comp ONLINE page from within DynaMed. (www.lexi.com, www.ebscohost.com) |
|
EMC Opens Data Warehouse and BI Competency Center |
EMC Corporation, a provider of information infrastructure solutions, announced the opening of the EMC Data Warehouse/Business Intelligence/Analytics Competency Center. The new Competency Center is focused on delivering application-specific solutions by bringing together EMC engineering resources focused on data warehouse/business intelligence with those of all data warehouse/business intelligence vendors, including Greenplum, IBM, Microsoft, Netezza, Oracle, ParAccel, Sybase, Teradata, and Vertica. The Competency Center enables EMC and all the application vendors to conduct co-engineering activities to leverage their software with the EMC CLARiiON and EMC Symmetrix networked storage systems. (www.emc.com) |
|
Concept Searching Announces Discovery Software |
Concept Searching, a developer of concept-based search, automatic classification, semantic metadata generation, and taxonomy management software, has announced the availability of PHIdiscovery, a solution that identifies Protected Health Information (PHI) for healthcare entities. PHIdiscovery helps healthcare organizations manage the risk associated with PHI that may be located across repositories. The technology identifies content through advanced meta-tagging and automatic classification features. As content is created or ingested, PHI is automatically identified and classified to a folder for security and review procedures. Concept Searching also announced the availability of FOIAdiscovery, a solution that augments current FOIA processes. FOIAdiscovery automatically generates compound term metadata from repositories and classifies it to the agency’s taxonomy. Once content is classified, the search features can identify concepts as well as keywords within documents when FOIA requests are processed. (www.conceptsearching.com) |
|
IBM Data Governance Council Leads XBRL Initiative |
The IBM Data Governance Council is exploring the application of Extensible Business Reporting Language (XBRL), a software language for describing business terms in financial reports, to risk reporting. The Council is seeking input from banks and financial institutions, corporations, vendors, and regulators to create a standards-based approach to risk reporting.
XBRL could be used to provide a non-proprietary way of reporting risk. According to the Council, an XBRL Taxonomy of Risk could serve as a fundamental building block to enable interoperability and standard practices in measuring risk worldwide. Such standards could potentially enable central banks to manage databases of loss history and trend analyses that could better inform policymakers and member banks, helping to minimize risk and produce better returns. (www.ibm.com/think) |
|