EnterpriseSearchCenter.com
 
RESOURCES FOR EVALUATING ENTERPRISE SEARCH TECHNOLOGIES
December 22, 2010

Table of Contents

One Million and Counting
IBM Completes Netezza Acquisition
MarkLogic Updates Toolkits
Autonomy Launches New Platform
Extending Office
The Web Becomes More Liquid With Hyperwords
EDF chooses cloud-based solution for managing e-mail

One Million and Counting

As you walk up Walton Street in Oxford, England, the road bears slightly to the left, and a large 19th-century building comes into view: the headquarters of the Oxford University Press (OUP). OUP is the largest university press in the world, dating its origins from about 1480. In 1983, I arrived at this building carrying a Texas Instruments Silent 700 terminal. It printed on heat-sensitive thermal paper and had two rubber ears on the top into which a telephone handset could be inserted to link the terminal to the public telephone network. It was a beautiful April morning, and carrying this terminal into a circa 1830 building seemed rather inappropriate.

At that time, I was heading up the initial attempts by Reed Publishing to develop electronic publishing products. Reed owned International Computaprint Corp. (ICC), based in Fort Washington, Pa., which specialized in keyboarding and printing directories. We had been working with IBM and the University of Waterloo, Canada, on the New Oxford English Dictionary (NOED) project, which was to create a digital version of the Oxford English Dictionary (OED). The proof of concept was to digitize one of the supplements to the first edition, starting at the letter S. Once the digitization and indexing had been completed, Hans Nickel, the founder and CEO of ICC, and I were to demonstrate what we had achieved to the NOED project team, led by Tim Benbow and Edmund Weiner. Many members of the team of lexicographers were skeptical of the project’s value, and there was a mixture of expectation and indifference around the table.

The OED seeks not only to provide an authoritative definition of a word but also to record when the word was first used, with examples of subsequent use that may have modified the definition. All these examples were contained on about 4 million slips of paper. We set up a connection with a terminal (at 300 baud) to the computer in Fort Washington. I recall the first question, which came from one of the more skeptical lexicographers, who wanted to know how many words in the OED originated in The Times (London) newspaper. Because all the text had been marked up in Standard Generalized Markup Language (SGML; a forerunner of XML), we could identify the source and provide not only a count but also a printout of the results (albeit a very slow task). There was a short period of silence, and then these distinguished scholars realized the potential of information retrieval. They also recognized that it was not going to put them out of a job but would enable them to improve the value of the product. A host of queries were proposed, and the session only came to an end when we ran out of thermal paper.

The NOED project was an enormous success, not only for the OUP but also for the University of Waterloo, as the project team became the Open Text Corp. IBM used the knowledge gained from the project in the development of its search technology. For me, it was also a day of discovery about the power of search to reveal new relationships between items of information.

But there were some other lessons to be learned. The first of these was the value of metadata structure in searching. Because of the way the individual elements of the entries had been marked up in SGML, searching for words that had first been used by, say, Charles Dickens could be efficiently executed. The second lesson was gained in listening to the members of the project team from IBM and the University of Waterloo as they talked about the importance of computers being able to understand the structure of sentences, work that would lead to the development of semantic search technologies. (If you want to read more about the OUP and the OED, the Wikipedia entries are excellent. Also read James Gleick’s view on the subject at http://around.com/oed.html.)
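
To make the first lesson concrete, here is a minimal sketch in Python of why marked-up entries are easy to query. It uses XML rather than SGML, and the element names, the `first` attribute, and the placeholder headwords are all invented for illustration; they are not the NOED project's actual tag set or data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: tag names, the "first" attribute, and the
# placeholder headwords are assumptions made for this sketch only.
SAMPLE = """
<dictionary>
  <entry>
    <headword>alpha</headword>
    <quotation first="yes"><author>Charles Dickens</author></quotation>
    <quotation first="no"><author>The Times</author></quotation>
  </entry>
  <entry>
    <headword>beta</headword>
    <quotation first="yes"><author>The Times</author></quotation>
  </entry>
  <entry>
    <headword>gamma</headword>
    <quotation first="yes"><author>Charles Dickens</author></quotation>
  </entry>
</dictionary>
"""


def headwords_first_cited_by(root, author):
    """Return headwords whose first-use quotation is attributed to `author`."""
    matches = []
    for entry in root.iter("entry"):
        for quotation in entry.findall("quotation"):
            # Only the quotation flagged as the first recorded use counts.
            if quotation.get("first") == "yes" and quotation.findtext("author") == author:
                matches.append(entry.findtext("headword"))
                break
    return matches


root = ET.fromstring(SAMPLE)
dickens_words = headwords_first_cited_by(root, "Charles Dickens")
print(len(dickens_words), dickens_words)
```

Because the author of each quotation sits in its own element, the query is a simple walk over the structure; without the markup, the same question would mean guessing at attributions inside free text.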

Search is all about the meaning of words, and we need to take this into account in developing search technology. There are now more than a million words in the English language. But when I see people searching, I am often intrigued by their lack of knowledge of synonyms, in particular when trying to get the best out of a search application. We’ve been working on this since the early 1960s, and many of the vendors in the EC100 this year have innovative solutions. There is still much to be done by the search industry to make information retrieval success less dependent on the literacy and subject knowledge of the searcher.
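
One common way to reduce that dependence is to expand a query with synonyms before matching. The following is a rough sketch in Python, not a description of any vendor's product; the synonym table, the sample documents, and the function names are hypothetical.

```python
# Hypothetical synonym table for illustration; a real application would
# draw on a curated thesaurus or taxonomy instead.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "doctor": {"physician"},
}


def expand_query(query, synonyms=SYNONYMS):
    """Return the query terms plus any known synonyms, lowercased."""
    terms = set()
    for word in query.lower().split():
        terms.add(word)
        terms |= synonyms.get(word, set())
    return terms


def search(documents, query):
    """Return documents that share at least one word with the expanded query."""
    terms = expand_query(query)
    return [doc for doc in documents if terms & set(doc.lower().split())]


# Invented sample documents.
DOCS = [
    "Automobile maintenance schedule",
    "Physician staffing report",
    "Quarterly budget",
]

print(search(DOCS, "car"))
```

A searcher who types "car" still finds the document that only mentions "automobile", so the burden of knowing the synonym shifts from the person to the application.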


IBM Completes Netezza Acquisition

IBM completed its acquisition of Massachusetts-based Netezza, a creator of data warehouse appliances. The two companies signed a definitive acquisition agreement in late September. Netezza's appliances are geared towards analytics, and the two companies had been strategic partners for several years.

According to IBM, Netezza will be integrated into its Information Management software portfolio. In a statement, IBM said the acquisition is part of its push to invest heavily in analytics capabilities, evidenced by a reported 14% growth in its analytics business in the third quarter of 2010.

(www.ibm.com)


MarkLogic Updates Toolkits

MarkLogic Corp. released updated toolkits that allow content authors to more easily access, reuse, and repurpose content created using Microsoft Office. The toolkits contain customizable applications for Microsoft Word, Excel, and PowerPoint that enable authors to tag, search, and reuse previously created content. Developers are also able to build their own applications to extend the functions of the Microsoft Office suite. In addition, documents stored in MarkLogic Server can be searched and updated to increase collaboration and content sharing between different teams within an organization.

(www.marklogic.com)


Autonomy Launches New Platform

Autonomy Corp. released the Autonomy Meaning Based Healthcare Platform (MBH), a set of clinical diagnosis and information governance applications designed to help Healthcare Delivery Organizations increase the quality of their care while reducing costs. The primary feature of MBH is Autonomy Auminence, a point-of-care analysis dashboard that is intended to help care providers make higher quality, evidence- and data-based diagnoses.

Autonomy is a creator of infrastructure software for enterprise use. Its products are designed for a wide array of uses, including customer interaction solutions, information governance, and records management. Its customers include AOL, Lloyds Banking Group, and T-Mobile.

(www.autonomy.com)


Extending Office

MarkLogic has released a new set of toolkits and connectors that contain customizable applications for the Microsoft Office suite, enabling authors to tag, search and reuse previously created content.

Developers can then build rich applications for Microsoft Office to extend its functionality. Once a document has been created, it is stored within MarkLogic Server and can be searched and updated to increase collaboration and content sharing between different teams within an organization.

MarkLogic toolkits are available for:

  • Microsoft Word—allows for intelligent information authoring and dynamic assembly for reuse when creating new content;
  • Microsoft Excel—search across spreadsheets and workbooks for text, formulas and metadata to improve information reuse and discovery; and
  • Microsoft PowerPoint—create new custom presentations by searching and retrieving information that already exists in your library of presentations, documents and spreadsheets.


The Web Becomes More Liquid With Hyperwords

The Hyperwords Company announced a major update to the Hyperwords Firefox Add-On Version 7. This update allows users to select any word or words on a web page and choose a command: search Wikipedia or Google (or any other source), find pictures, maps, products and live share prices, share via email, blogs, Twitter or Facebook, translate, or convert currencies and other units. The results of searches are presented in a QuickWeb overlay and results of translations and conversions are presented in the page itself through LiquidText technology.

(www.hyperwords.net)


EDF chooses cloud-based solution for managing e-mail

The Environmental Defense Fund (EDF) wanted to reduce the complexity and recurring hardware and administration costs associated with e-mail management. The non-profit organization, with 400 scientists and attorneys in 10 different offices, had relied on inefficient systems in the past and decided on a new route: a cloud-based e-mail security system from Mimecast.

EDF’s CIO, Brian Attas, says, "The solution greatly simplifies our overall e-mail environment and gives me peace of mind. Any organization that wants to get out of the business of archiving and managing e-mail security should look at Mimecast."

Mimecast reported recently that with its solution, the EDF saw an immediate improvement in overall e-mail performance, anti-virus coverage, policy enforcement and compliance. The latter two are of particular importance to Attas and his team because EDF employees create large volumes of unstructured data quickly.

Attas says, "Mimecast is a tremendous help for record-keeping and e-discovery. If we have to find archived messages, it literally takes less than a minute to set up and execute a search."

EDF users also like the business continuity feature of Mimecast. Attas explains, "In the past, if we had a power or server outage at both of our main hub locations, there would not have been access to e-mail, even from home. Now e-mail fails over to Mimecast in that event, which provides a Web-based interface, which looks just like the user’s inbox, to enable our users to continue working and keep outages invisible to the outside world."

 