
Tuesday, February 21, 2005
Section: Technology / Page: 1
DNA technique for search tools
Julia Zhu
Harvard researchers working on a 'meaningful' approach to Chinese
characters
Two Harvard
University genetics researchers hope to challenge mighty Google
and the mainland's Baidu for China's internet search market
using a software approach originally developed to understand
human genes.
The pair - Gary
Gao and his professor George Church - have launched
Beijing-based start-up YDCTech to develop the software.
The goal is to
replace 'keyword' search tools with more sophisticated
'semantic-based' tools, which YDC claims will yield faster and
more accurate results.
'We hope to
eventually displace Baidu and Google in the mainland market,
perhaps in five years,' said Charles Gao, chief executive at YDC
(and Gary Gao's elder brother).
YDC believes
the problem with keyword searches is that they take words out of
context and cannot deal with other semantic issues, such as
synonyms. This produces long lists that mix relevant and irrelevant
results, which internet users have to sift through to find
what they need.
The YDC
approach is different: it attempts to understand the Chinese
language in a 'bottom-up' manner. The software treats
ideographic characters as the basic units of the Chinese
language. Using statistical or combinatorial analysis - such as
scanning for how frequently certain characters appear in the
Chinese language, among other patterns - it can understand
vocabulary and, from there, syntax and semantics.
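The article does not describe YDC's actual algorithm. As a rough illustration of the general idea only, the sketch below counts how often characters and adjacent character pairs occur in a small corpus, then ranks pairs by pointwise mutual information (PMI), a standard statistic for spotting characters that occur together more often than chance would predict. The function name `candidate_words`, the `min_count` threshold and the toy corpus are assumptions made for this example.

```python
import math
from collections import Counter


def candidate_words(texts, min_count=2):
    """Rank adjacent character pairs by pointwise mutual information.

    Illustrative only: a generic statistical technique, not YDC's method.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for text in texts:
        # Count every single character and every adjacent two-character pair.
        char_counts.update(text)
        pair_counts.update(text[i:i + 2] for i in range(len(text) - 1))

    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for pair, count in pair_counts.items():
        if count < min_count:
            continue  # too rare to judge
        p_pair = count / total_pairs
        p_first = char_counts[pair[0]] / total_chars
        p_second = char_counts[pair[1]] / total_chars
        # High PMI: the pair appears far more often than its parts predict.
        scores[pair] = math.log(p_pair / (p_first * p_second))

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical three-sentence corpus; a real system would need far more text.
corpus = ["我喜欢北京", "北京很大", "我去北京"]
ranked = candidate_words(corpus)
```

On this toy corpus the only pair that recurs is 北京 (Beijing), so it is the only candidate returned. Pairs that score highly on a large corpus are plausible vocabulary items, which is the first step in the bottom-up progression the company describes.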
The approach
was originally used by researchers to understand human genes. The
basic characters of the DNA language are the nucleotide bases
adenine (A), thymine (T), guanine (G) and cytosine (C).
YDC said its
search process was especially powerful for ideographic languages
that lacked a 'word boundary', the 'white spaces' that appear in
English.
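To show why the missing word boundary matters, here is a minimal sketch of forward maximum matching, a common baseline for splitting unspaced Chinese text into words. It is not YDC's technique, which the article does not detail. The lexicon, the `max_len` limit and the function name `segment` are assumptions made for this example.

```python
def segment(text, lexicon, max_len=4):
    """Split unspaced text into words by greedy longest-match lookup.

    Illustrative baseline only; the lexicon is assumed to be given.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical two-word lexicon.
lexicon = {"喜欢", "北京"}
words = segment("我喜欢北京", lexicon)
```

An English search engine can split a query on spaces. For Chinese, a step like this (or a statistical equivalent) has to run before any keyword can be matched, and errors at this stage carry through to the search results.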
All this,
however, is merely theoretical.
Baidu already
has an established business and has received investment from Google.
YDC is at least a year away from developing a working product,
and is still in the fund-raising stage.
The company has
raised just US$70,000 from friends and family.
'A friend who
is a Chinese restaurant owner in Tennessee invested US$50,000.
That helped us a lot,' Gary Gao said.
Charles Gao
said YDC was in final discussions with three venture capital
companies and aimed to raise up to US$3 million over the next
few months.
Adam
Bornstein of Ymer Capital Partners Asia said it was considering
backing YDC because its technology was 'much, much better than
the search engines we use'.
Nevertheless, he estimated it would take a war chest of as much
as US$20 million to compete with Baidu and Google and establish
a brand.
Mr Bornstein
said YDC should develop a business model that would not pit it
directly against Google and Baidu. 'YDCTech will most likely not
go head to head ... but rather focus its attention on becoming
'best of breed' in specific verticals,' he said.
Instead of
competing for advertising dollars, YDC could license its
technology to major portals or multinationals. A vertical
business model would see it focus on specific segments, such as
business and finance-related news, blogs and health-care
databases.
Without a
working product, YDC does not even register on the radars of
rivals. Baidu marketing director Bi Sheng said: 'There are five
to six companies on the mainland that have announced plans to
develop their own search engine technologies, but so far only
Baidu has its own technology.'
He also did
not see a threat from semantic-based search technology. 'Our
users have never asked for semantic search services before. And
it takes several years for a search engine company to grow up,'
he said.
YDC
nevertheless remained optimistic about semantic-based services,
with a beta launch planned within six months. Gary Gao said MIT
and Stanford were working on similar projects, but YDCTech had a
lead of six to nine months.
Copyright © 2005. South China Morning Post Publishers Ltd. All rights
reserved.
Any redistribution of information contained in this archive
without permission is
strictly prohibited.