Domain-centric intelligent automated data extraction methodology
How do you go about finding a flat in a foreign city? Use a real-estate aggregator; ask around about real-estate agencies; rummage through the local classifieds … Why isn’t this as easy as searching on Google or shopping at Amazon?
Where the object of publishing is text, the web has become the “great equalizer” and anyone can become a publisher; where the objects are real-world or digital goods, however, this has not happened. Search engines have not been able to provide comprehensive, automated object search. Although anyone can publish a Web site about a flat for rent, finding that offer in a search engine is nearly impossible.
Why is that? Searching for objects is harder than searching for text: we search for objects by their attributes, such as the size of a flat, the number of bathrooms, or the distance to the nearest opera house. To find an object offered on some Web page, we have to recognize that an object occurs on that page, and then identify that object’s type and the values of its attributes.
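To make the contrast with text search concrete, the following is a minimal sketch of attribute-based search over flats, assuming the attribute values have already been extracted from the Web pages. The `Flat` type, its field names, the `search_flats` function, and the sample offers are all hypothetical and chosen only to mirror the attributes mentioned above; they do not describe any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flat:
    # Hypothetical attributes, mirroring the examples in the text.
    size_sqm: float
    bathrooms: int
    km_to_opera: float


# Invented sample offers, for illustration only.
OFFERS = [
    Flat(size_sqm=45.0, bathrooms=1, km_to_opera=0.8),
    Flat(size_sqm=80.0, bathrooms=2, km_to_opera=3.5),
    Flat(size_sqm=95.0, bathrooms=2, km_to_opera=1.2),
]


def search_flats(offers, min_size_sqm=0.0, min_bathrooms=0,
                 max_km_to_opera=float("inf")):
    """Return the offers whose attribute values satisfy every constraint."""
    return [
        flat for flat in offers
        if flat.size_sqm >= min_size_sqm
        and flat.bathrooms >= min_bathrooms
        and flat.km_to_opera <= max_km_to_opera
    ]


# A query phrased over attributes rather than keywords.
matches = search_flats(OFFERS, min_size_sqm=70, min_bathrooms=2,
                       max_km_to_opera=2.0)
```

Once objects are available in this structured form, the query itself is trivial. The hard part, as the paragraph above argues, is getting from an arbitrary Web page to such typed records in the first place.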
Today’s technology requires those tasks to be performed by the publisher. Yet it puts all the power in the hands of large aggregators such as Google: they decide who gets aggregated and how each publisher has to describe its objects. Publishers have to follow the object types defined by the aggregators and provide the attribute values of the objects they offer; worse, they have to do this for many aggregators.