In recent years, e-commerce has grown massively and become a main driver of technological innovation on the Web. The Semantic Web is a vision to advance the technological foundation of the Web so that computers can better extract and process information from Web content. A core principle of the Semantic Web is to augment Web markup with structured data suited for machine processing, rather than markup suitable only for rendering information for human consumption. Applying the Semantic Web to e-commerce shows significant potential, in particular for the efficiency and precision of search, improving data quality, or raising market efficiency. Despite a significant increase in adoption, the percentage of Web sites that provide data markup for e-commerce information is still limited and will likely remain limited for many years to come. This markup is predominantly generated by shop software extension modules, which cover only a small fraction of the Web. At the same time, automatic methods for Web Information Extraction are still unable to reconstruct the full amount of structured data behind Web content. To address this issue, we propose a novel method for Web Information Extraction targeted at the e-commerce domain. The approach exploits (1) the market dominance of a small number of e-commerce systems, (2) the patterns those systems expose in Web page generation, and (3) the existing structured data in e-commerce. We evaluate the approach by splitting our dataset into a learning set and an evaluation set. Our results show that the approach is feasible for extracting structured data from e-commerce sites that do not include data markup, solely on the basis of template similarity and existing markup as training data.
The fundamental idea is to combine similarities in Web page templates, caused by the popularity of off-the-shelf shop software, with the use of data markup found in a subset of Web pages as training data for machine learning.
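To make the idea concrete, the following is a minimal sketch and not the method evaluated here: it replaces the machine learning step with a simple frequency count. It assumes well-nested HTML, Microdata-style `itemprop` attributes as the existing markup, and pages produced by the same shop template. All names (`PathParser`, `learn_paths`, `extract`) and the sample pages are hypothetical and chosen for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements without an end tag; they never contain text, so they are skipped.
VOID_TAGS = {"br", "img", "meta", "link", "input", "hr"}


class PathParser(HTMLParser):
    """Record (path, itemprop, text) for each element, where path is the
    chain of "tag.class" steps from the root down to that element."""

    def __init__(self):
        super().__init__()
        self.stack = []    # open elements: [step, itemprop, text parts]
        self.records = []  # (path tuple, itemprop or None, direct text)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        step = tag + "." + (attrs.get("class") or "")
        self.stack.append([step, attrs.get("itemprop"), []])

    def handle_data(self, data):
        if self.stack:
            self.stack[-1][2].append(data)

    def handle_endtag(self, tag):
        # Assumption: the input is well nested, so the end tag closes the
        # element on top of the stack.
        if tag in VOID_TAGS or not self.stack:
            return
        path = tuple(entry[0] for entry in self.stack)
        _, prop, parts = self.stack.pop()
        self.records.append((path, prop, "".join(parts).strip()))


def parse(html):
    parser = PathParser()
    parser.feed(html)
    parser.close()
    return parser.records


def learn_paths(marked_up_pages):
    """Map each itemprop name to the template path seen most often for it."""
    counts = {}
    for html in marked_up_pages:
        for path, prop, _ in parse(html):
            if prop:
                counts.setdefault(prop, Counter())[path] += 1
    return {prop: c.most_common(1)[0][0] for prop, c in counts.items()}


def extract(html, learned):
    """Apply the learned paths to a page that carries no data markup."""
    result = {}
    for path, _, text in parse(html):
        for prop, learned_path in learned.items():
            if path == learned_path and prop not in result:
                result[prop] = text
    return result


# Hypothetical pages from the same shop template: one with markup, one without.
TRAIN = [
    '<html><body><div class="product">'
    '<h1 class="title" itemprop="name">Red Mug</h1>'
    '<span class="price" itemprop="price">4.99</span>'
    '</div></body></html>',
]
TARGET = (
    '<html><body><div class="product">'
    '<h1 class="title">Blue Mug</h1>'
    '<span class="price">5.49</span>'
    '</div></body></html>'
)

learned = learn_paths(TRAIN)
print(extract(TARGET, learned))  # {'name': 'Blue Mug', 'price': '5.49'}
```

Exact path matching only works when two pages share an identical template. A learned model over template features would be needed to tolerate the variation found across real shop installations.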