With the development of online shopping services, more and more shopping web sites such as Amazon and Dangdang have appeared. To help consumers compare and choose the same merchandise across different web sites, we present AIE, an extractor that extracts images from the result pages of the deep web. The extractor can also retrieve images from surface web sites that are related to the records on the deep web. Experiments show that the method presented in this paper extracts images from deep web result pages effectively and achieves high accuracy when extracting images from the surface web.
Nowadays, with the rapid development of networks and of people’s needs, more online shopping services have appeared on the Web. Online shopping services offer many kinds of inexpensive merchandise and are not constrained by store floor space or business hours; consumers can buy whatever they want without going out to a store. Online shopping services have therefore attracted attention ever since they first appeared.
If someone wants to buy a product, he or she needs to compare its price and quality across different web stores. In the past, people often had to open three or more web pages and switch between them frequently to find which store offered the product at the lowest price and with the best quality.
This process is usually tedious and frustrating. As a result, extractors for many different deep web data sources have emerged. At present, extracting text from web pages has been studied intensively, but there is little research on extracting images. Sometimes the text describing a product is not clear enough for customers to decide whether to buy it. For example, when customers want to buy a book, the textual description alone cannot satisfy them if they only remember the book’s cover.
Customers will be puzzled if they can learn about the goods only through the pictures and text on the deep web and the descriptive information is unclear or even ambiguous. Moreover, images on the deep web often show only the best aspects of the goods and do not reveal certain attributes of the goods.
The information on the deep web is not enough if customers want to understand the goods clearly and find out other customers’ opinions of them. For this reason, we should also retrieve text and images for customers from the surface web.
The contributions of this paper are as follows. We propose an extractor for: 1) extracting images from deep web result pages; 2) extracting images from deep web sites that have no images on the result pages but have images on the detailed data record pages; 3) extracting images from the surface web that are related to the data records on the deep web; and 4) extracting text from the surface web that is related to the data records on the deep web.
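To make the first task concrete, the following is a minimal sketch of collecting image URLs from the HTML of a result page. It is not the AIE method itself: it simply gathers every `<img>` source and resolves it against the page URL, using only the Python standard library. The names `ImageCollector` and `extract_image_urls`, as well as the example URLs, are hypothetical and introduced only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects absolute image URLs from <img> tags (illustrative only)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the result page (assumed known)
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so "img" matches <IMG> as well.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths such as "/covers/1.jpg" against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the absolute URLs of all images found in the given HTML."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.image_urls
```

A real extractor would additionally have to decide which of these images belong to data records rather than to advertisements or page decoration, and to follow links to the detailed data record pages when the result page carries no images; this sketch does neither.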