Building a search engine
Hi
I have to build a specialised search engine and want to know the best way to process large volumes of random pages from the web. I have no prior experience building search engines and therefore little idea how to proceed. Basically, I need to index pages in a particular way, which seems to be the easy part; I just don't know how to get the pages off the web in the first place. I have looked on the web, and most of the info there refers to intranets or large sites where the location of pages is assumed to be known. A vague and naïve question, but hopefully you get the nub of my gist.