FreezingFire Admin Team

Joined: 23 Jun 2002 Posts: 3508
Posted: Sun Mar 02, 2003 4:05 pm Post subject: [Open Source]: Internet Spider
Here's a project that I think would be cool to develop: a link crawler for
indexing pages on the internet. It's not something I will have a lot of use
for, but I think it would be interesting to see where we end up.
Here's a bit of code I was working on but didn't know how to finish. It
fetches a page, but I don't know how to extract the URLs from it.
Code:
option scale,96
option fieldsep,"|"
option decimalsep,"."
external vdsipp.dll
%%URL = http://www.vdsworld.com/
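REM -- Set the script to run for five minutes --
REM -- Caution: @datetime(n) looks like it returns only the minute of the hour,
REM    so the sum below can pass 59 and then never match in :evloop --
%%StopTime = @fadd(@datetime(n),5)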
INTERNET HTTP,CREATE,1
INTERNET HTTP,THREADS,1,ON
INTERNET HTTP,PROTOCOL,1,1
INTERNET HTTP,USERAGENT,1,VDSWORLD Internet Spider
list create,1
list create,2
:evloop
if @equal(@datetime(n),%%StopTime)
goto close
end
wait event
goto @event()
:BUTTON1BUTTON
dialog disable,button1
dialog disable,edit1
INTERNET HTTP,GETHEADER,1,%%URL
goto evloop
REM -- Just in case we want future header processing in this script --
:HTTP1ONGETHEADERDONE
REM -- Get the page --
INTERNET HTTP,GET,1,%%URL
GOTO EVLOOP
:HTTP1ONGETDONE
list assign,2,@internet(http,content,1)
gosub ProcessPage
goto evloop
:ProcessPage
REM -- I don't know how to extract the URLs from the page --
REM if @match(1,http://)
REM list add,1,@item(1, @match(1,http://))
REM end
EXIT
:CLOSE
rem ** Always destroy the client protocols before exiting
rem your script to prevent errors and crashes. Also use
rem a STOP in case a download is occurring **
INTERNET HTTP,STOP,1
INTERNET HTTP,DESTROY,1
exit
_________________ FreezingFire
VDSWORLD.com
Site Admin Team
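For anyone following along, the overall shape of the script above (a list of URLs to visit, a page fetch, a processing step, and a time limit) can be sketched outside VDS. This is only a rough illustration in Python, not a translation of the script: `crawl`, `fetch`, and `extract_links` are hypothetical names, and the two callables are supplied by the caller so the loop itself makes no assumptions about how pages are downloaded or parsed.

```python
import time
from collections import deque


def crawl(start_url, fetch, extract_links, time_limit_s=300.0, clock=time.monotonic):
    """Breadth-first crawl that stops once the time limit has elapsed.

    fetch(url) -> str and extract_links(html) -> iterable of URLs are
    hypothetical callables supplied by the caller.
    """
    deadline = clock() + time_limit_s
    queue = deque([start_url])  # URLs still to visit
    seen = {start_url}          # every URL ever queued, to avoid revisiting
    visited = []                # URLs actually fetched, in order

    # Compare against the deadline with "<" rather than testing for equality,
    # so a missed tick cannot keep the loop running forever.
    while queue and clock() < deadline:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        visited.append(url)
        for link in extract_links(html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The `seen` set is what keeps the spider from looping when two pages link to each other. The default of 300 seconds mirrors the five-minute limit mentioned in the script.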
Skit3000 Admin Team

Joined: 11 May 2002 Posts: 2166 Location: The Netherlands
Posted: Sun Mar 02, 2003 4:46 pm
Try something like this to extract the URL...
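Code:
REM -- The page content is in list 2 in the script above; list 1 collects the links --
if @match(2,http://)
%%link = @item(2)
REM -- Add 3 so the ".htm" extension itself is kept in the result --
%%link = @substr(%%link,@pos(http://,%%link),@sum(@pos(.htm,%%link),3))
list add,1,%%link
end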
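The snippet above picks up one link per matching line and only works for addresses ending in .htm. As a rough illustration of the same idea taken a step further, here is a sketch in Python (not VDS) that collects every `http://` address on every line, whatever the extension. The pattern and the `find_urls` helper are assumptions made for this example. In particular, it assumes a URL ends at whitespace, a quote, or an angle bracket, which is good enough for typical `href="..."` attributes but would also swallow trailing punctuation in plain text.

```python
import re

# Assumption: a URL runs until whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"http://[^\s\"'<>]+", re.IGNORECASE)


def find_urls(lines):
    """Return every http:// URL found in the given lines, without duplicates."""
    found = []
    for line in lines:
        for url in URL_PATTERN.findall(line):
            if url not in found:
                found.append(url)
    return found
```

Called on the lines of a downloaded page, this returns the links in the order they first appear. Relative links such as `href="page.htm"` are not caught, because they do not contain `http://`.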
ShinobiSoft Professional Member


Joined: 06 Nov 2002 Posts: 790 Location: Knoxville, Tn
Posted: Sun Mar 02, 2003 6:35 pm
I would recommend finding the starting and ending <a></a> anchor tags
first and then extracting the URL from the anchor string.
_________________ Bill Weckel
ShinobiSoft Software
"The way is known to all, but not all know it."
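The anchor-tag approach suggested above can be illustrated with a short sketch. It is written in Python rather than VDS, using the standard library's `html.parser`, and is only meant to show the logic: find each `<a>` tag, read its `href`, and resolve it against the address of the page it came from. `AnchorCollector` and `anchor_links` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def anchor_links(html, base_url):
    """Return the absolute URLs of all links found in an HTML string."""
    collector = AnchorCollector(base_url)
    collector.feed(html)
    return collector.links
```

Compared with searching for `http://` directly, working from the anchor tags also catches relative links such as `href="news.htm"`, and it ignores addresses that appear outside links, such as image sources.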