bornsoft Contributor
Joined: 19 Feb 2009 Posts: 113 Location: Germany
Posted: Wed Oct 13, 2010 3:13 am Post subject: Extract ip-addresses from strings and ip-validity-check
Hi,
here is a function I wrote to parse IP addresses out of mail headers for spam checking.
I think it is a good parser example that you can easily adapt to extract other information from strings.
Feel free to modify it.
Code:
#
# Function bsIP - by bornSoft ( VDS6 required )
# --------------------------- -----------------
#
# Usage: %x = @bsIP( <String> )
#
# %x = @bsIP( Extract , <String> [ , <Number>/ALL/UNIQUE ] )
#
# This function returns "1" if <String> is an IP-formatted number or NULL if not.
#
# If argument EXTRACT is given, the function returns the first ip-formatted number
# found in <String> or NULL.
#
# The third argument is optional.
#
# If it is an integer number, for example 2, the function returns the second found
# ip-formatted number. If <Number> is greater than the count, NULL is returned.
#
# If the third argument is set to ALL, the function returns all found ip-formatted
# numbers separated by the current field separator. Duplicates are also returned.
#
# If the third argument is set to UNIQUE, the function returns all unique found
# ip-formatted numbers in ascending order separated by the current field separator.
#
#
#DEFINE Function,bsIP
# Subfunction
#DEFINE Function,bsIPCheck
# A string that contains some ip-addresses
%%String = "Received: from 213.198.55.118 (HELO mail08c.verio.de) (213.198.55.118) and another IP is 127.0.0.1"
# Extract all ip-addresses from the string
info Now we are going to extract all ip-addresses from this string: @tab()@cr()@cr()@tab()@chr(34)%%String@chr(34) @tab()
info Extracted: @cr()@cr()@tab()@bsIP(Extract,%%String,All) @tab()
# Extract the first found ip-address from the string
info And now the first found ... @tab()
info Extracted: @cr()@cr()@tab()@bsIP(Extract,%%String) @tab()
# Extract the third ip-address from the string
info The third one ... @tab()
info Extracted: @cr()@cr()@tab()@bsIP(Extract,%%String,3) @tab()
# Extract only unique ip-addresses from the string
info And now all unique ip-addresses in ascending order ... @tab()
info Extracted: @cr()@cr()@tab()@bsIP(Extract,%%String,UNIQUE) @tab()
# Check ip-address for validity
info "And finally we check whether [ 192.168.0,100 ] is a valid ip-address ( Note the comma! )"@tab()
%%IP = "192.168.0,100"
if @bsIP(%%IP)
info %%IP is a valid ip-address. @tab()
else
info %%IP is not a valid ip-address. @tab()
end
exit
# Begin Function bsIP
:bsIP
%1 = @trim(%1)
if %1
if @equal(%1,Extract)
goto bsIPExtract
else
goto bsIPCheck
end
end
exit
:bsIPExtract
%P = @pos(".",%2)
if @greater(%P,0)
%L = @new(list)
# First break the string on spaces
list assign,%L,@strrep(%2," ",@cr(),all)
# Mark all unwanted lines
%C = @count(%L)
%x = 0
repeat
%i = @item(%L,%x)
%P = @pos(".",%i)
if @zero(%P)
list put,%L,#
end
%x = @succ(%x)
until @equal(%x,%c)
# Delete all unwanted lines
list assign,%L,@strrep(@text(%L),#@cr()@lf(),,all)
# Remove unwanted characters
%N = "0123456789."
%c = @count(%L)
repeat
%x =
%i = @item(%L)
while @not(@zero(@len(%i)))
%S = @substr(%i,1)
if @greater(@pos(%S,%N),0)
%x = %x%S
end
%i = @strdel(%i,1)
wend
if @bsIPCheck(%x)
list put,%L,%x
else
list put,%L,#
end
%i = @next(%L)
until @equal(@index(%L),%c)
# Delete all unwanted lines
list assign,%L,@strrep(@text(%L),#@cr()@lf(),,all)
if @text(%L)
if @not(%3)
%R = @item(%L,0)
elsif @numeric(%3)
if @greater(%3,@count(%L))
%R =
else
%R = @item(%L,@pred(%3))
end
elsif @equal(%3,ALL)
%R = @strrep(@trim(@text(%L)),@cr()@lf(),@fsep(),all)
elsif @equal(%3,UNIQUE)
%N = @new(list,sorted)
list assign,%N,%L
%R = @strrep(@trim(@text(%N)),@cr()@lf(),@fsep(),all)
list close,%N
end
end
# Close the working list on every path, not only for UNIQUE
list close,%L
end
exit %R
:bsIPCheck
if @numeric(@strrep(%1,".","",all))
%F = @fsep()
option Fieldsep,.
parse "%A;%B;%C;%D",%1
option Fieldsep,%F
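# Reject addresses with a missing part, e.g. "192.168.1"
if @zero(@len(%A)) @zero(@len(%B)) @zero(@len(%C)) @zero(@len(%D))
%R =
elsif @greater(%A,255) @greater(%B,255) @greater(%C,255) @greater(%D,255)
%R =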
elsif @greater(0,%A) @greater(0,%B) @greater(0,%C) @greater(0,%D)
%R =
elsif @greater(@len(%A),3) @greater(@len(%B),3) @greater(@len(%C),3) @greater(@len(%D),3)
%R =
else
%R = 1
end
end
exit %R
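For readers who do not have VDS at hand, the behaviour described in the header comment can be sketched in Python. This is only an illustration of the documented contract, not a port of the script above: the names `is_ip` and `extract_ips` are made up for the example, a regular expression stands in for the list-based parsing, the field separator is assumed to be `|`, and UNIQUE is assumed to sort the addresses as plain text. The sketch also insists on exactly four parts.

```python
import re

def is_ip(text):
    """Return True if text is a dotted quad with every part in 0..255."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    # Each part must be 1-3 ASCII digits and no larger than 255.
    return all(p.isascii() and p.isdigit() and len(p) <= 3 and int(p) <= 255
               for p in parts)

def extract_ips(text, which=None, sep="|"):
    """Mimic @bsIP(Extract, <String> [, <Number>/ALL/UNIQUE]).

    Returns "" where the VDS function would return NULL.
    """
    # Collect runs of digits joined by dots, then keep the valid addresses.
    candidates = re.findall(r"\d+(?:\.\d+)+", text)
    found = [c for c in candidates if is_ip(c)]
    if not found:
        return ""
    if which is None:
        return found[0]
    mode = str(which).upper()
    if mode == "ALL":
        return sep.join(found)            # duplicates are kept
    if mode == "UNIQUE":
        return sep.join(sorted(set(found)))  # text order, an assumption
    if mode.isdigit():
        n = int(mode)
        return found[n - 1] if 1 <= n <= len(found) else ""
    return ""

header = ("Received: from 213.198.55.118 (HELO mail08c.verio.de) "
          "(213.198.55.118) and another IP is 127.0.0.1")
print(extract_ips(header, "ALL"))
print(extract_ips(header, 3))
print(extract_ips(header, "UNIQUE"))
print(is_ip("192.168.0,100"))
```

With the sample header from the script, the third address is `127.0.0.1` and the comma-containing string is rejected, which matches what the VDS demo is meant to show.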
bornsoft Contributor
Joined: 19 Feb 2009 Posts: 113 Location: Germany
Posted: Thu Oct 14, 2010 8:11 pm
Here is another version for anyone who is interested.
It is completely rewritten and now works much faster and more reliably.
An 80 kB HTML file from Wikipedia containing IP numbers is completely parsed in 500 milliseconds on my WinXP/SP3 machine.
The code looks a bit more complicated; that is mainly for the sake of speed.
In my tests I noticed that the order in which things are done matters a great deal for speed.
My first attempt took 6.5 seconds for the same file, compared to half a second now.
I also noticed that the built-in @STRREP( ,ALL) function in VDS6 is significantly slower than doing the replacement with REPEAT/UNTIL and @pos(), which in turn is faster than WHILE/WEND.
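The point about ordering can be illustrated outside VDS. The Python sketch below is an illustration under stated assumptions, not the script itself: it first throws away every line that cannot contain an address (no dot or no digit), which is a cheap test, and only then does the more expensive splitting and checking on what is left. The function names and the exact delimiter set are invented for the example.

```python
import re

def looks_promising(line):
    """Cheap pre-filter: an address needs at least one dot and one digit."""
    return "." in line and any(ch.isdigit() for ch in line)

def is_dotted_quad(token):
    """Four numeric parts, each 1-3 digits and at most 255."""
    parts = token.split(".")
    return len(parts) == 4 and all(
        p.isascii() and p.isdigit() and len(p) <= 3 and int(p) <= 255
        for p in parts
    )

def extract_filtered(text):
    """Filter lines first, then split and validate only the survivors."""
    found = []
    for line in text.splitlines():
        if not looks_promising(line):
            continue  # skipped before any expensive work is done
        # Break the line on whitespace and common delimiters.
        for token in re.split(r"[\s,=<>()\[\]/]+", line):
            token = token.strip(".")  # drop sentence-ending dots
            if is_dotted_quad(token):
                found.append(token)
    return found

sample = "<p>No address here</p>\nHost 10.0.0.1, gateway=10.0.0.254.\nplain text."
print(extract_filtered(sample))
```

Only the middle line of the sample survives the pre-filter, so the splitting and validation run on one line instead of three.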
Code:
#
# Function bsIP - by bornSoft - Version 2 ( VDS6 required )
# --------------------------------------- -----------------
#
#
# Usage: %x = @bsIP( <String> )
#
# %x = @bsIP( Extract , <String> [ , <Number>/ALL/UNIQUE ] )
#
# This function returns "1" if <String> is an IP-formatted number or NULL if not.
#
# If argument EXTRACT is given, the function returns the first ip-formatted number
# found in <String> or NULL.
#
# The third argument is optional:
#
# If it is an integer number, for example 2, the function returns the second found
# ip-formatted number. If <Number> is greater than the count, NULL is returned.
#
# If the third argument is set to ALL, the function returns all found ip-formatted
# numbers separated by the current field separator. Duplicates are also returned.
#
# If the third argument is set to UNIQUE, the function returns all unique found
# ip-formatted numbers in ascending order separated by the current field separator.
#
#
Title Function bsIP Example
#DEFINE Function,bsIP
# Subfunctions
#DEFINE Function,bsIP_Check
#DEFINE Function,bsIP_NumTrim
# SubCommand
#DEFINE Command,bsIP_Replace
# Get a file with some ip-addresses in it
%%Text = First we need a text with a couple of ip-addresses in it. @tab()@cr()For this we download the Wikipedia article about ip addresses. @tab()@cr()@cr()Press OK to start or CANCEL to exit.@cr()
%%URL = "http://en.wikipedia.org/wiki/IP_address"
%%File = @path(%0)wikipedia.txt
if @not(@file(@path(%0)wikipedia.txt))
if @not(@query(%%Text))
exit
end
DIALOG CREATE,bsIP Example by bornSoft,-1,0,300,133,COLOR FFFFBB,NOTITLE,ONTOP
DIALOG ADD,STYLE,STYLE1,Arial,24,B,,GRAY
DIALOG ADD,TEXT,TEXT1,47,36,,,Downloading ...,,STYLE1
DIALOG SHOW
file copy,%%URL,%%File
if @not(@file(%%File))
Dialog hide
info ERROR: File could not be downloaded. Check internet connection. @tab()
exit
else
Dialog hide
info File successfully downloaded. @tab()
end
end
%L = @new(list)
list loadfile,%L,%%File
%%Text = @text(%L)
list close,%L
# Extract all ip-addresses from the string
info Now we extract all ip-addresses and measure processing time. @tab()
%%StartTime = @datetime()
%%IP = @bsIP(Extract,%%Text,ALL)
%%Duration = @datetime("nn:ss:z",@fsub(@datetime(),%%StartTime))
%%IP = @strrep(%%IP,@fsep(),@cr(),all)
info All IP addresses from %%Url @tab()@cr()@cr()@tab()Filesize:@tab()@tab()@file(%%File,Z) bytes@cr()@tab()Duration:@tab()@tab()%%Duration @tab()@cr()@cr()%%IP
# Extract the first found ip-address
info And now the first found ... @tab()
info The first found is:@tab()@bsIP(Extract,%%Text) @tab()
# Extract the fourth ip-address from the string
info The fourth one ... @tab()
info The fourth one is:@tab()@bsIP(Extract,%%Text,4) @tab()
# Extract only unique ip-addresses from the string
info And now all unique ip-addresses in ascending order ... @tab()
%%IP = @bsIP(Extract,%%Text,UNIQUE)
info All unique IP addresses from %%Url @tab()@cr()@cr()@strrep(%%IP,@fsep(),@cr(),all)
# Check ip-address for validity
info "And finally we check whether [ 192.168.0,100 ] is a valid ip-address ( Note the comma! )"@tab()
%%IP = "192.168.0,100"
if @bsIP(%%IP)
info %%IP is a valid ip-address. @tab()
else
info %%IP is not a valid ip-address. @tab()
end
exit
# Begin Function bsIP
:bsIP
%1 = @trim(%1)
if %1
if @equal(%1,Extract)
goto bsIP_Extract
else
goto bsIP_Check
end
end
exit
:bsIP_Extract
if @greater(@pos(".",%2),0)
%L = @new(list)
%2 = @strrep(%2,@chr(160),@chr(32),ALL)
list assign,%L,%2
# Delete all lines with no dot
gosub bsIP_NoDot
# Delete all lines with no numerics
gosub bsIP_NoNum
# Break on commas
bsIP_Replace %L,",",@cr()@lf()
gosub bsIP_NoDot
gosub bsIP_NoNum
# Break on equals
bsIP_Replace %L,"=",@cr()@lf()
gosub bsIP_NoDot
# Break on <
bsIP_Replace %L,"<",@cr()@lf()
gosub bsIP_NoDot
# Break on >
bsIP_Replace %L,">",@cr()@lf()
gosub bsIP_NoDot
gosub bsIP_NoNum
# Trim items to numbers in the beginning and the end
%c = @count(%L)
%x = 0
%N = "0123456789"
repeat
%i = @item(%L,%x)
%S = @substr(%i,1)
%W = @len(%i)
repeat
%P = @pos(%S,%N)
if @zero(%P)
%i = @strdel(%i,1)
%S = @substr(%i,1)
end
until @greater(%P,0) @equal(%P,%W)
# From the back
if @greater(@len(%i),0)
%S = @strdel(%i,1,-1)
while @zero(@pos(%S,%N))
%i = @substr(%i,1,-1)
%S = @strdel(%i,1,-1)
wend
end
list put,%L,%i
%x = @succ(%x)
until @equal(%x,%c)
# Delete all lines with no dot
gosub bsIP_NoDot
# Delete all lines shorter than 7 chars
%x = 0
%c = @count(%L)
while @less(%x,%c)
%i = @item(%L,%x)
if @less(@len(%i),7)
list put,%L,"#"
end
%x = @succ(%x)
wend
gosub bsIP_Delete
# Break on (
bsIP_Replace %L,"(",@cr()@lf()
gosub bsIP_NoDot
# Break on )
bsIP_Replace %L,")",@cr()@lf()
gosub bsIP_NoDot
gosub bsIP_NoNum
# Break on [
bsIP_Replace %L,"[",@cr()@lf()
gosub bsIP_NoDot
# Break on ]
bsIP_Replace %L,"]",@cr()@lf()
gosub bsIP_NoDot
gosub bsIP_NoNum
# Break on slashes
bsIP_Replace %L,"/",@cr()@lf()
gosub bsIP_NoDot
# Break the string on all spaces
bsIP_Replace %L," ",@cr()@lf()
gosub bsIP_NoDot
# Delete all lines with no numeric chars
gosub bsIP_NoNum
# Process line by line
%%L2 = @new(list)
list assign,%%L2,%L
%x = 0
repeat
%i = @item(%%L2,%x)
list assign,%L,@strrep(%i,".",@cr()@lf(),ALL)
# Trim items to only numeric Characters
# at the beginning and at the end
list assign,%L,@bsIP_NumTrim(@text(%L))
# Restore all Dots
%T = @trim(@text(%L))
%T = @strrep(%T,@cr()@lf(),".",ALL)
list assign,%L,%T@cr()@lf()
# Put this item back to %%L2
list put,%%L2,@item(%L)
%x = @succ(%x)
until @equal(%x,@count(%%L2))
list assign,%L,%%L2
list close,%%L2
# Check validity and mark invalids
%x = 0
repeat
%i = @item(%L,%x)
if @not(@bsIP_Check(%i))
list put,%L,"#"
end
%x = @succ(%x)
until @equal(%x,@count(%L))
# Delete invalid lines
gosub bsIP_Delete
# Check which output was wanted
if @text(%L)
if @not(%3)
%R = @item(%L,0)
elsif @numeric(%3)
if @greater(%3,@count(%L))
%R =
else
%R = @item(%L,@pred(%3))
end
elsif @equal(%3,ALL)
%R = @strrep(@trim(@text(%L)),@cr()@lf(),@fsep(),all)
elsif @equal(%3,UNIQUE)
%N = @new(list,sorted)
list assign,%N,%L
%R = @strrep(@trim(@text(%N)),@cr()@lf(),@fsep(),all)
list close,%N
end
end
# Close the working list on every path, not only for UNIQUE
list close,%L
end
exit %R
# SubCommand bsIP_Replace ( Faster than @STRREP() )
# %1 = List / %2 = SearchCharacter / %3 = ReplaceCharacter
:bsIP_Replace
%x = 0
%c = @count(%1)
while @less(%x,%c)
%i = @item(%1,%x)
%P = @pos(%2,%i)
while @greater(%P,0)
%i = @strdel(%i,%P)
%i = @strins(%i,%P,%3)
%P = @pos(%2,%i)
wend
list put,%1,%i
%x = @succ(%x)
wend
list assign,%1,@text(%1)
exit
# Gosub bsIP_Delete
:bsIP_Delete
%x = 0
repeat
%i = @item(%L,%x)
if @equal(%i,"#")
list delete,%L
else
%x = @succ(%x)
end
until @equal(%x,@count(%L))
exit
# Gosub bsIP_NoDot
:bsIP_NoDot
%C = @count(%L)
%x = 0
repeat
%i = @item(%L,%x)
if @zero(@pos(".",%i))
list put,%L,#
end
%x = @succ(%x)
until @equal(%x,%C)
list seek,%L,0
gosub bsIP_Delete
exit
# Gosub bsIP_NoNum
:bsIP_NoNum
%C = @count(%L)
%x = 0
repeat
%i = @item(%L,%x)
%N = 0
%O =
repeat
if @greater(@pos(%N,%i),0)
%O = 1
end
%N = @succ(%N)
until %O @greater(%N,9)
if @not(%O)
list put,%L,#
end
%x = @succ(%x)
until @equal(%x,%C)
list seek,%L,0
gosub bsIP_Delete
exit
# Subfunction bsIP_NumTrim ( %1 = String )
:bsIP_NumTrim
%L = @new(list)
list assign,%L,%1
%C = @count(%L)
%x = 0
%N = "0123456789"
repeat
%i = @item(%L,%x)
%S = @substr(%i,1)
%W = @len(%i)
%P = 1
while @both(@zero(@pos(%S,%N)),@not(@greater(%P,%W)))
%i = @strdel(%i,1)
%S = @substr(%i,1)
%P = @succ(%P)
wend
%P = 1
%S = @substr(%i,%P)
%R =
while @not(@zero(@pos(%S,%N)))
%R = @substr(%i,1,%P)
%P = @succ(%P)
%S = @substr(%i,%P)
wend
if %R
%E = %E%R#
end
%x = @succ(%x)
until @equal(%x,@count(%L))
list close,%L
%R = @strrep(%E,"#",@cr()@lf(),ALL)
exit %R
# Subfunction bsIP_Check
:bsIP_Check
if @numeric(@strrep(%1,".","",all))
%F = @fsep()
option Fieldsep,.
parse "%A;%B;%C;%D;%E",%1
option Fieldsep,%F
if @not(%E)
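# Reject addresses with a missing part, e.g. "192.168.1"
if @zero(@len(%A)) @zero(@len(%B)) @zero(@len(%C)) @zero(@len(%D))
%R =
elsif @greater(%A,255) @greater(%B,255) @greater(%C,255) @greater(%D,255)
%R =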
elsif @greater(0,%A) @greater(0,%B) @greater(0,%C) @greater(0,%D)
%R =
elsif @greater(@len(%A),3) @greater(@len(%B),3) @greater(@len(%C),3) @greater(@len(%D),3)
%R =
else
%R = 1
end
end
end
exit %R
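The replace loop used by bsIP_Replace (find the search string with @pos(), delete it, insert the replacement, search again) is easy to try out in another language. Below is a minimal Python sketch of that loop. `replace_all` is a hypothetical name, and the `start` offset is an addition of the sketch that the VDS command does not have: resuming the search after the inserted text keeps the loop from running forever when the replacement itself contains the search string.

```python
def replace_all(text, search, replacement):
    """Replace every occurrence of search using a find/delete/insert loop."""
    if not search:
        return text  # nothing to look for
    start = 0
    pos = text.find(search, start)
    while pos != -1:
        # Delete the match and insert the replacement in its place.
        text = text[:pos] + replacement + text[pos + len(search):]
        # Continue after the inserted text so the loop always terminates.
        start = pos + len(replacement)
        pos = text.find(search, start)
    return text

line = "10.0.0.1,10.0.0.2,<b>10.0.0.3</b>"
# Break the line on commas, as the script does before filtering.
print(replace_all(line, ",", "\r\n").splitlines())
```

In Python itself `str.replace` would be the natural choice; the explicit loop is shown only because it mirrors the structure of the VDS command.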
DaveR Valued Contributor
Joined: 03 Sep 2005 Posts: 413 Location: Australia
Posted: Sun Nov 21, 2010 6:52 am
bornsoft wrote:
Here is another version for anyone who is interested.
It is completely rewritten and now works much faster and more reliably.
An 80 kB HTML file from Wikipedia containing IP numbers is completely parsed in 500 milliseconds on my WinXP/SP3 machine.
The code looks a bit more complicated; that is mainly for the sake of speed.
In my tests I noticed that the order in which things are done matters a great deal for speed.
My first attempt took 6.5 seconds for the same file, compared to half a second now.
I also noticed that the built-in @STRREP( ,ALL) function in VDS6 is significantly slower than doing the replacement with REPEAT/UNTIL and @pos(), which in turn is faster than WHILE/WEND.
Hi bornsoft,
Thanks for the speed tips.
cheers
Dave