Henrik (Valued Newbie)
Joined: 09 Jul 2000 | Posts: 35 | Location: Copenhagen, Denmark
Posted: Sun Mar 10, 2002 4:43 am
Post subject: Converting Very Large HTML files to text equivalents
Hi guys!
I want to convert very large HTML files (1 - 10 MB) to their text-only equivalents.
Because of the LIST LOADFILE memory bug in VDS, I cannot simply load the whole file into a list, so I tried reading and converting it in 30000-byte blocks instead.
The script uses blowfish.dll, the VDSINET DLL and the VDSBIN DLL. Here is the code:
Code: |
%%KEY = MYKEY
title WebAnalyzer: Export Report as Text
external @path(%0)export.dll
external @path(%0)html.dll
external @path(%0)loader.dll
option errortrap,error
DIALOG CREATE,WebAnalyzer: Export As Text,100,100,503,47,NOSYS,ONTOP
DIALOG ADD,TEXT,text1,6,4,100,16,Choose files ...
DIALOG ADD,PROGRESS,PROGRESS1,12,134,358,24
DIALOG ADD,TEXT,TEXT2,22,5,,,
DIALOG SHOW
list create,1
list create,2
:START
list clear,1
%%TEXTBUFFER =
%%INFILE = @filedlg("WebAnalyzer Reports (*.wra)|*.wra",Select Report to Save as Text,%1,)
if @not(%%INFILE)
goto close
end
%%filename = @filedlg("All files (*.*)|*.*",Select Output Text file,@path(%%INFILE)@name(%%INFILE).txt,SAVE)
if %%filename
file delete,%%FILENAME
dialog set,text1,Exporting - Step 1 ...
%%TEMP = @Blowfish(DecryptFile,%%Key,%%INFILE,c:\@name(%%FILENAME).tmp)
fileio open,c:\@name(%%FILENAME).tmp,RW,denynone
fileio seek,0,start
dialog set,text1,Exporting - Step 2 ...
%%FP = -1
repeat
fileio seek,@succ(%%FP),start
%%READ = @fileio(read,30000)
%%HTML = @FILEIO(HEX2STRING, %%READ)
%%TEXT = @net(html,%%HTML)
%%TEXTBUFFER = %%TEXTBUFFER%%TEXT
%%FP = @sum(%%FP,30000)
rem info %%FP
rem info %%TEXTBUFFER
gosub UPDATE_PROGRESSBAR
dialog set,text2,%%FP / @file(%%INFILE,Z) Bytes
until @greater(%%FP,@file(%%INFILE,Z))
dialog set,text1,Exporting - Step 3 ...
wait 1
rem %%TEXTBUFFER = @net(html,%%TEXTBUFFER)
list add,1,%%TEXTBUFFER
list savefile,1,%%FILENAME
fileio close
dialog set,text1,Exporting - Done !
end
goto close
rem ***
rem *** ERROR HANDLER
rem ***
:error
if @equal(@error(E),901)
warn Error with opening the file. File may be in use!
else
if @equal(@error(E),902)
warn Error with creating the file.
else
if @equal(@error(E),903)
warn Error with reading from the file.
else
if @equal(@error(E),904)
warn Error with writing to the file.
else
warn Unexpected error @error(E)!
end
end
end
end
goto START
rem ***
rem *** CLOSE
rem ***
:close
file delete,c:\@name(%%FILENAME).tmp
fileio close
exit
rem ***
rem *** UPDATE PROGRESSBAR - OK
rem ***
:UPDATE_PROGRESSBAR
%%PROGRESS = @format(@fmul(@fdiv(%%FP,@file(%%INFILE,Z)),100),3.0)
dialog set,progress1,%%PROGRESS
exit
|
The problem is that only part of the file, far from all of it, ends up in the output file.
Does anyone have any idea what's wrong?
Maybe what is needed is a DLL that takes an input HTML file of any size and writes its text-only equivalent to an output file?
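To make that idea concrete, here is a rough sketch of such a streaming converter. It is written in Python rather than VDS and is not tied to any of the DLLs above. The names `TextExtractor` and `html_to_text` are made up for this illustration, and the list of tags treated as line breaks is an assumption. The point it shows: the parser is fed one block at a time and keeps an unfinished tag until the next block arrives, and the text is written out as it is produced instead of being collected in one big buffer.

```python
import io
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: writes visible text to `out` as it is parsed."""

    SKIP = {"script", "style"}                   # contents are not visible text
    BREAKS = {"br", "p", "div", "tr", "li"}      # assumed line-break tags

    def __init__(self, out):
        super().__init__(convert_charrefs=True)  # turns &amp; into & etc.
        self.out = out
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BREAKS:
            self.out.write("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.write(data)


def html_to_text(src, dst, chunk_size=30000):
    """Read HTML from `src` in blocks and write plain text to `dst`."""
    parser = TextExtractor(dst)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)   # a tag cut off at the block edge is held back
    parser.close()           # flush whatever is still buffered


if __name__ == "__main__":
    page = "<html><body><p>Hello &amp; <b>world</b></p></body></html>"
    result = io.StringIO()
    html_to_text(io.StringIO(page), result, chunk_size=7)
    print(result.getvalue())
```

For real files, `src` and `dst` would be file objects opened in text mode, for example `open(path, encoding="latin-1", errors="replace")`. The encoding here is a guess and depends on the reports.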
Thanks!
Henrik
_________________
Henrik Skov
Email: henrikskov@mail.dk