BorlandTalk.com Forum Index BorlandTalk.com
Borland discussion newsgroups
 

Indy 9 HTTP & Read Timeout

 
BorlandTalk.com Forum Index -> Delphi Internet Winsock
Greg
Guest





Posted: Tue Jan 20, 2004 9:39 pm    Post subject: Indy 9 HTTP & Read Timeout



I continue to have a problem that I was never able to get resolved by myself
or others through the news groups. I have a D7 application that does a
little bit of web crawling. I use the TidHTTP component to download each
web page. Downloading is done within a loop statement and is supposed to go
until it runs out of URLs. After about 30 pages are downloaded I start
getting Read Timeout errors that never go away. The only way to continue is
to close the application and then re-open it. I've tried using keep-alive,
creating the HTTP component dynamically (freeing it and re-creating it after
each page request), and forcing a disconnect after each request. Nothing
has helped. It seems as though the HTTP component won't release the
connection so the server is preventing new connections.

Anyone have any ideas on this?

Thanks.


Chad Z. Hower aka Kudzu
Guest





Posted: Tue Jan 20, 2004 9:51 pm    Post subject: Re: Indy 9 HTTP & Read Timeout



"Greg" <greg_68 (AT) hotmail (DOT) com> wrote in
news:400da021$1 (AT) newsgroups (DOT) borland.com:
Quote:
I've tried using keep-alive, creating the HTTP component dynamically
(freeing it and re-creating it after each page request), and forcing a
disconnect after each request. Nothing has helped. It seems as though
the HTTP component won't release the connection so the server is
preventing new connections.

Dynamically recreating it will flush all the data out. Do you have ZA
(ZoneAlarm) or other proxies installed? I have a web crawler that gets tens of
thousands of pages, and it's based on Indy.


--
Chad Z. Hower (a.k.a. Kudzu) - http://www.hower.org/Kudzu/
"Programming is an art form that fights back"

Need extra help with an Indy problem?

http://www.atozed.com/indy/experts/support.html


ELKNews - Get your free copy at http://www.atozedsoftware.com


Greg
Guest





Posted: Tue Jan 20, 2004 10:03 pm    Post subject: Re: Indy 9 HTTP & Read Timeout



Quote:
Dynamicly recreating it will flush all the data out. Do you have ZA or other
proxies installed? I have a web crawler that gets tens of thousands and its
based on Indy.

Just a hardware firewall (router). This problem only occurs on some sites
(none of which I'm affiliated with). I can provide a link to a problematic
one if you'd like.



Chad Z. Hower aka Kudzu
Guest





Posted: Tue Jan 20, 2004 10:15 pm    Post subject: Re: Indy 9 HTTP & Read Timeout

"Greg" <greg_68 (AT) hotmail (DOT) com> wrote in news:400da5a5$1 (AT) newsgroups (DOT) borland.com:
Quote:
Just a hardware firewall (router). This problem only occurs on some sites
(none of which I'm affiliated with). I can provide a link to a problematic
one if you'd like.

It sounds like the sites have a DOS attack preventor or other and are just
blocking you.
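If a server-side flood filter is what blocks the crawler, a common mitigation is to enforce a minimum gap between requests to the same host. The following is a minimal sketch under assumptions: THostThrottle, the 5-second interval and the caller-supplied millisecond clock are all hypothetical, not part of Indy or of the original poster's code. The crawler would call DelayFor before each Get, sleep for the returned number of milliseconds, then call MarkHit. It is written as a Delphi console program; the {$IFDEF FPC} line only matters when compiling with Free Pascal.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
program HostThrottleDemo;
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

type
  { Hypothetical helper: remembers when each host was last contacted. }
  THostThrottle = class
  private
    FLastHit: TStringList;   // 'host=milliseconds' pairs
    FMinIntervalMs: Int64;   // minimum gap between two requests to one host
  public
    constructor Create(AMinIntervalMs: Int64);
    destructor Destroy; override;
    function DelayFor(const Host: string; NowMs: Int64): Int64;
    procedure MarkHit(const Host: string; NowMs: Int64);
  end;

constructor THostThrottle.Create(AMinIntervalMs: Int64);
begin
  inherited Create;
  FLastHit := TStringList.Create;
  FMinIntervalMs := AMinIntervalMs;
end;

destructor THostThrottle.Destroy;
begin
  FLastHit.Free;
  inherited Destroy;
end;

{ Milliseconds to wait before contacting Host at time NowMs (0 = go now). }
function THostThrottle.DelayFor(const Host: string; NowMs: Int64): Int64;
var
  Last: string;
begin
  Last := FLastHit.Values[LowerCase(Host)];
  if Last = '' then
    Result := 0                        // host never contacted
  else
  begin
    Result := FMinIntervalMs - (NowMs - StrToInt64(Last));
    if Result < 0 then
      Result := 0;                     // the gap has already elapsed
  end;
end;

{ Records that Host was contacted at time NowMs. }
procedure THostThrottle.MarkHit(const Host: string; NowMs: Int64);
begin
  FLastHit.Values[LowerCase(Host)] := IntToStr(NowMs);
end;

var
  T: THostThrottle;

begin
  T := THostThrottle.Create(5000);              // assumed 5 s gap per host
  try
    WriteLn(T.DelayFor('dmoz.org', 0));         // 0: first contact
    T.MarkHit('dmoz.org', 0);
    WriteLn(T.DelayFor('dmoz.org', 2000));      // 3000: wait 3 more seconds
    WriteLn(T.DelayFor('example.com', 2000));   // 0: other hosts unaffected
  finally
    T.Free;
  end;
end.
```

Keeping the clock outside the class makes the logic easy to check without waiting in real time.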


Greg
Guest





Posted: Wed Jan 21, 2004 9:50 pm    Post subject: Re: Indy 9 HTTP & Read Timeout

"Chad Z. Hower aka Kudzu" <cpub (AT) hower (DOT) org> wrote

Quote:
It sounds like the sites have a DOS attack preventor or other and are just
blocking you.

That doesn't appear to be the reason for one site. I have a server (Windows
2000) and sometimes it crawls it without a problem but other times the web
server needs to be restarted. This happens with a single thread requesting
a page about every second... sometimes two pages a second.

Any other ideas?



Martin James
Guest





Posted: Thu Jan 22, 2004 4:13 am    Post subject: Re: Indy 9 HTTP & Read Timeout

"Greg" <greg_68 (AT) hotmail (DOT) com> wrote

Quote:
"Chad Z. Hower aka Kudzu" <cpub (AT) hower (DOT) org> wrote in message
news:Xns9477CBFA1737cpub (AT) 127 (DOT) 0.0.1...
It sounds like the sites have a DOS attack preventor or other and are just
blocking you.

That doesn't appear to be the reason for one site. I have a server (Windows
2000) and sometimes it crawls it without a problem but other times the web
server needs to be restarted. This happens with a single thread requesting
a page about every second... sometimes two pages a second.

Strange. It's definitely not TidHTTP - I have an app running that polls six
web servers every second (intranet). It's been running on site for four
months with no 'incidents'.

It's strange that the *server* needs to be restarted. Do you restart the
server app or reboot the box? I agree with the other posters - sounds like
a dodgy proxy or even the server itself.

Failing that, can you post your HTTP thread code?

Rgds,
Martin





Greg
Guest





Posted: Thu Jan 22, 2004 4:04 pm    Post subject: Re: Indy 9 HTTP & Read Timeout

"Martin James" <mjames_falcon (AT) dial (DOT) pipex.com> wrote

Quote:

Failing that, can you post your HTTP thread code?


I can't post the entire code for the thread, but here's the basics of it
with all of the TidHTTP code:

var
  http : TidHTTP;
begin
  http := TidHTTP.Create(nil);
  try
    http.HandleRedirects := TRUE;
    { tried increasing to more than 60 seconds but it made no difference }
    http.ReadTimeout := 10000;
    http.Request.BasicAuthentication := TRUE;

    { loop start }
    { ... }

    strRobots := http.Get(strTempURL + 'robots.txt');

    { ... }

    strHTML := http.Get(strURL);

    { ... }

    http.DisconnectSocket; { added to try to fix the problem }
    { ... }

    { loop end }
  finally
    FreeAndNil(http); { runs even if a Get raises a Read Timeout }
  end;
end;

The actual server needs restarting. A while back when I first posted about
the problem, one server kept showing new connections for every connection
made but they were never released and that killed that server. It happened
repeatedly.
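One thing that stands out in the code above is that robots.txt is requested on every pass through the loop, so each page costs at least two requests to the same server. The sketch below assumes robots.txt only needs fetching once per site root; TRobotsCache, TFetchFunc and FakeFetch are hypothetical names, and FakeFetch stands in for the http.Get call so the example runs offline. It is not offered as the cause of the timeouts, only as a way to cut the request count roughly in half.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
program RobotsCacheDemo;
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

type
  { Hypothetical callback; in the crawler it would wrap http.Get. }
  TFetchFunc = function(const URL: string): string;

  { Hypothetical cache: fetches robots.txt once per site root. }
  TRobotsCache = class
  private
    FRoots: TStringList;    // site roots already fetched
    FBodies: TStringList;   // robots.txt text, same index as its root
  public
    constructor Create;
    destructor Destroy; override;
    function Lookup(const SiteRoot: string; Fetch: TFetchFunc): string;
  end;

constructor TRobotsCache.Create;
begin
  inherited Create;
  FRoots := TStringList.Create;
  FBodies := TStringList.Create;
end;

destructor TRobotsCache.Destroy;
begin
  FBodies.Free;
  FRoots.Free;
  inherited Destroy;
end;

function TRobotsCache.Lookup(const SiteRoot: string; Fetch: TFetchFunc): string;
var
  I: Integer;
begin
  I := FRoots.IndexOf(SiteRoot);
  if I >= 0 then
    Result := FBodies[I]                       // cached: no request sent
  else
  begin
    Result := Fetch(SiteRoot + 'robots.txt');  // first visit to this site
    FRoots.Add(SiteRoot);
    FBodies.Add(Result);
  end;
end;

var
  FetchCount: Integer = 0;
  Cache: TRobotsCache;

{ Offline stand-in for http.Get (hypothetical); counts network calls. }
function FakeFetch(const URL: string): string;
begin
  Inc(FetchCount);
  Result := 'User-agent: *' + sLineBreak + 'Disallow:';
end;

begin
  Cache := TRobotsCache.Create;
  try
    Cache.Lookup('http://dmoz.org/', FakeFetch);
    Cache.Lookup('http://dmoz.org/', FakeFetch);
    WriteLn(FetchCount);  // 1: the second lookup was served from the cache
  finally
    Cache.Free;
  end;
end.
```

If the wrapper around http.Get ends up being a method rather than a plain function, TFetchFunc would need to be declared `of object`.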



Greg
Guest





Posted: Thu Jan 22, 2004 4:35 pm    Post subject: Re: Indy 9 HTTP & Read Timeout

An example problematic site is http://dmoz.org/. Even using a single thread
with a 2-second pause between requests, it still gives me Read Timeout errors,
and that's after retrieving only 6 pages. If I close and re-open the app, it
works again until the Read Timeout errors happen again.



Greg
Guest





Posted: Thu Jan 22, 2004 4:41 pm    Post subject: Re: Indy 9 HTTP & Read Timeout


"Greg" <greg_68 (AT) hotmail (DOT) com> wrote

Quote:
An example problematic site is http://dmoz.org/. Even with using a single
thread with a 2 second pause between requests it still gives me Read Timeout
errors and that's after retrieving only 6 pages. If I close and re-open the
app it works again until the Read Timeout errors happen.


Actually, disregard that. The dmoz problem was due to the fact that the
ReadTimeout value was set too low.
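Since a too-low ReadTimeout turned out to be the culprit for dmoz, one option is to retry a timed-out request with a longer timeout instead of giving up on the page. This is a sketch under assumptions: FetchWithRetry, TTimedFetch and SlowServer are hypothetical, SlowServer simulates a server that needs 15 seconds to answer, and in real code the callback would set http.ReadTimeout before calling http.Get. For brevity it retries on any exception; real code would probably narrow the except clause to Indy's read-timeout exception so that, for example, an HTTP 404 is not retried.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
program RetryDemo;
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  { Hypothetical callback: fetch URL with the given read timeout (ms), raise on failure. }
  TTimedFetch = function(const URL: string; ReadTimeoutMs: Integer): string;

{ Tries up to MaxAttempts times, doubling the timeout after each failure. }
function FetchWithRetry(const URL: string; Fetch: TTimedFetch;
  FirstTimeoutMs, MaxAttempts: Integer): string;
var
  Attempt, TimeoutMs: Integer;
begin
  Result := '';
  TimeoutMs := FirstTimeoutMs;
  for Attempt := 1 to MaxAttempts do
  begin
    try
      Result := Fetch(URL, TimeoutMs);
      Exit;                          // success: stop retrying
    except
      if Attempt = MaxAttempts then
        raise;                       // out of attempts: re-raise the last error
      TimeoutMs := TimeoutMs * 2;    // give a slow server twice as long next time
    end;
  end;
end;

var
  Calls: Integer = 0;

{ Offline stand-in (hypothetical): simulates a server that needs 15 s to answer. }
function SlowServer(const URL: string; ReadTimeoutMs: Integer): string;
begin
  Inc(Calls);
  if ReadTimeoutMs < 15000 then
    raise Exception.Create('Read Timeout');
  Result := '<html>ok</html>';
end;

begin
  WriteLn(FetchWithRetry('http://dmoz.org/', SlowServer, 5000, 4));
  WriteLn(Calls);  // 3 attempts: 5000, 10000, then 20000 ms
end.
```

Capping MaxAttempts keeps a genuinely dead server from stalling the crawl indefinitely.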



Greg
Guest





Posted: Thu Jan 22, 2004 6:38 pm    Post subject: Re: Indy 9 HTTP & Read Timeout

"Chad Z. Hower aka Kudzu" <cpub (AT) hower (DOT) org> wrote

Quote:
It sounds like the sites have a DOS attack preventor or other and are just
blocking you.

I've been in contact with the system administrator of one of the sites and
it doesn't appear that DOS is the problem. What happens is that after about
3 minutes all pages return Read Timeout errors (regardless of what the
ReadTimeout value is set to). Once this happens, you can't even view the
web site through a web browser, so the web server needs to be restarted.
The web server runs Windows 2000 (I'm not sure yet which edition, although I
know it's not Professional).

Strangely, this doesn't happen every time, but it does happen the majority
of the time.



MattW
Guest





Posted: Wed Jan 28, 2004 5:40 pm    Post subject: Re: Indy 9 HTTP & Read Timeout

I'm having the same problem with a crawler for www.shoutcast.com, although
it won't even get a single page. I use this code:

var
  HTTPResult : TStringStream;

try
  HTTP.Head(SearchURL);
  HTTP.Get(SearchURL, HTTPResult);
finally
  WriteToLog(HTTPResult.DataString); // for debugging sockets
  // custom procedure for splitting the returned data into tokens / passes the string stream
  ProcessSearchData(HTTPResult);
  HTTPResult.Free;
end;

SearchURL contains http://scastlb2.shoutcast.com/directory/?sgenre=Punk, and
basically the program is supposed to get shoutcast servers from the web page
it downloads, but even outside of a loop the site gives a Read Timeout
instantly, while the URL opens in IE just fine. Yes, I made sure the URL was
passed correctly by calling the WriteToLog procedure right after Get, which
passes the info to a memo. I'm not sure how to get this one working <shrug>

I've used TidHTTP a lot in the past and never had a problem with this site, so
maybe my code is not doing things the right way?
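As written, the finally block runs WriteToLog and ProcessSearchData even when Get raises, so a Read Timeout leads straight into parsing an empty or partial stream. A minimal sketch of the usual Delphi pattern: create the stream before try, parse only after the download returns, and free it in finally. DownloadPage, TStreamFetch, GoodFetch and TimeoutFetch are hypothetical; the two fakes stand in for HTTP.Get(SearchURL, HTTPResult) so the example runs offline. It also issues a single request, whereas the original sends a Head followed by a Get.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
program DownloadPageDemo;
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

type
  { Hypothetical callback; in the real program it would wrap HTTP.Get(URL, Dest). }
  TStreamFetch = procedure(const URL: string; Dest: TStringStream);

{ Returns True, with Body filled, only if the download completed. }
function DownloadPage(const URL: string; Fetch: TStreamFetch;
  out Body: string): Boolean;
var
  Buf: TStringStream;
begin
  Result := False;
  Body := '';
  Buf := TStringStream.Create('');  // created before try, so finally never frees an unassigned variable
  try
    try
      Fetch(URL, Buf);
      Body := Buf.DataString;       // reached only if Fetch did not raise
      Result := True;
    except
      on E: Exception do
        WriteLn('Fetch failed for ', URL, ': ', E.Message);  // log instead of parsing
    end;
  finally
    Buf.Free;                       // released on success and on failure
  end;
end;

{ Offline stand-ins (hypothetical) for a good response and a read timeout. }
procedure GoodFetch(const URL: string; Dest: TStringStream);
begin
  Dest.WriteString('<html>stations</html>');
end;

procedure TimeoutFetch(const URL: string; Dest: TStringStream);
begin
  raise Exception.Create('Read Timeout');
end;

const
  SearchURL = 'http://scastlb2.shoutcast.com/directory/?sgenre=Punk';

var
  Page: string;

begin
  if DownloadPage(SearchURL, GoodFetch, Page) then
    WriteLn('Parse ', Length(Page), ' chars here');  // ProcessSearchData would run here
  if not DownloadPage(SearchURL, TimeoutFetch, Page) then
    WriteLn('Nothing to parse');
end.
```

This does not explain why the site times out, but it keeps a failed download from being treated as data.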

MattW
Guest





Posted: Wed Jan 28, 2004 5:45 pm    Post subject: Re: Indy 9 HTTP & Read Timeout

I forgot to include the

HTTPResult := TStringStream.Create('');

call in my message, but it's in the program, so that's not the problem :)

All times are GMT