BorlandTalk.com
Borland discussion newsgroups

How to get the "real live" HTML source?

BorlandTalk.com Forum Index -> C++ Builder (Internet Web)

Enrique
Posted: Fri Dec 15, 2006 10:07 pm    Post subject: How to get the "real live" HTML source?

Hello!
I'm having trouble obtaining the HTML source of web pages that update their
content dynamically with JavaScript functions.
For example:
http://articulo.mercadolibre.com.ar/MLA-25012974-notebook-hp-nx-6325-con-micro-sempron-3400-y-gtia-real-_JM

If I use this code (which works very well most of the time):
//----------------------------------
#include <mshtml.h> // IHTMLDocument2
#include <ocidl.h>  // IPersistStreamInit
#include <objidl.h> // IStream

AnsiString TInternetExplorer::SourceCodeHTML(TCppWebBrowser* CppWebBrowser1)
{
    AnsiString Source = "";
    IHTMLDocument2 *htm = NULL;

    if(CppWebBrowser1->Document &&
       SUCCEEDED(CppWebBrowser1->Document->QueryInterface(IID_IHTMLDocument2, (LPVOID*)&htm)))
    {
        IPersistStreamInit *spPsi = NULL;
        if(SUCCEEDED(htm->QueryInterface(IID_IPersistStreamInit, (LPVOID*)&spPsi)) && spPsi)
        {
            IStream *spStream = NULL;
            // Memory-backed stream; TRUE frees the HGLOBAL when the stream is released
            if(SUCCEEDED(CreateStreamOnHGlobal(NULL, TRUE, &spStream)) && spStream)
            {
                // Serialize the document into the stream
                if(SUCCEEDED(spPsi->Save(spStream, TRUE)))
                {
                    STATSTG ss;
                    LARGE_INTEGER nMove;
                    nMove.QuadPart = 0;
                    spStream->Seek(nMove, STREAM_SEEK_SET, NULL); // rewind before reading
                    if(SUCCEEDED(spStream->Stat(&ss, STATFLAG_NONAME)))
                    {
                        ULONG nRead = 0;
                        Source.SetLength((int)ss.cbSize.QuadPart);
                        spStream->Read((void *)Source.data(), (ULONG)ss.cbSize.QuadPart, &nRead);
                        Source.SetLength((int)nRead); // keep only the bytes actually read
                    }
                }
                // Release() returns the new reference count, not an HRESULT,
                // so it must not be passed to OleCheck()
                spStream->Release();
            }
            spPsi->Release();
        }
        htm->Release();
    }
    return Source;
}
//------------------------------------------------------------------------

I can't find the data that I'm seeing on my screen.
My (very inefficient) solution was to wait for the page to load completely,
then select all, copy to the clipboard, and then "paste" it into an AnsiString...
It works, but it has many problems: for example, I lose the contents of the
clipboard, it's very imprecise, it requires a lot of extra code and resources,
etc.
Can somebody post a better solution? (If it's possible... and I think it is:
if the TCppWebBrowser can show the content, then the content must be
somewhere...)

Thanks!
enrique.

Hans Galema
Posted: Sat Dec 16, 2006 12:35 am

Enrique wrote:

Quote:
AnsiString TInternetExplorer::SourceCodeHTML(TCppWebBrowser* CppWebBrowser1)

I can't find the data that I'm seeing in my screen.
My solution (very inefficient) was to wait for the page to load completely,
then select all, copy to clipboard and then "paste" it to an AnsiString...

You did not specify what exactly you are missing. Please right-click
on the page and choose 'View Source'. The source will be opened in Notepad.

How does what Notepad shows differ from what SourceCodeHTML() gives?

Hans.

Michael Harris
Posted: Sat Dec 16, 2006 1:50 am

"Enrique" wrote in message
Quote:
Hello !
I'm having problems to obtain the HTML source from webpages that refresh
its content dynamically by javascript functions.
in example:
http://articulo.mercadolibre.com.ar/MLA-25012974-notebook-hp-nx-6325-con-micro-sempron-3400-y-gtia-real-_JM

[SNIP]
enrique.


You want the actual script?
You might create an XML HTTP request, parse the source for
~http://www.mercadolibre.com.ar/org-img/jsapi/paramNSMLA.js~
to determine that it is a script, fetch the script using XML HTTP or another
stream, and poke it into the corresponding line of the source.
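A rough, portable sketch of the first step described above (locating the external script URLs in the downloaded source). It assumes the page HTML is already in a `std::string`; `FindScriptSources` is a hypothetical helper, not a library function, and a regular expression only approximates real HTML parsing.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the src attribute of every <script ... src="..."> tag,
// in document order. A regex is only a rough approximation of an HTML parser
// (it ignores comments, unquoted attribute values, etc.).
std::vector<std::string> FindScriptSources(const std::string& html)
{
    static const std::regex scriptSrc(
        R"(<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["'])",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> urls;
    for (std::sregex_iterator it(html.begin(), html.end(), scriptSrc), end;
         it != end; ++it)
        urls.push_back((*it)[1].str());   // capture group 1 holds the URL
    return urls;
}
```

Each returned URL could then be fetched separately; resolving relative `src` values against the page URL is left out of this sketch.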

--
Michael

Remy Lebeau (TeamB)
Posted: Sat Dec 16, 2006 2:19 am

"Enrique" <enridp (AT) yahoo (DOT) com.ar> wrote in message
news:4582e657 (AT) newsgroups (DOT) borland.com...

Quote:
I'm having problems to obtain the HTML source from webpages
that refresh its content dynamically by javascript functions.

You cannot extract the generated HTML in that situation. The only way to
get the generated HTML would be to actually run the Javascript and capture
its output. Which means parsing the original HTML, extracting the
Javascript code, setting up your own COM objects to mimic the Internet
Explorer DOM, and then running everything through the Windows Scripting
Host. In other words, you are looking at a lot of extra work in order to
handle dynamic content.

Quote:
I can't find the data that I'm seeing in my screen.

That is because you are extracting the original raw HTML, not the
javascript-modified HTML.

Quote:
My solution (very inefficient) was to wait for the page to load completely,
then select all, copy to clipboard and then "paste" it to an AnsiString...
it works, but it has many problems, for example, I lose the content of the
clipboard, it's very imprecise, requires a lot of extra code and resources,
etc.

You don't need to copy to the clipboard to get the selected text. You can
query it from the DOM directly.


Gambit

Enrique
Posted: Sat Dec 16, 2006 4:11 am

Hello Hans, Remy and Michael!
I will try to answer all of you in one message:

[HANS]
"How does what Notepad shows differ from what SourceCodeHTML() gives?"
Not at all, really!
But that's the problem... the HTML source shows the JavaScript functions
that dynamically load some data into the page, but I need to parse the
data, not the code that generates it.
For example,
if you view this page:
http://articulo.mercadolibre.com.ar/MLA-25012974-notebook-hp-nx-6325-con-micro-sempron-3400-y-gtia-real-_JM
you will see something like:
"Finaliza en: 42d 3h (26/01/2007 21:03)" ("Ends in: 42d 3h ...")
The date is loaded dynamically, so it is not in the source code:
//-----------------
<td valign=top width=106 class=bl2n12><img src=/org-img/t.gif width=5
height=1>Finaliza en:</td>
<td valign=top width=264 class=bl2n12><span id="TimeLeft"
style="position:relative;width=170;height=16">
</span>
<ILAYER id=ns4TimeLeft height=15 width=180>
</ILAYER>
<nolayer>
<iframe id=iframeTimeLeft scrolling=no width=0 height=0 framespacing=0
frameborder=0>
</iframe>
</nolayer></td>
//-----------

Internet Explorer shows the same HTML source as my code, and that is not
useful to me; I need to see in the code the same thing I see on my screen...
But in Firefox...
If you select the price, right-click, and choose "View Selection Source",
the resulting HTML code is revealed. So clearly, once the
JavaScript has been executed, the browser has the HTML source of the page
somewhere so it can render the page:
//---------------
<tr><td class="bl2n12" valign="top" width="106"><img src="/org-img/t.gif"
height="1" width="5">Finaliza en:</td>
<td class="bl2n12" valign="top" width="264"><span id="TimeLeft"
style="position: relative;">42d 3h&nbsp;&nbsp;(26/01/2007 21:03)</span>
<ilayer id="ns4TimeLeft" height="15" width="180">
</ilayer>
<nolayer>
<iframe
src="http://articulo.mercadolibre.com.ar/jm/item?act=timeleft&site=MLA&id=25012974"
id="iframeTimeLeft" framespacing="0" frameborder="0" height="0"
scrolling="no" width="0">
</iframe>
</nolayer></td></tr>
//---------------


[REMY]
"You don't need to copy to the clipboard to get the selected text. You can
query it from the DOM directly."

Can you please send me some example code for that? I can't see how...
I'm doing something like this
(commented to help you understand it):
//------------------------------------------
// This method waits until the whole page has been downloaded
MyCppWebBrowser->NavegateSync(Links[i], 10, /*images*/false,
    /*javascript*/true, /*cookies*/false, /*cache*/false);

CodigoHTML = MyCppWebBrowser->CodigoFuenteHTML();        // get the HTML source
MyCppWebBrowser->SeleccionarTodo();                      // select all the content
AnsiString Texto = MyCppWebBrowser->CopiarSeleccion();   // copy the selection into an AnsiString
MyCppWebBrowser->DesSeleccionar();                       // unselect all

Datos = Parse(CodigoHTML, Texto, Links[i]);              // parse the data

int cont = 0;
// Check whether the dynamic data has loaded; if not, retry (3 tries maximum)
while(!ChequearDatosCompletos(Datos) && cont < 3)
{
    EsperarXTiempo(2);                                   // wait 2 seconds
    CodigoHTML = MyCppWebBrowser->CodigoFuenteHTML();    // get the HTML source again
    MyCppWebBrowser->SeleccionarTodo();                  // select all again
    Texto = MyCppWebBrowser->CopiarSeleccion();          // copy the selection again
    MyCppWebBrowser->DesSeleccionar();                   // unselect all again
    Datos = Parse(CodigoHTML, Texto, Links[i]);          // parse the data again
    cont++;
}
//------------------------------------------

The methods above are defined as follows:
SELECT ALL:
//--------------------------------------------------------------------------
void TInternetExplorer::SeleccionarTodo(TCppWebBrowser* pCppWebBrowser)
{
pCppWebBrowser->ExecWB(Shdocvw_tlb::OLECMDID_SELECTALL,
Shdocvw_tlb::OLECMDEXECOPT_DODEFAULT, NULL, NULL);
}
//--------------------------------------------------------------------------

UNSELECT ALL:
//--------------------------------------------------------------------------
void TInternetExplorer::DesSeleccionar(TCppWebBrowser* pCppWebBrowser)
{
    // Alternative, left commented out:
    //pCppWebBrowser->ExecWB(/*Shdocvw_tlb::*/OLECMDID_UNSELECT, Shdocvw_tlb::OLECMDEXECOPT_DODEFAULT, NULL, NULL);

    if(!pCppWebBrowser->Document)
        return;   // no document loaded yet

    IHTMLDocument2 *HTMLDoc = NULL;
    if(SUCCEEDED(pCppWebBrowser->Document->QueryInterface(IID_IHTMLDocument2,
                                                          (LPVOID*)&HTMLDoc)))
    {
        VARIANT_BOOL ret;
        TOLEBOOL showUI = false;
        //HTMLDoc->execCommand(WideString("SelectAll"), showUI, 0, &ret);
        VARIANT Flag;       // value argument, not used by "Unselect"
        Flag.vt = VT_I4;
        Flag.lVal = 0;
        HTMLDoc->execCommand(WideString("Unselect"), showUI, Flag, &ret);
        HTMLDoc->Release();
    }
}
//--------------------------------------------------------------------------

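COPY SELECTION
//--------------------------------------------------------------------------
#include <Clipbrd.hpp> // Clipboard()

// Copies the current selection through the clipboard, then restores the
// previous clipboard text (only the text format is restored).
AnsiString TInternetExplorer::CopiarSeleccion(TCppWebBrowser* pCppWebBrowser)
{
    AnsiString Antes = Clipboard()->AsText;   // clipboard text before the copy
    pCppWebBrowser->ExecWB(Shdocvw_tlb::OLECMDID_COPY,
                           Shdocvw_tlb::OLECMDEXECOPT_DODEFAULT, NULL, NULL);
    AnsiString Ahora = Clipboard()->AsText;   // the copied selection
    Clipboard()->AsText = Antes;              // put the old text back
    return Ahora;
}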
//--------------------------------------------------------------------------

NAVIGATESYNC (commented to help you understand it)
//---------------------------------------------------------------------------
bool TMyCppWebBrowser::NavegateSync(AnsiString URL,
                                    int Tiempo,          // max inactive time; cancels the Navigate
                                    bool Imagenes,       // download images?
                                    bool JavaScript,     // JavaScript enabled?
                                    bool BorrarCookies,  // delete cookies?
                                    bool BorrarCache)    // delete cache?
{
    ModificarMaxTiempoInactivo(Tiempo); // sets the max inactive time (int TIEMPOINACTIVOMAXIMO;)
    AsginarEventosVisitas();            // assigns my methods to the browser events
    if(Imagenes) { IE.HabilitarImagenes(); }       // enable images?
    else         { IE.DeshabilitarImagenes(); }
    if(JavaScript) { IE.HabilitarJavascript(); }   // enable JavaScript?
    else           { IE.DeshabilitarJavascript(); }
    if(BorrarCookies) { IE.DeleteNewCookies(); }   // delete cookies?
    if(BorrarCache)   { IE.DeleteCache(); }        // delete cache?

    Timer->Enabled = false;
    Timer->Interval = 1000;   // OnTimer fires once per second
    Timer->OnTimer = OnTimer;
    TiempoInactivo = 0;       // inactive time, in seconds
    HTMLCompleto = false;     // set once the page is completely loaded
    TMyCppWebBrowser::STOP_ASYNC = false; // static bool STOP_ASYNC; lets me stop at any time
    CppWebBrowser->Navigate(WideString(URL));
    Timer->Enabled = true;

    // Pump messages until the page completes, the browser is idle too long, or we are stopped
    while(!HTMLCompleto && TiempoInactivo <= TIEMPOINACTIVOMAXIMO &&
          !TMyCppWebBrowser::STOP_ASYNC)
    {
        Application->ProcessMessages();
        Sleep(10);
    }
    if(TMyCppWebBrowser::STOP_ASYNC) // the loop ended because STOP_ASYNC was set
    {
        CppWebBrowser->Stop();
    }
    Timer->Enabled = false;
    return !TMyCppWebBrowser::STOP_ASYNC;
}
//---------------------------------------------------------------------------


MY EVENTS:
// If any of these events fires, the navigation is not inactive:
//---------------------------------------------------------------------------
void __fastcall TMyCppWebBrowser::OnDownloadBegin(TObject *Sender)
{
TiempoInactivo = 0;
}
//---------------------------------------------------------------------------
void __fastcall TMyCppWebBrowser::OnDownloadComplete(TObject *Sender)
{
TiempoInactivo = 0;
}
//---------------------------------------------------------------------------
void __fastcall TMyCppWebBrowser::OnNavigateComplete2(TObject *Sender,
LPDISPATCH pDisp, TVariant *URL)
{
TiempoInactivo = 0;
}
//---------------------------------------------------------------------------
void __fastcall TMyCppWebBrowser::OnProgressChange(TObject *Sender,long
Progress, long ProgressMax)
{
TiempoInactivo = 0;
}
//---------------------------------------------------------------------------
void __fastcall TMyCppWebBrowser::OnTitleChange(TObject *Sender,BSTR URL)
{
TiempoInactivo = 0;
}
//---------------------------------------------------------------------------

// Counts how long (in seconds) TCppWebBrowser has gone without downloading anything...
//---------------------------------------------------------------------------
void __fastcall TMyCppWebBrowser::OnTimer(TObject *Sender)
{
TiempoInactivo++;
}
//---------------------------------------------------------------------------

OnDocumentComplete sometimes fires several times before the main document has
finished downloading, so we check for that:
//---------------------------------------------------------------------------
void __fastcall TMyCppWebBrowser::OnDocumentComplete(TObject *Sender,
LPDISPATCH pDisp, TVariant *URL)
{
TiempoInactivo = 0;
AnsiString HTML = CodigoFuenteHTML(); //SourceHTML()
HTML = HTML.LowerCase();
if(HTML.Pos("</html>"))
{
HTMLCompleto = true; //HTML completed
}
}
//---------------------------------------------------------------------------

I asked this in another group, but it was a Delphi group, and although they
posted a solution, I can't translate it to C++...
Part of that conversation:
//----------------------------------
According to MS-SDK:
Code within the SCRIPT block that is not contained within a
function is executed immediately as the page is loaded.

Of course, such code can call other functions in the page
resulting in dynamic DOM modifications. Unless there is
a script that constantly changes the DOM, at some point
the document will be ented a READY state and the document
complete event will fire.

So, if you try to parse after the documentComplete event you
should have no problem. Although I show the behaviour you described
(I am on a slow dial-up and changes occur slowly Wink at the end
I am getting the complete DOM tree.
(...)
But if I view the source
I get what Aaron said. The source is what was downloaded. The DOM is
what produces the page you see in the browser.

Are you sure you start parsing after documentComplete? Are you
sure you are using webBrowser.Document?

fotis
//----------------------------------
more:
//-----------------------------
Sorry, I am not very comfortable with C++ :-(
The general guidelines are:

1. _Use_ the Navigate method of the WebBrowser to get the page you want.
2. Wait until the document is completely downloaded. That means
   that the only safe point to put your hands on the DOM is in the
   event handler of the DocumentComplete event of the WebBrowser.
   At this point the document is completely downloaded, the DOM
   tree has been created and the page has been rendered in the
   web browser window.
   Use WebBrowser.Document to get a ref to the document
   (is it WebBrowser->Document in C++ ??)

3. Start parsing the DOM tree from its root node, which is
   the node that corresponds to the <HTML> tag. Use the

   IHTMLDocument3::documentElement Property

   to retrieve a reference to the root node of the document.

   Then it's a typical tree parsing (a loop to parse the children
   and a recursive call to go deeper).

Man! that C++ thing is a nightmare..... ;-)

good luck
//--------------------------------------------
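The recursive walk in steps 2 and 3 can be sketched portably. The `Node` struct and the `Text`, `Elem` and `Serialize` functions below are hypothetical stand-ins; in real code each node would come from the MSHTML DOM (for example through IHTMLDOMNode or IHTMLElement and QueryInterface). The point is the shape of the algorithm, a loop over the children plus a recursive call, which rebuilds markup from the live tree rather than from the downloaded text. (MSHTML's IHTMLElement also has an outerHTML property that returns similar markup for an element and its subtree; check MSDN before relying on it.)

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified DOM node: either a text node or an element.
struct Node {
    bool isText = false;
    std::string text;                          // text nodes only
    std::string tag;                           // elements only
    std::map<std::string, std::string> attrs;  // elements only
    std::vector<Node> children;                // elements only (text nodes included)
};

Node Text(std::string s) { Node n; n.isText = true; n.text = std::move(s); return n; }

Node Elem(std::string tag, std::vector<Node> kids = {},
          std::map<std::string, std::string> attrs = {})
{
    Node n;
    n.tag = std::move(tag);
    n.children = std::move(kids);
    n.attrs = std::move(attrs);
    return n;
}

// Rebuilds markup from the tree: a loop over the children plus a recursive
// call per child. Text and attribute values are not escaped, and void
// elements such as <img> get a closing tag; both are simplifications.
std::string Serialize(const Node& n)
{
    if (n.isText)
        return n.text;

    std::string out = "<" + n.tag;
    for (const auto& a : n.attrs)              // std::map: attributes come out sorted by name
        out += " " + a.first + "=\"" + a.second + "\"";
    out += ">";
    for (const Node& child : n.children)       // loop over the children...
        out += Serialize(child);               // ...and recurse to go deeper
    return out + "</" + n.tag + ">";
}
```

Because text nodes are kept in the same `children` list as elements, text that sits between tags is emitted in its original position.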

But I'm not very lucky... ^_^
I know it is possible, but I really don't know how to do it. If somebody can
translate the above steps into C++Builder code, I think we're done...

Greetings!
enrique.

Remy Lebeau (TeamB)
Posted: Sat Dec 16, 2006 7:13 am

"Enrique" <enridp (AT) yahoo (DOT) com.ar> wrote in message
news:45831d53 (AT) newsgroups (DOT) borland.com...

Quote:
the HTML source shows the javascript functions that loads dynamically
some data in the webpage, but I need to parse the data, not the code
that generates it.

Then you have to parse the script code itself. Or else actually run the
script and then grab its output.

Quote:
Internet Explorer shows the same HTML source that my code

That is because you are grabbing the HTML from Internet Explorer to begin
with. That is what TWebBrowser is.

Quote:
I need to see in the code the same I see in my screen...

Then you should use Internet Explorer's DOM interfaces to grab the current
text of the desired TimeLeft field.

Quote:
If you select the price and right click to "view source of selection",
in Firefox, the resulting HTML code is revealed, so clearly once the
javascript has been executed, the browser has the HTML source
of the page somewhere so it can render the page in the browser

No, it doesn't. Firefox is dynamically generating new HTML for the source
viewer when that menu item is clicked, based on the selected text. You
don't have access to that HTML. And it is not the same as the original HTML
that was downloaded.

Quote:
Can you please send me an example code to do that? I can't see it...

The IHTMLDocument2 interface (which you get from the TWebBrowser's Document
property) has a selection property, which is an IHTMLSelectionObject interface.
Call its createRange() method and query the result for IHTMLTxtRange; its text
property holds the selected text. Refer to MSDN for more details.


Gambit

Remy Lebeau (TeamB)
Posted: Sat Dec 16, 2006 7:25 am

"Remy Lebeau (TeamB)" <no.spam (AT) no (DOT) spam.com> wrote in message
news:4583480c$1 (AT) newsgroups (DOT) borland.com...

Quote:
Then you should use Internet Explorer's DOM interfaces to grab the current
text of the desired TimeLeft field.

For example:

#include <mshtml.h>
#include <utilcls.h>

void __fastcall TMyCppWebBrowser::OnDocumentComplete(TObject *Sender,
LPDISPATCH pDisp, TVariant *URL)
{
if( MyCppWebBrowser->Document )
{
TComInterface<IHTMLDocument2> Doc;
MyCppWebBrowser->Document->QueryInterface(IID_IHTMLDocument2,
(LPVOID*)&Doc);
if( Doc )
{
TComInterface<IHTMLElementCollection> All;
Doc->get_all(&All);
if( All )
{
TVariant ItemName(L"TimeLeft", false);
TVariant ItemIndex = 0;
TComInterface<IDispatch> Disp;

All->item(ItemName, ItemIndex, &Disp);
if( Disp )
{
TComInterface<IHTMLElement> Element;
Disp->QueryInterface(IID_IHTMLElement,
(LPVOID*)&Element);
if( Element )
{
WideString text;
Element->get_innerText(&text);

// use text as needed ...
}
}
}
}
}
}
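As a portable illustration of what the lookup above does, here is a sketch over a hypothetical in-memory tree: `FindById` searches depth-first in document order (roughly the order of the `all` collection) and `InnerText` concatenates the descendant text. The `Node` type and both functions are assumptions for illustration; the real innerText property also applies layout rules this sketch ignores.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified DOM node (text node or element).
struct Node {
    bool isText = false;
    std::string text;            // text nodes only
    std::string id;              // elements only
    std::vector<Node> children;  // elements only
};

// Depth-first, document-order search for an element with the given id.
// Returns nullptr if no element has that id.
const Node* FindById(const Node& n, const std::string& id)
{
    if (!n.isText && n.id == id)
        return &n;
    for (const Node& child : n.children)
        if (const Node* hit = FindById(child, id))
            return hit;
    return nullptr;
}

// Concatenates all descendant text: a simplified model of innerText.
std::string InnerText(const Node& n)
{
    if (n.isText)
        return n.text;
    std::string out;
    for (const Node& child : n.children)
        out += InnerText(child);
    return out;
}
```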


Gambit

Enrique
Posted: Sat Dec 16, 2006 7:04 pm

That seems OK if you know the name of the desired field. But what if you want
ALL the "DOM code", like a new HTML source without JavaScript and with
all the fields filled in?
I think I need help with the code; these interfaces are my nemesis...
I've "translated" some Delphi code that builds a tree of the DOM, but it does
not work; surely it is a bad translation:

//*********//
ORIGINAL (DELPHI)
//*********//
The code below (Delphi) recreates the DOM in a TTreeView.
Call it like this:
VisualizeDOMTree(rootElement, nil, treeNodes);

////////////////////////////////////////////////////////////////////////////
// element: the IHTMLElement for which we build a subtree
// treeNode: the TreeNode under which we hang the subtree
// treeNodes: the Tree we build
////////////////////////////////////////////////////////////////////////////
procedure TCGWTForm.VisualizeDOMTree(element:IHTMLElement;
treeNode:TTreeNode;
treeNodes:TTreeNodes);
var
i: Integer;
myTreeNode: TTreeNode;
curElement: IHTMLElement;
childrenElements: IHTMLElementCollection;
begin
myTreeNode:= treeNodes.AddChild(treeNode, element.tagName);
//get the children collection
childrenElements:= element.children as IHTMLElementCollection;
for i:=0 to childrenElements.length-1 do begin //for each child
//get a child
curElement:= childrenElements.item(i,0) as IHTMLElement;
//build sibling's subtree
VisualizeDOMTree(curElement, myTreeNode, treeNodes);
end;
end;

//******************************
//******************************

//************//
MY (wrong) TRANSLATION
//************//
void
TInternetExplorer::VisualizeDOMTree(/*TComInterface<IHTMLElement>*/IHTMLElement
element, TTreeNode* treeNode, TTreeNodes* treeNodes)
{
int i
TTreeNode* myTreeNode;
IHTMLElement curElement;
IHTMLElementCollection childrenElements;

myTreeNode = treeNodes->AddChild(treeNode, element.tagName);
//get the children collection
childrenElements = (IHTMLElementCollection)element.children;
for (i=0; i<childrenElements.length; ++i) //for each sibling
{
//get a sidling
curElement = (IHTMLElement)childrenElements.item(i,0);
//build sibling's subtree
VisualizeDOMTree(curElement, myTreeNode, treeNodes);
}
}

//*************************************//

Anyway, this code only builds a tree of the elements...
And as Fotis (the author of the Delphi code) says:
//-----------------------
"Yes, my code produces only a tree of tags.
You also need tag attributes and text.

Getting all the attributes for each tag
is also possible; it's simple (<20 lines), but it is
based on some more exotic commands.
You're going to need types like IHTMLDOMNode,
IHTMLAttributeCollection, IHTMLDOMAttribute,
IDispatch, and GUIDs like IID_IHTMLAttributeCollection,
IID_IHTMLDOMNode, IID_IHTMLDOMAttribute, etc.

The problem is the text inside, before, or after tags,
which makes the DOM tree parsing procedure
more complex. Using innerText does not cover
all cases... Consider this:

<td>hello <b>world</b> and Enrique</td>

You need getAdjacentText of the IHTMLElement2
interface for the "hello" and "and Enrique" parts."
//------------------------
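The mixed-content problem described above can be illustrated with a hypothetical node model (`nodeType` 1 = element, 3 = text, as in the DOM). `ElementOnlyWalk` follows element children only and reads the full text only at leaf elements, so the text sitting beside `<b>` is lost; `AllText` visits every child node, text nodes included, the way a walk over IHTMLDOMNode's childNodes would. Both functions are illustrative, not MSHTML APIs.

```cpp
#include <string>
#include <vector>

// Hypothetical DOM node; nodeType uses the DOM numbering: 1 = element, 3 = text.
struct Node {
    int nodeType = 1;
    std::string name;              // tag name (elements)
    std::string value;             // text (text nodes)
    std::vector<Node> childNodes;  // all children, text nodes included
};

// Every text node in the subtree, in document order.
std::string AllText(const Node& n)
{
    if (n.nodeType == 3)
        return n.value;
    std::string out;
    for (const Node& c : n.childNodes)
        out += AllText(c);
    return out;
}

// Follows element children only and reads the full text only at leaf
// elements, so text that sits next to a child element is never visited.
std::string ElementOnlyWalk(const Node& n)
{
    std::string out;
    bool hasElementChild = false;
    for (const Node& c : n.childNodes) {
        if (c.nodeType == 1) {
            hasElementChild = true;
            out += ElementOnlyWalk(c);
        }
    }
    return hasElementChild ? out : AllText(n);
}
```

For `<td>hello <b>world</b> and Enrique</td>`, the element-only walk yields just "world", while the all-nodes walk yields the full "hello world and Enrique".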

and I have no idea how to do that...
I think that if someone has the code, it would be helpful to post it in the
group; many people could have the same problem (now or in the future), and
it would clearly be a great reference for other code.

Thanks !
enrique.

Remy Lebeau (TeamB)
Posted: Tue Dec 19, 2006 1:39 am

"Enrique" <enridp (AT) yahoo (DOT) com.ar> wrote in message
news:4583eee0 (AT) newsgroups (DOT) borland.com...

Quote:
It seems OK if you know the name of the desired field. But if you
want to get ALL the "DOM Code"

That is what the "all" collection is for - it contains all of the elements
of the page.

Quote:
like a new HTML source without Javascript and with all the fields
completed?


There is no way to remove the Javascript through the DOM. The only way to
do that is to not let the browser download the original HTML at all.
Download it yourself manually. Then you can alter it any way you wish, and
then load the altered HTML into the browser for displaying.

But like I said earlier, if the Javascript is generating content of its own,
then you have to actually run the Javascript in order for it to generate the
output that you are looking for. Otherwise, you have to parse the script
code manually and then generate your own output.
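A minimal sketch of the "download it yourself and alter it" approach, assuming the raw HTML has already been fetched into a `std::string` (fetching it and loading the result back into the browser are not shown). `StripScripts` is a hypothetical helper built on a regular expression, which is fragile on real-world markup. Note that it also removes whatever the scripts would have generated, which is exactly the limitation described above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes <script ...>...</script> blocks
// (case-insensitive, shortest match). A "</script>" inside a string
// literal, or malformed markup, will fool it.
std::string StripScripts(const std::string& html)
{
    static const std::regex script(
        R"(<script\b[^>]*>[\s\S]*?</script\s*>)",
        std::regex::ECMAScript | std::regex::icase);
    return std::regex_replace(html, script, "");
}
```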

Quote:
I've "translated" a Delphi Code that builds a Tree of the DOM, but it does
not work, surely is a bad translation:

VisualizeDOMTree(rootElement, NULL, treeNodes);

#include <mshtml.h>
#include <utilcls.h>

void __fastcall TInternetExplorer::VisualizeDOMTree(IHTMLElement
*element, TTreeNode *treeNode, TTreeNodes *treeNodes)
{
if( !element || !treeNodes )
return;

TTreeNode *myTreeNode;

WideString tagName;
element->get_tagName(&tagName);

if( treeNode )
myTreeNode = treeNodes->AddChild(treeNode, tagName);
else
myTreeNode = treeNodes->Add(NULL, tagName);

// get the children collection
TComInterface<IDispatch> disp;
element->get_children(&disp);
if( disp )
{
TComInterface<IHTMLElementCollection> childrenElements;

disp->QueryInterface(IID_IHTMLElementCollection,
(LPVOID*)&childrenElements);
disp.Unbind();

if( childrenElements )
{
long count = 0;

childrenElements->get_length(&count);
for(long i = 0; i < count; ++i) // for each child
{
// get a child
TVariant Item = i;
TVariant Index = 0;
childrenElements->item(Item, Index, &disp);
if( disp )
{
TComInterface<IHTMLElement> curElement;

disp->QueryInterface(IID_IHTMLElement,
(LPVOID*)&curElement);
disp.Unbind();

// build sibling's subtree
if( curElement )
VisualizeDOMTree(curElement, myTreeNode,
treeNodes);
}
}
}
}
}


I suggest you read the MSDN documentation about the DOM interfaces before
you go any further:

Interfaces and Scripting Objects

http://msdn.microsoft.com/workshop/browser/mshtml/reference/ifaces/interface.asp


Gambit

Enrique
Posted: Tue Dec 19, 2006 9:41 pm

Thank you very much for your help, Gambit; it was very useful.
And yes, I need more practice with these interfaces, but in C++ it seems so
complicated... you can see it just by looking at the
translation to C++ of the really simple Delphi code... why is it so
different? Did MSDN design its interfaces with Object Pascal in mind?
I will work from your code and MSDN toward a version of the
"HTMLRealLiveSource" (at least I will try).

Thanks!
enrique.

PS: I have a basic question, because I use it but I don't really know what I'm
doing...
Is QueryInterface like a typecast?
I mean...
I mean...

In Delphi we can do this:
//--------------
element:IHTMLElement;
(...)
childrenElements: IHTMLElementCollection;
childrenElements:= element.children as IHTMLElementCollection;
//-----------------
***********************************
But in C++ this doesn't work:
//--------------
IHTMLElement element;
(...)
IHTMLElementCollection childrenElements;
childrenElements = (IHTMLElementCollection)element.children;
//--------------
***********************************
neither:
//--------------
IHTMLElement *element;
(...)
IHTMLElementCollection *childrenElements;
childrenElements = dynamic_cast<IHTMLElementCollection*>(element->children);
//--------------
***********************************
as a matter of fact we must do something like this:
//--------------
IHTMLElement *element;
(...)
TComInterface<IDispatch> disp;
element->get_children(&disp);
if( disp )
{
TComInterface<IHTMLElementCollection> childrenElements;
disp->QueryInterface(IID_IHTMLElementCollection,
(LPVOID*)&childrenElements);
//--------------
************************************
It's much more abstract (to me) and harder to understand...

Remy Lebeau (TeamB)
Posted: Wed Dec 20, 2006 12:35 am

"Enrique" <enridp (AT) yahoo (DOT) com.ar> wrote in message
news:4588092d (AT) newsgroups (DOT) borland.com...

Quote:
And yes, I need more practice with these interfaces, but in C++ it
seems so complicated... we can see it just viewing the translation to
C++ of the really simple code in Delphi... why it's so different?

Because the Delphi language has built-in support for interfaces, so it can
handle reference counting, QueryInterface(), and IDispatch::Invoke()
automatically. The C++ language, however, does not natively have such rich
interface support, so you have to write extra code to manage it all
manually.

Quote:
MSDN design its interfaces thinking in Object Pascal?

No. On the contrary, just about everything Microsoft does is actually
designed for C++ specifically. What I showed you is similar code to what
VC++ programmers would have to use (just using Microsoft wrappers instead of
Borland wrappers). It is just that Borland designed Delphi to work more
integrated with Microsoft technologies. Delphi code may be cleaner, but the
runtime behavior is still the same either way.

Quote:
QueryInterface is like a typecasting?

An interface cannot safely be type-casted from one type to another directly.
QueryInterface() is the only safe way to get one interface type from
another, by actively asking the source interface if it supports a specific
type or not, and if so to then return such a pointer (with proper reference
counting, of course).
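A portable sketch of that pattern, using hypothetical stand-ins (`IUnknownLike`, `Iid`, `HRes`) rather than the real IUnknown, IID and HRESULT types from the Windows headers. One object implements two interfaces; each interface is a separate base-class subobject at its own address, so QueryInterface hands back the correctly adjusted pointer (and AddRefs it), whereas a cast between the two unrelated interface types would just reinterpret the address.

```cpp
// Hypothetical, simplified stand-ins for COM types.
using HRes = long;
constexpr HRes kOk = 0;
constexpr HRes kNoInterface = -2147467262L;  // same bit pattern as E_NOINTERFACE
enum class Iid { Unknown, Element, Collection, Stream };

struct IUnknownLike {
    virtual HRes QueryInterface(Iid iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    ~IUnknownLike() = default;   // objects are destroyed only through Release()
};
struct IElementLike : IUnknownLike { virtual long Depth() = 0; };
struct ICollectionLike : IUnknownLike { virtual long Length() = 0; };

// One object implementing two interfaces.
class Element final : public IElementLike, public ICollectionLike {
    unsigned long refs_ = 1;   // the creator holds the first reference
public:
    HRes QueryInterface(Iid iid, void** out) override {
        if (iid == Iid::Unknown)
            *out = static_cast<IUnknownLike*>(static_cast<IElementLike*>(this));
        else if (iid == Iid::Element)
            *out = static_cast<IElementLike*>(this);
        else if (iid == Iid::Collection)
            *out = static_cast<ICollectionLike*>(this);   // adjusted pointer
        else {
            *out = nullptr;
            return kNoInterface;   // "I don't support that interface"
        }
        AddRef();                  // every pointer handed out is a new reference
        return kOk;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long left = --refs_;
        if (left == 0) delete this;
        return left;
    }
    long Depth() override { return 1; }
    long Length() override { return 3; }
};
```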

Quote:
In Delphi we can do this:

The "as" operator in Delphi is just a wrapper for QueryInterface(). If
QueryInterface() returns an error, the "as" operator throws an exception.
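Put differently, "as" behaves roughly like the `As` template below: call QueryInterface, and throw instead of returning an error code. This is a hypothetical sketch with simplified COM-like stand-ins, not Delphi's actual implementation.

```cpp
#include <stdexcept>

// Hypothetical, simplified stand-ins for COM types.
using HRes = long;
constexpr HRes kOk = 0;
constexpr HRes kNoInterface = -2147467262L;  // same bit pattern as E_NOINTERFACE
enum class Iid { Unknown, Collection, Stream };

struct IUnknownLike {
    virtual HRes QueryInterface(Iid iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    ~IUnknownLike() = default;
};
struct ICollectionLike : IUnknownLike { virtual long Length() = 0; };

class Collection final : public ICollectionLike {
    unsigned long refs_ = 1;
public:
    HRes QueryInterface(Iid iid, void** out) override {
        if (iid == Iid::Unknown)
            *out = static_cast<IUnknownLike*>(this);
        else if (iid == Iid::Collection)
            *out = static_cast<ICollectionLike*>(this);
        else {
            *out = nullptr;
            return kNoInterface;
        }
        AddRef();
        return kOk;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long left = --refs_;
        if (left == 0) delete this;
        return left;
    }
    long Length() override { return 2; }
};

// QueryInterface, but failure becomes an exception (like "x as IFoo").
template <class To>
To* As(IUnknownLike* from, Iid iid)
{
    void* p = nullptr;
    if (from->QueryInterface(iid, &p) < 0)
        throw std::runtime_error("interface not supported");
    return static_cast<To*>(p);   // the caller owns the new reference
}
```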

Quote:
IHTMLElement element;

You have to declare those interfaces as pointers:

IHTMLElement *element;
IHTMLElementCollection *childrenElements;

Quote:
childrenElements = (IHTMLElementCollection)element.children;

You cannot type-cast like that. You must use QueryInterface() instead (and
"children" is not a VCL property, so you have to call the get_children()
method instead):

IDispatch *disp;
element->get_children(&disp);
disp->QueryInterface(IID_IHTMLElementCollection,
(LPVOID*)&childrenElements);
disp->Release();
//...
childrenElements->Release();

Quote:
childrenElements =
dynamic_cast<IHTMLElementCollection*>(element->children);


You cannot use dynamic_cast on an interface.


Gambit

Enrique
Posted: Fri Dec 29, 2006 12:01 am

Hello again. I was looking at the MSDN interfaces in more detail, but I'm not
sure when I must use ->Release().
The MSDN sample code uses it frequently, but I have a lot of code that
works fine without Release(); in fact, the code sent by Gambit is one of
them.
Does C++Builder call Release() automatically?

Bye!
enri.

Remy Lebeau (TeamB)
Posted: Fri Dec 29, 2006 12:29 am

"Enrique" <enridp (AT) yahoo (DOT) com.ar> wrote in message
news:45940667 (AT) newsgroups (DOT) borland.com...

Quote:
Hello again, I was seeing the MSDN interfaces in more details, but
I'm not sure when I must to use ->Release()

Interfaces are reference counted. When you obtain an interface pointer from
QueryInterface(), AddRef() has been called on it to increment the reference
count. When you are done with the pointer, you must call Release() to
decrement the reference count. If you are working with the raw pointer
directly, then you have to call Release() yourself. Most of the code I have
given you is using smart pointer wrapper classes instead, which call
Release() for you when they go out of scope.
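A minimal sketch of such a wrapper, written in the spirit of TComInterface but not its actual implementation. `Counted` and `CreateCounted` are hypothetical stand-ins for a COM object and for an API (like get_all()) that returns a new reference through an out-parameter. Every reference the wrapper owns is released exactly once, in its destructor.

```cpp
#include <utility>

// Hypothetical reference-counted object standing in for a COM object.
struct Counted {
    inline static int alive = 0;   // objects currently in existence
    unsigned long refs = 1;        // the creator holds the first reference
    Counted() { ++alive; }
    ~Counted() { --alive; }
    unsigned long AddRef() { return ++refs; }
    unsigned long Release() {
        unsigned long left = --refs;
        if (left == 0) delete this;   // the last Release() destroys the object
        return left;
    }
};

// Hypothetical API: hands out a new reference through an out-parameter.
void CreateCounted(Counted** out) { *out = new Counted; }

// Minimal owning smart pointer: holds one reference, releases it on destruction.
template <class T>
class ComPtrSketch {
    T* p_ = nullptr;
public:
    ComPtrSketch() = default;
    explicit ComPtrSketch(T* adopt) : p_(adopt) {}   // takes over an existing reference
    ComPtrSketch(const ComPtrSketch& o) : p_(o.p_) { if (p_) p_->AddRef(); }
    ComPtrSketch& operator=(ComPtrSketch o) { std::swap(p_, o.p_); return *this; }
    ~ComPtrSketch() { if (p_) p_->Release(); }

    T* operator->() const { return p_; }
    explicit operator bool() const { return p_ != nullptr; }

    // Lets the wrapper be passed to out-parameter APIs, as with &Doc in the code
    // above. This sketch releases any pointer it already holds first.
    T** operator&() {
        if (p_) { p_->Release(); p_ = nullptr; }
        return &p_;
    }
};
```

Copying the wrapper AddRefs, so each copy releases its own reference when it goes out of scope, and the object deletes itself when the last one does.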

Quote:
The sample codes in MSDN uses it frequently

That is because Microsoft rarely uses smart pointers in its sample code.

Quote:
I have many codes that works fine without Release()

Only if they are using smart pointers. Otherwise, you are leaking memory.

Quote:
even more, the code sent by Gambit is one of them.

I usually use smart pointers in my code.


Gambit

Enrique
Posted: Fri Dec 29, 2006 1:34 am

What type of smart pointers? Classes you wrote yourself, or auto_ptr?
I don't think I have ever used smart pointers...

So... should I use this code (for example):
//---------------------------------------------------------------------------------------------
AnsiString TInternetExplorer::CodigoFuenteRealLive(TCppWebBrowser*
MyCppWebBrowser)
{
AnsiString Source = "";
if( MyCppWebBrowser->Document )
{
TComInterface<IHTMLDocument2> Doc;
MyCppWebBrowser->Document->QueryInterface(IID_IHTMLDocument2,
(LPVOID*)&Doc);
if( Doc )
{
TComInterface<IHTMLElementCollection> All;
Doc->get_all(&All);
if( All )
{
TVariant ItemName(L"TimeLeft", false);
TVariant ItemIndex = 0;
TComInterface<IDispatch> Disp;
All->item(ItemName, ItemIndex, &Disp);
if( Disp )
{
TComInterface<IHTMLElement> Element;
Disp->QueryInterface(IID_IHTMLElement,(LPVOID*)&Element);
if( Element )
{
WideString text;
Element->get_innerText(&text);
Source = text;
}
}
}
}
}
return Source;
}
//----------------------------------------------------------------------

or this instead?
(changes marked with /** new **/)
//---------------------------------------------------------------------------------------------
AnsiString TInternetExplorer::CodigoFuenteRealLive(TCppWebBrowser*
MyCppWebBrowser)
{
AnsiString Source = "";
if( MyCppWebBrowser->Document )
{
TComInterface<IHTMLDocument2> Doc;
MyCppWebBrowser->Document->QueryInterface(IID_IHTMLDocument2,
(LPVOID*)&Doc);
if( Doc )
{
TComInterface<IHTMLElementCollection> All;
Doc->get_all(&All);
Doc->Release(); /** new **/
if( All )
{
TVariant ItemName(L"TimeLeft", false);
TVariant ItemIndex = 0;
TComInterface<IDispatch> Disp;
All->item(ItemName, ItemIndex, &Disp);
if( Disp )
{
TComInterface<IHTMLElement> Element;
Disp->QueryInterface(IID_IHTMLElement,(LPVOID*)&Element);
Disp->Release(); /** new **/
if( Element )
{
WideString text;
Element->get_innerText(&text);
Source = text;
Element->Release(); /** new **/
}
}
}
}
}
return Source;
}
//----------------------------------------------------------------------

enrique.

Remy Lebeau (TeamB)
Posted: Fri Dec 29, 2006 2:17 am

"Enrique" <enridp (AT) yahoo (DOT) com.ar> wrote in message
news:45941c1c (AT) newsgroups (DOT) borland.com...

Quote:
What type of smart pointers? classes created by you or using auto_ptr?

auto_ptr is a smart pointer for memory that is allocated with the 'new'
operator (it calls delete, not Release()). It will not work on interfaces.

In the code I gave you, the smart pointer being used is TComInterface, which
is declared in utilcls.h.
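To see the difference: std::auto_ptr (and std::unique_ptr with its default deleter) destroys the object with `delete`, while a COM reference must be given back with Release(). In standard C++11 and later, which postdates this thread, a `std::unique_ptr` with a custom deleter is one way to get the same scope-based cleanup. `Counted` and `ReleaseDeleter` below are hypothetical; `Counted`'s private destructor makes a plain `delete` fail to compile, so only Release() can destroy it.

```cpp
#include <memory>

// Hypothetical reference-counted object. Its destructor is private, so only
// Release() can destroy it and a plain delete (what auto_ptr does) won't compile.
class Counted {
public:
    inline static int alive = 0;
    unsigned long refs = 1;            // the creator holds the first reference
    Counted() { ++alive; }
    unsigned long AddRef() { return ++refs; }
    unsigned long Release() {
        unsigned long left = --refs;
        if (left == 0) delete this;
        return left;
    }
private:
    ~Counted() { --alive; }
};

// Deleter that gives the reference back instead of calling delete.
struct ReleaseDeleter {
    void operator()(Counted* p) const { p->Release(); }
};

using CountedPtr = std::unique_ptr<Counted, ReleaseDeleter>;
```

Unlike TComInterface, this sketch does not AddRef on copy (unique_ptr cannot be copied at all), so it only models sole ownership of one reference.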

Quote:
So... should I use this code (for example):

Yes.

Quote:
or this instead?
(changes marked with /** new **/

No. That code is not using Release() correctly.
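A portable sketch of what goes wrong in the second version, with hypothetical `Tracked` and `Owner` types standing in for a COM object and TComInterface. The object only counts references (it never frees itself) so the extra Release() can be observed: balanced use ends at 0, while a manual Release() on a wrapped pointer ends at -1. With a real COM object the count would hit zero early, the object would be destroyed, and the wrapper's destructor would then call Release() on freed memory.

```cpp
// Hypothetical object that only counts references; it never frees itself,
// so an extra Release() shows up as a negative count instead of a crash.
struct Tracked {
    long refs = 1;                       // the creator's reference
    long AddRef() { return ++refs; }
    long Release() { return --refs; }
};

// Minimal owning wrapper: adopts one reference and releases it in its destructor.
class Owner {
    Tracked* p_;
public:
    explicit Owner(Tracked* t) : p_(t) {}
    Owner(const Owner&) = delete;
    Owner& operator=(const Owner&) = delete;
    ~Owner() { p_->Release(); }
    Tracked* operator->() const { return p_; }
};

// First version's pattern: only the wrapper releases the reference.
long RefsAfterWrapperOnly(Tracked& obj)
{
    { Owner doc(&obj); }
    return obj.refs;
}

// Second version's pattern: a manual Release() on a pointer the wrapper also releases.
long RefsAfterExtraRelease(Tracked& obj)
{
    {
        Owner doc(&obj);
        doc->Release();   // the "/** new **/" line
    }                     // ...and the wrapper releases again here
    return obj.refs;
}
```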


Gambit

All times are GMT
Page 1 of 2