Asp Forum - Mime filter - microsoft.public.vb.general.discussion

Mayayana

7/13/2011 4:25:00 PM

Does anyone have experience with protocol handlers
and mime filters? The former is registered to handle
protocols in IE. (http, ftp, file, etc.) The latter handles
mime types.

I've written a basic mime filter registered for "text/html",
using the MS C++ sample mimefilt project for guidance.
It works fine but is usually not getting called for the
http: protocol. It's fine offline (file: protocol) but not
online. The filter is just not called at all, regardless of
whether a page is cached. (IInternetProtocol.Start is
not getting called.) I just can't seem to find any kind
of information about such a problem, and my Registry
shows no indication that anything other than the default
urlmon.dll protocol handlers are operating for both file:
and http:. I seem to be overlooking something. IE
security? Some other difference between http: and
file:? (A check of the loaded document's document.mimeType
property returns "HTML Document", so it seems the mime
type of the tested online pages must be text/html.)

22 Answers

Mayayana

7/15/2011 2:29:00 PM

I have a question here for the C++ experts. My mime filter
is working but something has me stumped. I'm wrapping
IInternetProtocol. There are two methods like so:

IInternetProtocol_Start(ByVal szUrl As Long, _
    ByVal pOIProtSink As olelib2.IInternetProtocolSink, _
    ByVal pOIBindInfo As olelib.IInternetBindInfo, _
    ByVal grfPI As olelib.PI_FLAGS, _
    dwReserved As olelib.PROTOCOLFILTERDATA)

IInternetProtocol_Read(ByVal pv As Long, ByVal cb As Long, pcbRead As Long)

They both return an HResult. Start sends in some pointers
that allow me to prepare to filter a file. I can accept or pass
on it. Read is called to get my filtered data. Eduardo Morcillo's
typelib for this declares all of the functions as subs. That seems
to be required. Implements won't work if they're declared as
functions.

I'm redirecting using ReplaceVTableEntry in order
to return a result. The Read method seems to work
fine. But Start is peculiar. There are two issues:

1) If I redirect it works fine offline, for all protocols but
http, but it doesn't get called when online! Yet it works
fine online if I don't redirect.

2) No matter what I do, urlmon doesn't seem to be getting
my return from Start, even though Read is working fine.

I need to sometimes return values telling urlmon to bypass
my filter for particular URLs. Err.Raise doesn't seem to work
to return a result. (Unless urlmon is just ignoring my result
as part of some undocumented behavior.)

I'm guessing there's an issue with pointers in classes here,
but I don't get it.

Mayayana

7/15/2011 5:12:00 PM

I think I got this working. Not a problem with the
function calls. Just wrestling with some very poorly
documented functions.

ralph

7/15/2011 7:55:00 PM

On Fri, 15 Jul 2011 10:28:36 -0400, "Mayayana"
<mayayana@invalid.nospam> wrote:

> I have a question here for the C++ experts. My mime filter
>is working but something has me stumped. I'm wrapping
>IInternetProtocol. There are two methods like so:
>
>IInternetProtocol_Start(ByVal szUrl As Long, ByVal pOIProtSink As
>olelib2.IInternetProtocolSink, ByVal pOIBindInfo As
>olelib.IInternetBindInfo, ByVal grfPI As olelib.PI_FLAGS, dwReserved As
>olelib.PROTOCOLFILTERDATA)
>
>IInternetProtocol_Read(ByVal pv As Long, ByVal cb As Long, pcbRead As Long)
>
> They both return an HResult. Start sends in some pointers
>that allow me to prepare to filter a file. I can accept or pass
>on it. Read is called to get my filtered data. Eduardo Morcillo's
>typelib for this declares all of the functions as subs. That seems
>to be required. Implements won't work if they're declared as
>functions.
>

I see you said you got it working, but just in case here is some
background on HResults and VB.

VB, or rather the VBVM (VB Virtual Machine or Runtime), swallows all
HResults, then throws an exception - a VB Error - if the result is
anything but successful.

The HResult itself is something that COM/OLE concerns itself with and
all "COM" interface methods are essentially 'functions' that return a
HResult. What makes the difference to the client whether a method is a
'function' or a 'sub' is whether the signature defines a "return"
parameter or not.

For example:
HRESULT SendAString([in] BSTR pstr);
would be
Sub SendAString( pstr As String)

HRESULT GetAString([out, retval] BSTR* pstr);
would be
Function GetAString( ) As String
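
A minimal C++ sketch of the same mapping, for comparison. It is not from the thread: the HRESULT typedef and constants are stand-ins defined locally so the example builds without the Windows headers, std::string stands in for BSTR, and GetAStringVbStyle is a hypothetical name. It shows the raw method returning an HRESULT with the value in an out parameter, and a wrapper that presents it the way a VB client sees it (value as the result, failure as an error).

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Local stand-ins for the Windows definitions (assumption: no <windows.h>).
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_POINTER = static_cast<HRESULT>(0x80004003u);

// COM-level shape: the real return is the HRESULT; the "result" the
// caller wants travels through the [out, retval] parameter.
HRESULT GetAString(std::string* pstr) {
    if (pstr == nullptr) return E_POINTER;
    *pstr = "hello";
    return S_OK;
}

// VB-style view: the retval parameter becomes the function result and a
// failed HRESULT (negative value) surfaces as an error instead of a return.
std::string GetAStringVbStyle() {
    std::string result;
    const HRESULT hr = GetAString(&result);
    if (hr < 0) throw std::runtime_error("COM call failed");
    return result;
}
```

The wrapper is roughly the job the VB runtime does on the caller's behalf, which is why the HRESULT never shows up as a return value in VB code.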

hth
-ralph

Mayayana

7/15/2011 8:47:00 PM

|
| I see you said you got it working,

I think so. It comes and goes. :) The "pluggable
protocols" are a very interesting set of options that
I wasn't aware of before, but I've never come across
anything with so little documentation/samples.

I was able to edit the vTable, send the function to
a .bas module, and back into the class. What I
thought was a problem with that seems to actually
be unexpected behavior from urlmon. It seems that
it gets my message declining to handle the file, but
applies that to future files in the same browser
instance, rather than the current file! So once there
was one file I declined (like about:blank), urlmon was
not calling my Start function for the next navigation.

In the course of exploring ReplaceVTableEntry I came
across a flash from the past:

http://forums.devx.com/archive/index.php/t-...

Matthew Curland, Michael Kaplan and others. Near the
end of the thread Matthew Curland announces his
*upcoming* book. :)

Schmidt

7/16/2011 2:27:00 AM

Am 13.07.2011 18:25, schrieb Mayayana:
> Does anyone have experience with protocol handlers
> and mime filters? The former is registered to handle
> protocols in IE. (http, ftp, file, etc.) The latter handles
> mime types.
>
> I've written a basic mime filter registered for "text/html",
> using the MS C++ sample mimefilt project for guidance.
> It works fine but is usually not getting called for the
> http: protocol. It's fine offline (file: protocol) but not
> online. The filter is just not called at all, regardless of
> whether a page is cached. (IInternetProtocol.Start is
> not getting called.) I just can't seem to find any kind
> of information about such a problem, and my Registry
> shows no indication that anything other than the default
> urlmon.dll protocol handlers are operating for both file:
> and http:. I seem to be overlooking something. IE
> security? Some other difference between http: and
> file:? (A check of the loaded document's document.mimeType
> property returns "HTML Document", so it seems the mime
> type of the tested online pages must be text/html.)

May I ask, what you really want to achieve -
and in what context you want to use such filter(s)?
Is all of that used in conjunction with the IE-control -
or with *.hta Apps ... or ...?

Olaf

Mayayana

7/16/2011 3:06:00 AM

| May I ask, what you really want to achieve -
| and in what context you want to use such filter(s)?
| Is all of that used in conjunction with the IE-control -
| or with *.hta Apps ... or ...?
|

It started when I was watching a blind friend try to
negotiate webpages. He has a good screenreader, but
it's not unusual that he has to trudge through 50+ links
in his quest to reach the actual text on the page. (With
many pages the so-called "content" is a minuscule box in
the middle.)

I got thinking about options and came across the
pluggable protocol options: namespace handlers, protocol
handlers, mime filters.

http://msdn.microsoft.com/en-us/library/aa767916%28v=vs....

A mime filter provides a very simple
option to get the page content before it goes to IE. One
sets it to filter a specific mime type. In this case, text/html.
So my idea is that it may be useful to write optional routines
and/or plugins for the filter. For instance, I could remove
all images and all non-local links. I could make the text
readily accessible with the confusing text of links filtered
out. It might even be realistic to filter specific sites. For
instance, a blind person who visits the same site each day
for news might benefit from the removal of a DIV with ID
"header". They might benefit from the removal of almost
everything other than the DIV holding the news, for that
matter.
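
A rough sketch of the "remove all images" idea, written in C++ only for illustration. StripImages is a hypothetical helper, not part of any filter in this thread, and a regular expression is a crude approximation of HTML parsing: it would miss unusual markup, so treat it as a starting point under those assumptions.

```cpp
#include <regex>
#include <string>

// Remove every <img ...> tag from an HTML string, ignoring case.
// Assumes tags contain no '>' inside attribute values.
std::string StripImages(const std::string& html) {
    static const std::regex img_tag("<img\\b[^>]*>", std::regex::icase);
    return std::regex_replace(html, img_tag, "");
}
```

A filter could run a routine like this over the buffered page text before handing the data back from Read.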

I haven't really got to the point of looking into options
yet. At this point I'm just trying to iron out the filter
wrinkles. I now have it working fine except for one glitch:
When urlmon has a page to load it calls the filter's Start
method to initialize. I have the option to return
INET_E_USE_DEFAULT_PROTOCOLHANDLER in order to
pass up access. Common sense would dictate that my
response should apply to the current page. Instead it
seems to apply to all pages in the current session, *after*
the current page! That means I have to pass the file
through without treating it, which seems to produce minor
but annoying effects: For instance, I pass on MSHTML and
RES protocols, but then in OE I don't see email content until
I view the second email. Perhaps I'll get that straightened
out. I'm not done with the basic filtering routine that takes
in data and hands it back out.
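
The accept-or-decline decision in Start can be sketched like this in C++. The helper name DecideStart is hypothetical, the HRESULT typedef and constants are defined locally so the snippet builds without the Windows headers, and it assumes the page URL is already available as a plain string. It only shows which value would be returned; it does not address the session-wide behavior described above.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <string>

// Local stand-ins for the Windows definitions (assumption: no <windows.h>).
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT INET_E_USE_DEFAULT_PROTOCOLHANDLER =
    static_cast<HRESULT>(0x800C0011u);

// Hypothetical policy: filter only http/https pages and decline the rest
// (res:, mshtml:, about:, file:, ...).
HRESULT DecideStart(const std::string& url) {
    std::string lower(url);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const bool is_web = lower.rfind("http://", 0) == 0 ||
                        lower.rfind("https://", 0) == 0;
    return is_web ? S_OK : INET_E_USE_DEFAULT_PROTOCOLHANDLER;
}
```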

All in all, despite lack of documentation, a mime filter
turns out to be a fairly simple thing with simple requirements.
A single Registry setting can turn it on or off. So my thought
at this point is to do something like write a browser extension
to enable a config. GUI, and then see how useful the filter
can be. I've never paid much attention to this, as I haven't
used IE online since about 1999. But a lot of people do, a lot
of people are forced to, and the blind have no other options.
At best I'm thinking that a mime filter might be able to help
the blind to clean up webpages for readability, while also
providing the kind of control to others that one now has in Mozilla
browsers by using userContent.css, which allows one to
override any aspect of a webpage at the level of CSS. With
a mime filter it all comes down to simply parsing the webpage
text.

Tony Toews

7/16/2011 10:52:00 PM

On Fri, 15 Jul 2011 23:06:18 -0400, "Mayayana"
<mayayana@invalid.nospam> wrote:

> It started when I was watching a blind friend try to
>negotiate webpages. He has a good screenreader, but
>it's not unusual that he has to trudge through 50+ links
>in his quest to reach the actual text on the page. (With
>many pages the so-called "content" is a minuscule box in
>the middle.)

Oy, that problem hadn't occurred to me. And I can see exactly what
you mean.

I too have a blind friend. But he's a senior and doesn't use the
computer much. Indeed it's a Windows 98 system and he,
understandably, doesn't want to upgrade. But he only runs a few apps
on it and doesn't browse with it.

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/ac...
Tony's Microsoft Access Blog - http://msmvps.com/blo...
For a convenient utility to keep your users FEs and other files
updated see http://www.autofeup...

ralph

7/17/2011 6:18:00 PM

On Sun, 17 Jul 2011 22:29:57 -0400, "Mayayana"
<mayayana@invalid.nospam> wrote:

> Is there a way to get at the HRESULT value and
>return it from a Sub? Example: IInternetProtocol,
>as typelibbed by Eduardo Morcillo, contains all subs.
>The C++ version is all functions. I'm not actually sure
>whether I need the function ability, but I'm still finding
>something not quite right when
>I return INET_E_USE_DEFAULT_PROTOCOLHANDLER,
>telling urlmon that I don't want to process the current
>file.
>

(Can't help with using the IInternetProtocol as I've not used it much
myself.)

Once again you are confusing VB's concept of Subs and Functions with
the C/C++ view of such things. All procedures in C/C++ are "functions".
VB's Subs and Functions are partly historical and partly because of
how VB works. (VB is stack-based.)
Once upon a time Basic didn't have either. Program flow was
managed entirely with Gotos. Then it received 'Subroutines' -
procedures you could call without using a Goto. A block of code that
would run and return you to the next line. (No fiddling with the stack
required.) Then it received 'Functions' - procedures that could run as
atomic blocks of code and return a value. (Some fiddling with the
stack was required.)

By the "C++ version" I think you mean that in the IDL or coclass
declaration you are seeing something like this ...

HRESULT method( int param );

so you are assuming it is a "function" as defined by VB. It is not.
VB sees the interface something like this ...

void method( int param);
or rather ...
Sub method( param As Long )

The VB runtime swallows all HResults. You will never see them from
your VB code.

ALL Interface methods are declared as returning a HResult. In C++ you
can retrieve the HResult, or create a header from the typelib
declarations, or <a couple of other obscure ways>, but the final
result is the same - a method that returns void and takes a single
parameter. Eduardo is correct they are all "VB Subs" (or C/C++
functions returning void).

So you can eliminate that particular concern - ignore any idea they
are VB Functions - they are Subs as far as "COM" is concerned.

> In samples I've found, there can be different return
>values under different scenarios:
>
>STDMETHODIMP CXMLMimeFilterPP::Terminate(DWORD dwOptions)
>{
>// irrelevant code here.
> return m_pIncomingProt->Terminate(dwOptions);
>}
>
> In the only comparable code sample I can find, most
>of the IInternetProtocol methods return E_UNEXPECTED
>when not processing a file. So Terminate would return
>E_UNEXPECTED or the result of
>m_pIncomingProt->Terminate(dwOptions);
>
> My method just passes on the call:
>
> m_pIncomingProt.Terminate dwOptions
>
>So I have a Terminate sub where I call the protocol
>handler's Terminate sub, but in C++ they're both functions.
>Even if I overwrite the vTable pointer for my function, I
>can't return the result of my call to m_pIncomingProt.Terminate
>

You never will. The VBVM swallows it.
VB WILL raise an error if it is anything but successful.

It is kind of like you did something like the C++ example above in
VB...

[Warning! Air Code!]
' MyInternetProtocol
Implements IInternetProtocol

Sub Terminate(opts As Long)
    On Error GoTo ErrorHandler
    IInternetProtocol_Terminate opts
    Exit Sub
ErrorHandler:
    Err.Raise &H800A000& ' bogus value
End Sub

> Can I at least assume that sending a return value by
>overwriting the vtable pointer is getting back to the
>caller...in case I need to return E_UNEXPECTED? Assuming
>that, is there a way to access the HRESULT return from my
>call to m_pIncomingProt.Terminate? I've seen conflicting
>discussions about using Err.Raise for that.
>

Not sure I can see the reason to do that.

-ralph

Mayayana

7/18/2011 2:30:00 AM

Is there a way to get at the HRESULT value and
return it from a Sub? Example: IInternetProtocol,
as typelibbed by Eduardo Morcillo, contains all subs.
The C++ version is all functions. I'm not actually sure
whether I need the function ability, but I'm still finding
something not quite right when
I return INET_E_USE_DEFAULT_PROTOCOLHANDLER,
telling urlmon that I don't want to process the current
file.

In samples I've found, there can be different return
values under different scenarios:

STDMETHODIMP CXMLMimeFilterPP::Terminate(DWORD dwOptions)
{
// irrelevant code here.
return m_pIncomingProt->Terminate(dwOptions);
}

In the only comparable code sample I can find, most
of the IInternetProtocol methods return E_UNEXPECTED
when not processing a file. So Terminate would return
E_UNEXPECTED or the result of
m_pIncomingProt->Terminate(dwOptions);
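
A self-contained C++ sketch of that pattern, for reference. IncomingProtocol and MimeFilterSketch are hypothetical stand-ins (not the sample's real classes), and the types and constants are defined locally so it builds without the Windows headers. Terminate returns E_UNEXPECTED when no file is being processed and otherwise forwards the wrapped handler's result unchanged.

```cpp
#include <cstdint>

// Local stand-ins for the Windows definitions (assumption: no <windows.h>).
using HRESULT = std::int32_t;
using DWORD = std::uint32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_UNEXPECTED = static_cast<HRESULT>(0x8000FFFFu);

// Minimal stand-in for the protocol handler the filter wraps.
struct IncomingProtocol {
    HRESULT result = S_OK;  // value this fake handler reports
    HRESULT Terminate(DWORD /*dwOptions*/) { return result; }
};

class MimeFilterSketch {
public:
    explicit MimeFilterSketch(IncomingProtocol* incoming)
        : m_pIncomingProt(incoming) {}

    // Not processing a file: report E_UNEXPECTED.
    // Otherwise pass the wrapped handler's HRESULT straight back.
    HRESULT Terminate(DWORD dwOptions) {
        if (m_pIncomingProt == nullptr) return E_UNEXPECTED;
        return m_pIncomingProt->Terminate(dwOptions);
    }

private:
    IncomingProtocol* m_pIncomingProt;
};
```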

My method just passes on the call:

m_pIncomingProt.Terminate dwOptions

So I have a Terminate sub where I call the protocol
handler's Terminate sub, but in C++ they're both functions.
Even if I overwrite the vTable pointer for my function, I
can't return the result of my call to m_pIncomingProt.Terminate

Can I at least assume that sending a return value by
overwriting the vtable pointer is getting back to the
caller...in case I need to return E_UNEXPECTED? Assuming
that, is there a way to access the HRESULT return from my
call to m_pIncomingProt.Terminate? I've seen conflicting
discussions about using Err.Raise for that.

As noted above, I'm not even certain whether any of this
is relevant.
The mime filter is working fine, except that urlmon does not
entirely handle it when I reject a file. For instance, my filter
gets called for CHM files. I don't want to handle those. I
reject them based on their non-http URL. The result is that
CHM files work fine, but nothing loads into the right-hand pane
when they open. There's no browser window there -- just white.
Once a contents or index item is selected it's fine.
Likewise with OE. The first click on a newsgroup item shows
nothing in the message pane, but it works fine after that.
There seems to be something in IInternetProtocol that I'm
not responding to correctly.

(For anyone following this: If I reject a file I still need to pass
on calls I get to my IInternetProtocolSink implementation. If I
don't then OE/IE/CHM etc. will freeze. So I know that I have
to play a role in file processing, once I've been called, even if
I don't want to process that file. I just haven't figured out what
part of that role I'm missing, and I can't tell whether it might
depend on my returning something from an IInternetProtocol sub.
I could, perhaps, just process all files I receive and not edit
the ones I don't want to handle, but that gets into other issues
of dealing with cache files, etc. I'd rather keep it clean and
only handle webpages in IE.)

ralph

7/18/2011 7:49:00 AM

On Mon, 18 Jul 2011 09:07:45 -0400, "Mayayana"
<mayayana@invalid.nospam> wrote:

>| > Can I at least assume that sending a return value by
>| >overwriting the vtable pointer is getting back to the
>| >caller...in case I need to return E_UNEXPECTED? Assuming
>| >that, is there a way to access the HRESULT return from my
>| >call to m_pIncomingProt.Terminate? I've seen conflicting
>| >discussions about using Err.Raise for that.
>| >
>|
>| Not sure I can see the reason to do that.
>|
>
> The reason is that it lets the caller know when I'm
>not handling the call. Yet you seem to be saying that
>HRESULT is effectively the same as VOID. If so, then
>why are the C++ functions returning values? In the
>Start method, for instance, I have to return a value
>to let the caller know whether I'm going to handle the
>file. And the docs specify that suitable returns:
>
>http://msdn.microsoft.com/en-us/library/aa767863%28v=vs....
>

Then it is me that is confused. I'm not sure when you are using VB and
when you are using C++. If you are using COM Interfaces with VB code,
you will not see HResults in VB client code.

"A nice thing about VB is that it hides all the details of COM from
the developer. A really nasty thing about VB is that it hides all the
details of COM from the developer."

All* methods that are part of a COM Interface return an HResult for
two reasons: 1) it is not possible to throw an exception across process
boundaries, 2) the COM subsystem can use the HResult to report errors
that the method might run into as the call crosses apartment or even
machine boundaries - errors the method itself has or can have no clue
about. So in general HResults should be used to report "COM" errors.
If the method needs to produce a return then that should be handled
using a 'retval' parameter. Most designers of ActiveX components that
are expected to support IDispatch or Dual interfaces respect that.
However, the expedient of simply using the HResult is quite common,
especially if the component is expected to be used from a platform
like C/C++.

Let's compare using a COM Interface with VB to using one in C/C++.
In C/C++ the first thing you need to do is initialize the COM Library
itself by calling CoInitialize(). The second thing you need to do
is call CoCreateInstance(). Then you query the instance for the
interface you want and call the method through the interface pointer.
In C/C++ the developer can catch the HResult 'return'.
(Somewhere in that Google example all that is going on in the
templates, etc.)

With VB the developer doesn't have to do that. He simply declares an
object reference then attempts to assign that reference to an instance
of the object. Everything above is done for the developer by the VBVM
(runtime). That's a nice side. A nasty side - you can't receive the
HResult. The VBVM 'swallows' it.

It won't do you any good to try and fake it either. As far as the VBVM
is concerned its client (your VB code) has no need for it, and there
is no access to it. However ...

A nice side - the VBVM will report any non-successful or interesting
HResult values by raising a VB error.

The nasty side - the VBVM often makes up its own mind what constitutes
something interesting. <g>

That's why, in a practical sense, you can and should just ignore
HResults as a *return value* in VB. The only 'return' that VB will
recognize as making a COM Interface method a VB Function is a
return (retval or out) parameter. And note that in that case, while the
parameter exists in the method's signature, it will not be shown in
the method's parameter list, but as a return. (As demonstrated by
viewing the typelib.)

As noted above while you can't catch HResults as a 'return', you can
catch them as a VB Error ... uh, sometimes. <g>

In your case simply providing an error handler should 'catch' the
errors or information you want. But the VBVM can be a tad flaky about
what it returns - in some cases you will get the full HResult value,
in others you may get its best guess as to a "COM Error" with an
obscure error code parsed out of the HResult ... you just have to
test.

Or in a worst-case scenario the COM subsystem might find another error
in the call and simply hijack the HResult for its own purposes -
totally ignoring what, if anything, the code in the method might have
wanted to return.

[Check out HResult online. You will see it is actually a composite and
not just a simple value.]
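
A small C++ sketch of that composite layout, assuming the commonly documented bit positions: severity in bit 31, facility in bits 16 to 26, and the code in the low 16 bits. HResultParts and SplitHResult are illustrative names, not Windows APIs.

```cpp
#include <cstdint>

// The three fields most callers care about in an HRESULT.
struct HResultParts {
    bool failed;        // severity bit (bit 31): set means failure
    unsigned facility;  // bits 16..26: which subsystem raised it
    unsigned code;      // bits 0..15: the subsystem-specific code
};

constexpr HResultParts SplitHResult(std::uint32_t hr) {
    return HResultParts{(hr & 0x80000000u) != 0,
                        (hr >> 16) & 0x7FFu,
                        hr & 0xFFFFu};
}
```

For example, INET_E_USE_DEFAULT_PROTOCOLHANDLER (0x800C0011) splits into a failure with facility 12 and code 0x11.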

Hopefully that finally explains why the code appears different between
VB and C/C++ that is otherwise doing exactly the same thing.

Last piece of advice - get a couple of books on COM. The Wrox series
are very good and since they are out of print can be found second-hand
for practically the cost of shipping. It will save you a lot of time -
every question you have asked is essentially answered in the first few
chapters of a good book.

Well, I lied. One more piece of advice. When working with obscure or
complex Interfaces or libraries, bite the bullet and learn C/C++. You
can always wrap your 'lower-level' code in an ActiveX component for
use with VB.

-ralph
[All* - it is not a requirement that all COM methods return an HResult
but 99% of them do. In fact OLE will not marshal a method if it does
not have an HResult. While we use the terms interchangeably COM and
OLE are different - COM is the protocol, OLE2 is the implementation.]
