Friday, February 24, 2012

Full-Text search indexing HTML content

I've spent the last few days looking for the best way to create a full-text
search catalog for a column composed entirely of HTML content. The best
approach I found, so that HTML tags are ignored when a search is performed,
is to make the column an image-type field and store the associated file type
in a separate column.
I wrote a small application that reads each entry from the text field,
converts it to a byte[] variable using the UnicodeEncoding class (and its
GetString and GetBytes methods), and saves the resulting byte array as a
binary file in the new image field. I also tested how the field data would
fare as a physical file, using a FileStream to write the byte[] data from
several different records to disk.
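
As a rough illustration of the conversion step described above, here is a minimal Python sketch. The original tool used .NET's UnicodeEncoding and FileStream, so Python only stands in for those calls, and the function names, the sample HTML string, and the `with_bom` flag are all hypothetical. It assumes the text is encoded as UTF-16 little-endian, which is what UnicodeEncoding produces by default. Like GetBytes, `encode("utf-16-le")` does not emit a byte-order mark, so the flag is there only to make both variants easy to compare. The post does not establish whether the missing mark matters to the full-text filter.

```python
import os
import tempfile


def html_to_utf16_bytes(html: str, with_bom: bool = False) -> bytes:
    """Encode HTML text as UTF-16 little-endian bytes.

    Assumption: this mirrors the byte layout of .NET's default
    UnicodeEncoding.GetBytes, which also omits the byte-order mark.
    """
    data = html.encode("utf-16-le")
    if with_bom:
        # Prepend the UTF-16 LE byte-order mark (FF FE) explicitly.
        data = b"\xff\xfe" + data
    return data


def write_sample(data: bytes, path: str) -> None:
    """Dump the bytes to a physical file so they can be inspected by hand."""
    with open(path, "wb") as fh:
        fh.write(data)


if __name__ == "__main__":
    sample = "<html><body><p>full-text test</p></body></html>"  # hypothetical row
    for with_bom in (False, True):
        payload = html_to_utf16_bytes(sample, with_bom=with_bom)
        path = os.path.join(tempfile.gettempdir(), f"sample_bom_{with_bom}.htm")
        write_sample(payload, path)
        print(path, len(payload), payload[:4].hex())
```

Opening the two output files in a hex viewer shows exactly which bytes would end up in the image column, which is one way to rule the conversion step in or out.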
So far, so good; it all seemed to be working. The problem is that after I
build the full-text catalog (using the wizard, and specifying the file type
column associated with the "image" column where the HTML is stored), queries
using either CONTAINS or FREETEXT return no records. The same queries worked
just fine when the column was a text field instead of an image field, so I'm
guessing something went wrong either in the data conversion or in the
catalog-building process.
Does anyone have any ideas?

On Wed, 23 Nov 2005 16:45:54 -0600, bacusgod wrote:
(snip)
>Does anyone have any ideas?
Hi bacusgod,
You might want to try reposting this in the group where the full text
experts hang out: microsoft.public.sqlserver.fulltext
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)
