Friday, February 24, 2012

FullText Search on Multiple Language

Hi All,
I am trying to implement a FullText search for a table which contains
translations for many different languages. However, different languages
utilized different word breaker for the FullText search to work properly for
that language but I can only assign a single word breaker to a table (for
example, traditional chinese uses Chinese(Taiwan) word breaker, simplified
chinese uses Chinese(PRC), etc).
To overcome this problem, I used horizontal partitioning to split the
table into smaller tables according to its various languages; each assigned
with the word breader for that language. And I try to insert, update, delete
the data in all the tables thru a view which union all the smaller tables.
This seems to work well.
Next, I created a storeprocedure to do a full-text search by searching
each smaller table and appending the result.to a temp table
Example
CREATE TABLE #Temp
(
searchresult ntext
)
INSERT INTO #Temp (searchresult)
SELECT translation FROM translation_german WHERE CONTAINS(*, @.searchstring)
INSERT INTO #Temp (searchresult)
SELECT * FROM translation_japanese WHERE CONTAINS(*, @.searchstring)
INSERT INTO #Temp (searchresult)
SELECT * FROM translation_chinese WHERE CONTAINS(*, @.searchstring)
:
The problem is when I do a full-text search for a chinese string in the
table with german word break, it will give the error#7619 ('A clause of the
query contained only ignored words.') and terminate the storeprocedure
immediately. There doesn't seem to be anyway to ignore the error and
continue to the search in the next table(s).
1. Does anyone has anyway to handle the error#7619 in such a way that it
will not terminate the Storeprocedure but continue to search the next table?
OR
2. Is way any other method to implement a full-text search for a table
containing multiple languages (with word breaker correctly implemented)
other then the one described above?
Many thanks in advance
Royston
Royston,
Could you post the full output of -- SELECT @.@.version -- as this would be
most helpful in understanding your environment and providing you with
answers.
First of all, and assuming (for now) that you're using SQL Server 2000, you
do not need to split your table into multiple smaller tables, (one for each
language), as SQL Server 2000 supports multiple collation per column and SQL
Full Text Indexing supports different "Languages for Word Breaker" per
column. Therefore, you can have one column per language and set the
collation and "Language for Word Breaker" at a per column level in one
table. This is one reason I'm requesting the @.@.version info.
In regards to error 7619, when FT Searching for a Chinese string in the
table with German word break, you will need to capture the language before
you issue the FTS query and then run it against the appropriate table (or
column) to avoid this error. Note, that in SQL Server 2005 (currently, still
in beta), you will be able to store multiple languages in one column, and
then issue language specific queries based upon the LCID of the language.
Just curious, why are you running a FT Search query for a Chinese string in
the table with German?
Hope that helps!
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
"Royston" <royston@.earth9.com> wrote in message
news:euxMNcUKFHA.2212@.TK2MSFTNGP12.phx.gbl...
> Hi All,
> I am trying to implement a FullText search for a table which contains
> translations for many different languages. However, different languages
> utilized different word breaker for the FullText search to work properly
for
> that language but I can only assign a single word breaker to a table (for
> example, traditional chinese uses Chinese(Taiwan) word breaker, simplified
> chinese uses Chinese(PRC), etc).
> To overcome this problem, I used horizontal partitioning to split the
> table into smaller tables according to its various languages; each
assigned
> with the word breader for that language. And I try to insert, update,
delete
> the data in all the tables thru a view which union all the smaller tables.
> This seems to work well.
> Next, I created a storeprocedure to do a full-text search by searching
> each smaller table and appending the result.to a temp table
> Example
> CREATE TABLE #Temp
> (
> searchresult ntext
> )
> INSERT INTO #Temp (searchresult)
> SELECT translation FROM translation_german WHERE CONTAINS(*,
@.searchstring)
> INSERT INTO #Temp (searchresult)
> SELECT * FROM translation_japanese WHERE CONTAINS(*, @.searchstring)
> INSERT INTO #Temp (searchresult)
> SELECT * FROM translation_chinese WHERE CONTAINS(*, @.searchstring)
> :
> The problem is when I do a full-text search for a chinese string in
the
> table with german word break, it will give the error#7619 ('A clause of
the
> query contained only ignored words.') and terminate the storeprocedure
> immediately. There doesn't seem to be anyway to ignore the error and
> continue to the search in the next table(s).
> 1. Does anyone has anyway to handle the error#7619 in such a way that
it
> will not terminate the Storeprocedure but continue to search the next
table?
> OR
> 2. Is way any other method to implement a full-text search for a table
> containing multiple languages (with word breaker correctly implemented)
> other then the one described above?
> Many thanks in advance
> Royston
>
>
>
>
|||Hi John,
Yes, I am using SQL Server 2000. I am currently using a single column to
store all the various translations, therefore I need to do a FT search for a
Chinese string in the German table (actually, all the smaller tables). The
logic for insertion, update, and delete is simpler with this method (normal
query thru a view will do). Adding of new language requires only addition of
a new smaller table and a quick update to the view. There's no change to the
existing storeprocedures as the storeprocedures access the translation thru
a view.
Here's my simplified sample schema,
Create Table [English_Table]
(
string_id int primary key,
english_text ntext,
lastupdated_by int,
lastupdated_on datetime,
search_id uniqueidentifier not null Unqiue
)
Create Table [Translation_German_Table]
(
string_id int,
culture varchar(5),
translation ntext,
lastupdated_by int,
lastupdated_on datetime,
search_id uniqueidentifier not null Unqiue
Constraint pk_german Primary Key (string_id., culture)
Check (culture = 'de-DE')
)
Create Table [Translation_Chinese_Table]
(
string_id int,
lang_id varchar(5),
translation ntext,
lastupdated_by int,
lastupdated_on datetime,
search_id uniqueidentifier not null Unqiue
Constraint pk_chinese Primary Key (string_id, culture)
Check (culture = 'zh-CN')
)
Create View [Translation]
(
Select * From [Translation_German_Table]
Union All
Select * From [Translation_Chinese_Table]
)
Storing each language per column will require more complex logic on the
Client side to determine the current language of the translation and to
insert into/ update the appropriate column (addition of new language will
require adding a new column and may also require changes to existing logic).
I will still have to check the language of the search string to implement
the FT search on the correct column (this may pose a problem if the search
string is user input and I don't know what language the user is inputing).
Thanks,
Royston
"John Kane" <jt-kane@.comcast.net> wrote in message
news:OENJA9XKFHA.3336@.TK2MSFTNGP10.phx.gbl...
> Royston,
> Could you post the full output of -- SELECT @.@.version -- as this would be
> most helpful in understanding your environment and providing you with
> answers.
> First of all, and assuming (for now) that you're using SQL Server 2000,
you
> do not need to split your table into multiple smaller tables, (one for
each
> language), as SQL Server 2000 supports multiple collation per column and
SQL
> Full Text Indexing supports different "Languages for Word Breaker" per
> column. Therefore, you can have one column per language and set the
> collation and "Language for Word Breaker" at a per column level in one
> table. This is one reason I'm requesting the @.@.version info.
> In regards to error 7619, when FT Searching for a Chinese string in the
> table with German word break, you will need to capture the language before
> you issue the FTS query and then run it against the appropriate table (or
> column) to avoid this error. Note, that in SQL Server 2005 (currently,
still
> in beta), you will be able to store multiple languages in one column, and
> then issue language specific queries based upon the LCID of the language.
> Just curious, why are you running a FT Search query for a Chinese string
in[vbcol=seagreen]
> the table with German?
> Hope that helps!
> John
> --
> SQL Full Text Search Blog
> http://spaces.msn.com/members/jtkane/
>
>
> "Royston" <royston@.earth9.com> wrote in message
> news:euxMNcUKFHA.2212@.TK2MSFTNGP12.phx.gbl...
contains[vbcol=seagreen]
> for
(for[vbcol=seagreen]
simplified[vbcol=seagreen]
the[vbcol=seagreen]
> assigned
> delete
tables.[vbcol=seagreen]
searching[vbcol=seagreen]
> @.searchstring)
> the
> the
> it
> table?
table
>
|||Royston, I am not sure what you suggest would work.
First, I don't think you can do Full text search on view, in other word, do
select * from View1 where contains (data,'some search string')
Second, you would still have to choose ONE word breaker on the indexed data
column, so if you have Germany and Chinese in the same column, What word
breaker can you choose?
I am running the similar problem, in my case, I want to full text search on
chinese and english,but struggling to find a solution for it.
Any suggest are welcome.
--Xin Chen
"Royston" <royston@.earth9.com> wrote in message
news:uspzi$cKFHA.1284@.TK2MSFTNGP14.phx.gbl...
> Hi John,
> Yes, I am using SQL Server 2000. I am currently using a single column
to
> store all the various translations, therefore I need to do a FT search for
a
> Chinese string in the German table (actually, all the smaller tables). The
> logic for insertion, update, and delete is simpler with this method
(normal
> query thru a view will do). Adding of new language requires only addition
of
> a new smaller table and a quick update to the view. There's no change to
the
> existing storeprocedures as the storeprocedures access the translation
thru
> a view.
> Here's my simplified sample schema,
> Create Table [English_Table]
> (
> string_id int primary key,
> english_text ntext,
> lastupdated_by int,
> lastupdated_on datetime,
> search_id uniqueidentifier not null Unqiue
> )
> Create Table [Translation_German_Table]
> (
> string_id int,
> culture varchar(5),
> translation ntext,
> lastupdated_by int,
> lastupdated_on datetime,
> search_id uniqueidentifier not null Unqiue
> Constraint pk_german Primary Key (string_id., culture)
> Check (culture = 'de-DE')
> )
> Create Table [Translation_Chinese_Table]
> (
> string_id int,
> lang_id varchar(5),
> translation ntext,
> lastupdated_by int,
> lastupdated_on datetime,
> search_id uniqueidentifier not null Unqiue
> Constraint pk_chinese Primary Key (string_id, culture)
> Check (culture = 'zh-CN')
> )
> Create View [Translation]
> (
> Select * From [Translation_German_Table]
> Union All
> Select * From [Translation_Chinese_Table]
> )
> Storing each language per column will require more complex logic on
the
> Client side to determine the current language of the translation and to
> insert into/ update the appropriate column (addition of new language will
> require adding a new column and may also require changes to existing
logic).[vbcol=seagreen]
> I will still have to check the language of the search string to implement
> the FT search on the correct column (this may pose a problem if the search
> string is user input and I don't know what language the user is inputing).
> Thanks,
> Royston
> "John Kane" <jt-kane@.comcast.net> wrote in message
> news:OENJA9XKFHA.3336@.TK2MSFTNGP10.phx.gbl...
be[vbcol=seagreen]
> you
> each
> SQL
before[vbcol=seagreen]
(or[vbcol=seagreen]
> still
and[vbcol=seagreen]
language.[vbcol=seagreen]
string[vbcol=seagreen]
> in
> contains
languages[vbcol=seagreen]
properly[vbcol=seagreen]
> (for
> simplified
> the
> tables.
> searching
in[vbcol=seagreen]
of[vbcol=seagreen]
that[vbcol=seagreen]
> table
implemented)
>
|||Hi Xin Chen,
No, I am not doing the FT search on the view. My solution is to utilize
horizontal partitioning to split my translation table into multiple child
tables each storing only a particular language, and then assign a different
word breaker to each of the child table according to the language it is
storing. I am using a view to union all the child tables so that I can
access all the child tables thru the view as though it is a single table
(that is, I can insert, update, delete via the view instead of referencing
the child tables). Note that I implemented a check on the primary key column
of each of the child table, this will allow the view to know where to
insert/ update or delete the row referenced in the view. I have tried this
and it seems to work.
However, to implement FT search, I created a temporary table and for each
child table I will do a FT search with the search string and append the
result to the temporay table (like the example below). The problem I
encountered is if the search string contain a particular language string
like chinese string, and the storeprocedure is doing a FT search on a child
table with german word breaker, the storeprocedure will generate an error
#7619 ('A clause of the query contained only ignored words.') and terminate
execution. There doesn't seem to be any way I can catch the exception in the
storeprocedure so as to ignore the error and continue the search on the next
child table (that is, the one with the correct word breaker). As suggested
by John, it seems the only way to resolve this is to check the language of
the search string beforehand somehow and direct the FT search to the
respective child table. This shall work, but I am hoping if anyone know how
to catch the error#7619 exception and prevent the storeprocedure from
terminating since it may be difficult to determine the language of the
search string without restricting the user input.
Simplified Sample Code Example,
CREATE TABLE #Temp
(
searchresult ntext
)
INSERT INTO #Temp (searchresult)
SELECT translation FROM translation_german WHERE CONTAINS(*,@.searchstring)
INSERT INTO #Temp (searchresult)
SELECT * FROM translation_japanese WHERE CONTAINS(*, @.searchstring)
INSERT INTO #Temp (searchresult)
SELECT * FROM translation_chinese WHERE CONTAINS(*, @.searchstring)
Regards,
Royston
"Xin Chen" <xchen@.xtremework.com> wrote in message
news:eNm4kjfKFHA.3992@.TK2MSFTNGP15.phx.gbl...
> Royston, I am not sure what you suggest would work.
> First, I don't think you can do Full text search on view, in other word,
do
> select * from View1 where contains (data,'some search string')
> Second, you would still have to choose ONE word breaker on the indexed
data
> column, so if you have Germany and Chinese in the same column, What word
> breaker can you choose?
> I am running the similar problem, in my case, I want to full text search
on[vbcol=seagreen]
> chinese and english,but struggling to find a solution for it.
> Any suggest are welcome.
> --Xin Chen
> "Royston" <royston@.earth9.com> wrote in message
> news:uspzi$cKFHA.1284@.TK2MSFTNGP14.phx.gbl...
column[vbcol=seagreen]
> to
for[vbcol=seagreen]
> a
The[vbcol=seagreen]
> (normal
addition[vbcol=seagreen]
> of
> the
> thru
> the
will[vbcol=seagreen]
> logic).
implement[vbcol=seagreen]
search[vbcol=seagreen]
inputing).[vbcol=seagreen]
> be
2000,[vbcol=seagreen]
and[vbcol=seagreen]
the[vbcol=seagreen]
> before
> (or
> and
> language.
> string
> languages
> properly
split[vbcol=seagreen]
update,[vbcol=seagreen]
> in
> of
storeprocedure[vbcol=seagreen]
> that
next
> implemented)
>

No comments:

Post a Comment