Friday, February 24, 2012

full-text search failed with one million of key words (Sqlserver20

Hello,
I have to make big tests with sql server 2005 express (i'm waiting credits
to buy a full license).
I detected a big problem with my full-text search, and i don't know if it's
a bad use of Sql Server or if it's impossible to do it!
Here's my use case :
a table like this :
primary-key | data
1 | toto value1
2 | toto value2
3 | toto value3
...
1000001 | toto value1000001
So my table contains one millions of rows. Also, my index contains one
million+one key words : "toto" and all the "valueXXXX".
if i make those requests :
select count(primary-key) from table where contains(data,'"toto"') -> ok,
1000001
select count(primary-key) from table where contains(data,'"value5321"') ->
ok, 1
select count(primary-key) from table where contains(data,'"t*"') -> ok,
1000001
select count(primary-key) from table where contains(data,'"v"') -> KO, error
message "too complex request ect". I was expected 1000001.
Can someone help me ? I'm sure sql server 2005 can do it, but i'm also sure
i forgot an option...
Note : i work on a database who contains several millions rows, and
sometimes, users need to make some insert/update of several thousands of
rows. So they need to make very big requests to check the insert/update.
And i'm not english, so sorry for the mistakes ;-)
Thanks to all !
Hi Europa_Lili,
I assume the query you're submitting (the one that fails) actually looks like
select count(primary-key) from table where contains(data,'"v*"')
(i.e. with a * after v, meaning it's a prefix query)
SQL 2005 FTS will throw this error when the prefix identifies too many
distinct keywords that match it. What are the semantics of the query you're
trying to accomplish?
Another advice: if every row of your table contains "toto", then there's no
point in indexing it since querying for it provides no value. It will just
unnecessarily increase the size of your index. You may consider adding it to
the list of noise words. See SQL Books Online for information on noise word
files and how to change them (you'll need to re-index your data).
"Europa_Lili" wrote:

> Hello,
> I have to make big tests with sql server 2005 express (i'm waiting credits
> to buy a full license).
> I detected a big problem with my full-text search, and i don't know if it's
> a bad use of Sql Server or if it's impossible to do it!
> Here's my use case :
> a table like this :
> primary-key | data
> --
> 1 | toto value1
> 2 | toto value2
> 3 | toto value3
> ...
> 1000001 | toto value1000001
> So my table contains one millions of rows. Also, my index contains one
> million+one key words : "toto" and all the "valueXXXX".
> if i make those requests :
> select count(primary-key) from table where contains(data,'"toto"') -> ok,
> 1000001
> select count(primary-key) from table where contains(data,'"value5321"') ->
> ok, 1
> select count(primary-key) from table where contains(data,'"t*"') -> ok,
> 1000001
> select count(primary-key) from table where contains(data,'"v"') -> KO, error
> message "too complex request ect". I was expected 1000001.
> Can someone help me ? I'm sure sql server 2005 can do it, but i'm also sure
> i forgot an option...
> Note : i work on a database who contains several millions rows, and
> sometimes, users need to make some insert/update of several thousands of
> rows. So they need to make very big requests to check the insert/update.
> And i'm not english, so sorry for the mistakes ;-)
> Thanks to all !
>
|||Hello,
thanks a lot for your answer.
Research with "toto" was just a test. In real life, my database contains
some informations (text, colours, ect) about picture files . Pictures come
from big old books numerisation. (Sometimes, we've got 500 to 1000 pictures
for only one book) And the table i want to query looks like this :
primary-key | data-text | file name
| localization
1 | page 1 of a book toto | new_york_library_522 |
C:/newyork
2 | page 2 of a document | paris_library_81
| D:/library_paris 3 | image from a old
book | new_york_library_523 | C:/newyork
......
Here is a exemple of what i have to do :
we receive 500 000 files (all the old books of one library). All the files
have a name like this "NAME_LIBRARY_XXXXXXXXX" where X is a number.
And i need to select all these files to make some update actions in the
database.
I can't use the date.
so i need a query like "select * from mytable where
contains(file-name,'"NAME_LIBRARY_*"')".
And it don't work, because there's 500 000 results.
Another idea to help me ?
Have a good day and thanks for all
"denistc" wrote:
[vbcol=seagreen]
> Hi Europa_Lili,
> I assume the query you're submitting (the one that fails) actually looks like
> select count(primary-key) from table where contains(data,'"v*"')
> (i.e. with a * after v, meaning it's a prefix query)
> SQL 2005 FTS will throw this error when the prefix identifies too many
> distinct keywords that match it. What are the semantics of the query you're
> trying to accomplish?
> Another advice: if every row of your table contains "toto", then there's no
> point in indexing it since querying for it provides no value. It will just
> unnecessarily increase the size of your index. You may consider adding it to
> the list of noise words. See SQL Books Online for information on noise word
> files and how to change them (you'll need to re-index your data).
> "Europa_Lili" wrote:

No comments:

Post a Comment