Friday, March 23, 2012

Fuzzy Lookups and Groupings Algorithm

Hello,
I'm trying to clean my data using the Fuzzy Lookup algorithm through SSIS, but I get null values everywhere. This is what I did:

I applied the Fuzzy Lookup to a table (tblValues). The source table is tblValues, and the reference table in the Fuzzy Lookup is tblValues as well. The result is null values in all fields/columns.

Do I have to create my own reference table? If yes, how do I do that, and what values should this table contain? I didn't understand what the reference table must look like for the algorithm to work. Any suggestions?

Thank you in advance!

I'm not sure I understand what your objective is. If you are trying to remove duplicates in tblValues, you're better off using a Fuzzy Grouping transformation rather than a Fuzzy Lookup transformation. See the following article for how to use both if you haven't read it yet.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/FzDTSSQL05.asp
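
To make the grouping idea concrete, here is a minimal Python sketch of fuzzy grouping over a single table. It is not the SSIS algorithm: it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the function names, the sample rows, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] (stand-in metric, not SSIS's)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_group(values, threshold=0.8):
    """Put each value into the first group whose representative it resembles.

    The first value seen in a group acts as its canonical representative.
    """
    groups = []
    for value in values:
        for group in groups:
            if similarity(value, group[0]) >= threshold:
                group.append(value)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([value])
    return groups


# Hypothetical rows containing near-duplicates.
rows = ["Microsoft Corp", "Microsft Corp", "Oracle", "Oracel", "IBM"]
groups = fuzzy_group(rows)
for group in groups:
    print(group[0], "<-", group[1:])
```

Grouping needs no separate reference table: the rows are compared with each other, and one row per group is kept as the canonical value.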

|||

Thanks for the article. I have already read it and used it as a reference for my project. My objective is to fill the null values with the most likely value.

I'm not sure if Fuzzy Lookup does that, but the article says that "Fuzzy Lookup matches input records that are "dirty" (because of misspellings, truncations, missing or inserted tokens, null fields, unexpected abbreviations, and other irregularities) with clean records in a reference table.", and I didn't understand what the reference table must look like.
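
As a rough illustration of what the quoted sentence describes, the Python sketch below matches dirty input values against a small table of clean reference values. It is only a sketch under stated assumptions: it uses `difflib.get_close_matches` from the standard library rather than the SSIS matching algorithm, and the reference values, the dirty inputs, the `fuzzy_lookup` helper, and the 0.6 cutoff are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical reference table: one clean, trusted value per entity.
reference = ["London", "Paris", "Berlin"]


def fuzzy_lookup(value, reference, cutoff=0.6):
    """Return the closest clean reference value, or None if nothing is close."""
    if value is None:
        # A missing value has nothing to compare, so this sketch cannot fill it.
        return None
    lowered = {ref.lower(): ref for ref in reference}
    matches = get_close_matches(value.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


# Hypothetical dirty input rows.
dirty = ["Lodnon", "paris", "Berlinn", None, "Tokyo"]
cleaned = [fuzzy_lookup(value, reference) for value in dirty]
print(cleaned)
```

In this sketch the reference table holds only known-good values, and it is kept separate from the dirty input. A value that is entirely missing, or that resembles no reference entry, comes back unmatched.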
