Wednesday, March 21, 2012

Fuzzy Group Updates?

Hi there,

Quick Background: I have an SSIS package that reads data from a flat file and then runs it through a Fuzzy Grouping component. The result of this Fuzzy Grouping is put into a SQL Server 2005 table.

Question: Over time, new records will be added to the flat file (some of which should join existing groups), so I'll need to update my Fuzzy Group table to include them. Is there any way to simply add these new records to the existing Fuzzy Group without changing all of the _key_out values? If I completely regenerate the Fuzzy Group table, that will potentially give me different _key_out values, correct?

Does this make sense?

Any help would be greatly appreciated!

>> "If I completely regenerate the Fuzzy Group table that will potentially give me different _key_out values correct?"

Correct.

wenyang


Thanks for the reply!

Any way to preserve the _key_out values while still adding records to the groups? It sounds like a complete rebuild of the Fuzzy Group is out of the question. Any way to do this incrementally?


Hi,

Yes, each time you run Fuzzy Grouping with a different set of input rows (or with a different threshold), it is possible that different groupings will result.

If you have run FG once and would like to keep the existing groups, one alternative would be to use Fuzzy Lookup for the incremental input rows. You would basically perform a fuzzy lookup against the output of FG and return the _key_out of the best matching row. You have thus effectively found a group for the new input row. If no match is found above the FL match threshold, then just assign a new unique _key_out to the input row to create a new group.
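To make the incremental approach concrete, here is a minimal Python sketch of the same logic outside SSIS, under stated assumptions. It uses difflib's SequenceMatcher ratio purely as a stand-in similarity score; it is not the similarity measure that the SSIS Fuzzy Lookup transformation itself uses. The function name assign_groups, the 0.8 threshold, and the rule "new key = largest existing key + 1" are all hypothetical choices for illustration. Existing (value, _key_out) pairs are never modified; each new row either takes the _key_out of its best match above the threshold or starts a new group.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in score in [0, 1]; NOT the token-based similarity SSIS Fuzzy Lookup uses.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def assign_groups(existing, new_rows, threshold=0.8):
    """Hypothetical helper: give each new row a group key without touching old keys.

    existing: list of (value, key_out) pairs produced by an earlier Fuzzy Grouping run.
    new_rows: list of new input values.
    Returns a list of (value, key_out) pairs for the new rows only.
    """
    reference = list(existing)  # copy, so the caller's list is left unchanged
    # Assumption: keys are integers and a new group gets max(existing key) + 1.
    next_key = max((key for _, key in reference), default=0) + 1
    result = []
    for value in new_rows:
        # "Fuzzy lookup": find the single best-matching reference row.
        best_key, best_score = None, 0.0
        for ref_value, ref_key in reference:
            score = similarity(value, ref_value)
            if score > best_score:
                best_key, best_score = ref_key, score
        if best_key is not None and best_score >= threshold:
            key = best_key  # join the existing group
        else:
            key = next_key  # no match above the threshold: start a new group
            next_key += 1
        result.append((value, key))
        # Design choice: later rows in the same batch may join groups created here.
        reference.append((value, key))
    return result


existing = [("Microsoft Corp", 1), ("Microsoft Corporation", 1), ("Apple Inc", 3)]
print(assign_groups(existing, ["Microsoft Corp.", "Zebra Ltd"]))
```

Appending each newly assigned row to the reference set is one possible choice: it lets near-duplicates within the same incremental batch land in the same new group. As noted below, this greedy row-by-row matching is not the global clustering that Fuzzy Grouping performs, so group quality can drift over time.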

A slight problem with this approach is that, over time, the incremental rows may not all be grouped as well as they could be, because the clustering algorithm that Fuzzy Grouping uses to choose groupings globally is not being applied to them. At that point you may want to just rerun FG and switch to the new groupings.

We are considering adding a feature in the next version that will allow you to keep all the old groupings intact.

Let us know if you have any more questions.

Regards,

-Kris

Thanks Kris, that will probably suffice for now. Yeah, put that in the next version; the FG component is great, but it doesn't have much use after the initial run because of this limitation.
