Wednesday, March 21, 2012

Fuzzy Lookup componant eating all available virtual memory

Hey,

I have a large set of data that I need to match against another large set of data. The reference table has 9.8mill rows and my input has 14.6mill rows. I started with a new project. I added my connection, then a task to clear the result table, then my data flow, then my OLE source, then my Fuzzy Lookup task, then my SQL Server Destination. I set the connection of my OLE source and set the query to pull the data. Then I set the connection of my Fuzzy Lookup task, set the reference table and told it to create a new index (the problem also occurs if I use a generated index) and then set up the matching criteria. Then I set the connection and destination for the SQL Server Destination.

After setting all this up, I hit Run. The thing ran great until ~ 800k rows and then it failed. I ran it several times and it always failed right around 800k with a message saying there was not enough space and then an error with buffers being passed to the Fuzzy Lookup component. I opened Task Manager and watched the resources as it ran and was amazed at what I saw. The Fuzzy Lookup component eats up every bit of Virtual Memory available and when it can't take any more, it errors out. I tried setting the Max Memory setting on the component and it seems to have no effect. I also played with the buffer settings on the data flow task to no avail. I even went as far as to put an identity on my input table and create a function that outputs selects that use a between on the identity to break the data into 600k chunks. I set up a ForEach component and DTS variables, but the Fuzzy Lookup component does not free the VM after the iteration of the ForEach component!

I ended up running each chunk of 600k one at a time. I have to automate this for the future, so I need a solution. Does anyone have an idea for me?

The errors I get are:

[Fuzzy Lookup 1 [3067]] Warning: Not enough storage is available to complete this operation.

[DTS.Pipeline] Warning: A call to the ProcessInput method for input 3068 on component "Fuzzy Lookup 1" (3067) unexpectedly kept a reference to the buffer it was passed. The refcount on that buffer was 2 before the call, and 1 after the call returned.

[DTS.Pipeline] Error: The ProcessInput method on component "Fuzzy Lookup 1" (3067) failed with error code 0x8007000E. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.

[DTS.Pipeline] Error: Thread "WorkThread0" has exited with error code 0x8007000E.

[OLE DB Source [1923]] Error: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.

[DTS.Pipeline] Error: The PrimeOutput method on component "OLE DB Source" (1923) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.

[DTS.Pipeline] Error: Thread "SourceThread0" has exited with error code 0xC0047038.

[DTS.Pipeline] Error: Thread "WorkThread1" received a shutdown signal and is terminating. The user requested a shutdown, or an error in another thread is causing the pipeline to shutdown.

[DTS.Pipeline] Error: Thread "WorkThread1" has exited with error code 0xC0047039.

|||

Pls check to see whether http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=191199&SiteID=1 helps.

Thanks

Wenyang

|||

I found that thread after posting and tried the solutions there, but they had no effect on the package. It still almost linearly eats up all virtual memory. By the time it is at 800k records, it has eaten up 40gb of virtual memory.

The server is:

Dual AMD 64bit Processors

2TB storage with 650GB free

8GB ram

Windows Server 2003 64bit

SQL Server 2005 64bit

Of note also is that I have tried running the package from the command line with identical results. Here is what I used:

dtexec /f package1.dtsx /ref n

|||I had the VM size increased to 200gb and I can do 3m sets now. It still isn't to where I need to to be for automation, but it will let me get the job done for now. If anyone has some ideas on what might be wrong, please please let me know!!|||

Hey James,

Did you have any luck with this? I'm in exactly the same situation you are in regarding fuzzy lookup taking all memory. If you or anyone else figured out a work around it would be great to hear.

Thanks,

SL

|||There's been fixes for memory leak issues with the fuzzy components in the past. Could you check KB 912423: "FIX: Memory leaks occur when you use Fuzzy Lookup and Fuzzy Grouping to transform a SQL Server 2005 Integration Services package"?

No comments:

Post a Comment