Showing posts with label group. Show all posts
Showing posts with label group. Show all posts

Monday, March 26, 2012

Gather Format and Store - Right or Wrong

The IT group that I work with has the habit of gathering data,
formatting (i.e. in reports) and then storing the same formated data in
the same database.
I think the practice is wrong. I think the activity is fundamentally
wrong because we are storing the exact same data in a database in two
different locations. Somehow I have the impression that database design
is about "oneness".
I believe that collecting the data and then storing summerized data for
reporting into a data warehouse would be the right solution.
I am getting flack for my viewpoint.
Am I all washed up?That sounds weird. IF the formatted data is stored for performance reasons,
I'd at least have it in
another database. But I prefer to do the report off of the production databa
se (if low activity and
doesn't have perf impact), or have a different database better suited for re
porting off of.
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
"rlm" <groups@.rlmoore.net> wrote in message
news:1146747407.524001.97330@.j33g2000cwa.googlegroups.com...
> The IT group that I work with has the habit of gathering data,
> formatting (i.e. in reports) and then storing the same formated data in
> the same database.
> I think the practice is wrong. I think the activity is fundamentally
> wrong because we are storing the exact same data in a database in two
> different locations. Somehow I have the impression that database design
> is about "oneness".
> I believe that collecting the data and then storing summerized data for
> reporting into a data warehouse would be the right solution.
> I am getting flack for my viewpoint.
> Am I all washed up?
>

Gather Format and Store - Right or Wrong

The IT group that I work with has the habit of gathering data,
formatting (i.e. in reports) and then storing the same formated data in
the same database.

I think the practice is wrong. I think the activity is fundamentally
wrong because we are storing the exact same data in a database in two
different locations. Somehow I have the impression that database design
is about "oneness".

I believe that collecting the data and then storing summerized data for
reporting into a data warehouse would be the right solution.

I am getting flack for my viewpoint.

Am I all washed up?That sounds weird. IF the formatted data is stored for performance reasons, I'd at least have it in
another database. But I prefer to do the report off of the production database (if low activity and
doesn't have perf impact), or have a different database better suited for reporting off of.

--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/

"rlm" <groups@.rlmoore.net> wrote in message
news:1146747407.524001.97330@.j33g2000cwa.googlegro ups.com...
> The IT group that I work with has the habit of gathering data,
> formatting (i.e. in reports) and then storing the same formated data in
> the same database.
> I think the practice is wrong. I think the activity is fundamentally
> wrong because we are storing the exact same data in a database in two
> different locations. Somehow I have the impression that database design
> is about "oneness".
> I believe that collecting the data and then storing summerized data for
> reporting into a data warehouse would be the right solution.
> I am getting flack for my viewpoint.
> Am I all washed up?

Gather Format and Store - Right or Wrong

The IT group that I work with has the habit of gathering data,
formatting (i.e. in reports) and then storing the same formated data in
the same database.
I think the practice is wrong. I think the activity is fundamentally
wrong because we are storing the exact same data in a database in two
different locations. Somehow I have the impression that database design
is about "oneness".
I believe that collecting the data and then storing summerized data for
reporting into a data warehouse would be the right solution.
I am getting flack for my viewpoint.
Am I all washed up?That sounds weird. IF the formatted data is stored for performance reasons, I'd at least have it in
another database. But I prefer to do the report off of the production database (if low activity and
doesn't have perf impact), or have a different database better suited for reporting off of.
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
"rlm" <groups@.rlmoore.net> wrote in message
news:1146747407.524001.97330@.j33g2000cwa.googlegroups.com...
> The IT group that I work with has the habit of gathering data,
> formatting (i.e. in reports) and then storing the same formated data in
> the same database.
> I think the practice is wrong. I think the activity is fundamentally
> wrong because we are storing the exact same data in a database in two
> different locations. Somehow I have the impression that database design
> is about "oneness".
> I believe that collecting the data and then storing summerized data for
> reporting into a data warehouse would be the right solution.
> I am getting flack for my viewpoint.
> Am I all washed up?
>

Friday, March 23, 2012

fuzzyness...

Greetings
I'm working on a large and growing database of security alarms. I'm looking at ways to mine the data that allow me to group sets of records with similar values. For example
"device-xyz", "high cpu load from device xyz 95% > threshold of 90%"
"device-xyz", "high cpu load from device xyz 96% > threshold of 90%"
if I do a plain sql search
SELECT device-name, Count(device-name) AS total1, Summary
FROM dataset
GROUP BY devicename, Summary;
I end up with two records since summary is differentt. The desire is to have a single record that combines the two using some form of fuzzyness.
I'm not able to use wildcards to solve this problem because I have a large volume of data all with a few values that could be different. I'm really looking I guess at some form of percentage of variation on the description.
I have started to look at a few tools such as dataflux that do look promising.
Has anyone addressed this problem before and can recommend an approach?
Many thanks in advance
Jdjg,
You may want to check out the sql function DIFFERENCE
and/or PATINDEX.
An approach I would recommed is to cut up each of the strings to words.
Then look for the best DIFFERENCE score of each word in the first string with the
words in the 2nd. To aggregate for the whole string use something like standard deviation.
You could encapsulate this in a user defined funtion - it wouldn't be performant though.
Regards
AJ
"djg" <anonymous@.discussions.microsoft.com> wrote in message
news:C618A23A-E59A-4DAC-91C8-883A6C4E3BE1@.microsoft.com...
> Greetings
> I'm working on a large and growing database of security alarms. I'm looking at ways to mine the data that
allow me to group sets of records with similar values. For example
> "device-xyz", "high cpu load from device xyz 95% > threshold of 90%"
> "device-xyz", "high cpu load from device xyz 96% > threshold of 90%"
> if I do a plain sql search
> SELECT device-name, Count(device-name) AS total1, Summary
> FROM dataset
> GROUP BY devicename, Summary;
> I end up with two records since summary is differentt. The desire is to have a single record that combines
the two using some form of fuzzyness.
> I'm not able to use wildcards to solve this problem because I have a large volume of data all with a few
values that could be different. I'm really looking I guess at some form of percentage of variation on the
description.
> I have started to look at a few tools such as dataflux that do look promising.
> Has anyone addressed this problem before and can recommend an approach?
> Many thanks in advance
> J|||On Sun, 26 Oct 2003 11:21:05 -0800, djg
<anonymous@.discussions.microsoft.com> wrote:
>I'm working on a large and growing database of security alarms.
>I'm looking at ways to mine the data that allow me to group
>sets of records with similar values. For example
>"device-xyz", "high cpu load from device xyz 95% > threshold of 90%"
>"device-xyz", "high cpu load from device xyz 96% > threshold of 90%"
>if I do a plain sql search
>SELECT device-name, Count(device-name) AS total1, Summary
>FROM dataset
>GROUP BY devicename, Summary;
>I end up with two records since summary is differentt.
>The desire is to have a single record that combines the two
>using some form of fuzzyness.
How about exactness -- try "with rollup"
Put the decision rules in the front end. :)
J.

Wednesday, March 21, 2012

Fuzzy Grouping Transform Corrupts Pass-through Data

We are working with a client and are using Fuzzy Group transform for de-duping, and hierarchy creation for a national account list.

I've found that if a large number of pass through columns are sent to the Fuzzy Grouping transforms it randomly corrupts the char columns.

Our work around was to only pass through ID columns and then build out the attributes needed from views against the Fuzzy group output however product team should take a look at this.
By corruption I mean random characters from other records would show up in character columns (we had address and name corruption in about 10% of a 1.5 million record dataset).

Thanks.

Michael Slater
Software Architects

Michael,

Thanks for your post. We have been unable to reproduce the problem you have reported in the new test cases that we have created for this issue. We would very much like to get to the bottom of what you are seeing. Can you please contact me directly so that we might work with you to find a better repro case that can be used to further investigate and fix this problem?

Please send an email to KrisGan@.microsoft.com

Thanks,
Kris Ganjamsql

Fuzzy Group Updates?

Hi there,

Quick Background: I have an SSIS package that reads data from a flat file then runs it through a Fuzzy Grouping component. The result of this Fuzzy Group is put into a SQL server 2005 table.

Question: Over time, the flat file will be adding new records (some that should be added to existing groups) and so I'll need to update my Fuzzy Group table to include these new records. Is there anyway to simply add these new records to the existing Fuzzy Group without changing all of the _key_out values? If I completely regenerate the Fuzzy Group table that will potentially give me different _key_out values correct?

Does this make sense?

Any help would be greatly appreciated!

>>" If I completely regenerate the Fuzzy Group table that will potentially give me different _key_out values correct?"

Correct.

wenyang

|||

Thanks for the reply!

Anyway to preserve the _key_out while still adding records to the groups? Sounds like a complete rebuild of the Fuzzy group is out of the question. Anyway to do this incrementally?

|||

Hi,

Yes, each time you run Fuzzy Grouping with a different set of input rows (or with a different threshold), it is possible that different groupings will result.

If you have run FG once and would like to keep the existing groups, one alternative would be to use Fuzzy Lookup for the incremental input rows. You would basically perform a fuzzy lookup against the output of FG and return the _key_out of the best matching row. You have thus effectively found a group for the new input row. If no match is found above the FL match threshold, then just assign a new unique _key_out to the input row to create a new group.

A slight problem with this approach is that over time all the incremental rows may not be grouped as well as they could be, as the clustering algorithm that Fuzzy Grouping uses to globally pick groupings is not being employed. At that point you may want to just rerun FG and switch to the new groupings.

We are considering adding a feature in the next version that will allow you keep all the old groupings intact.

Let us know if you have any more questions.

Regards,

-Kris

|||Thanks Kris, that will probably suffice for now. Yeah put that in the next version, the FG component is great but it doesn't have much use after the initial run because of this limitation.

Monday, March 12, 2012

Function update

A gentleman from here helped me to write a function to convert oracle
date to a MS Sql date.
Here is the link
http://groups.google.com/group/micr...r />
8Jm9ADFUU
Yr6IuaVp8r6Fxhin3IJmS3764Q
The previous function you wrote
CREATE FUNCTION dbo.to_date
(@.dt VARCHAR(50), @.dt_format VARCHAR(50))
RETURNS DATETIME
AS
BEGIN
RETURN
CONVERT(DATETIME,
CONVERT(CHAR(12),
CAST(
SUBSTRING(@.dt,PATINDEX('%YYYY%',@.dt_form
at),4)+
SUBSTRING(@.dt,PATINDEX('%MM%',@.dt_format
),2)+
SUBSTRING(@.dt,PATINDEX('%DD%',@.dt_format
),2) AS DATETIME)
,0)+
SUBSTRING(@.dt,PATINDEX('%HH%',@.dt_format
),2)+':'+
SUBSTRING(@.dt,PATINDEX('%MI%',@.dt_format
),2)+':'+
SUBSTRING(@.dt,PATINDEX('%SS%',@.dt_format
),2)+' '+
SUBSTRING(@.dt,PATINDEX('%AM%',@.dt_format
),2),9)
END
works for TO_Date( '12/27/1996 03:48:26 PM', 'MM/DD/YYYY HH:MI:SS AM')
I would like to modify your function for the data below
to_date('05/31/2000 10:17:22', 'MM/DD/YYYY HH24:MI:SS')
Can you help me to modify the function to work with the new format.
Thanks.Thanks. I figured it out. simple solution to just Return
convert(datetime,@.dt) without all the complexity.

Wednesday, March 7, 2012

Function - List Windows Group Members

Hello!
I thinking of creating a SQL Server clr function that works like this.
Input paremeter: A Windows group, e.g. Domain1\Group1
Resultset: A list of all accounts that directely or indirectely belongs to
that Windows group.
e.g.
Domain1\Account1
Domain1\Account2
Domain1\Account3
The idea is to use this together with the security catalog views to get a
list of accounts that is for example sysadmins.
Could it be that a function like this exist in SQL Server?
I also thought about what AD permisssions that would be needed?
Best regards
Ola Hallengren
Hi
You can create an ADSI linked server and query the active directory, but
recursing through the groups not always that easy. There is a script to list
all group members on http://www.rlmueller.net/freecode3.htm
John
"Ola Hallengren" wrote:

> Hello!
> I thinking of creating a SQL Server clr function that works like this.
> Input paremeter: A Windows group, e.g. Domain1\Group1
> Resultset: A list of all accounts that directely or indirectely belongs to
> that Windows group.
> e.g.
> Domain1\Account1
> Domain1\Account2
> Domain1\Account3
> The idea is to use this together with the security catalog views to get a
> list of accounts that is for example sysadmins.
> Could it be that a function like this exist in SQL Server?
> I also thought about what AD permisssions that would be needed?
> Best regards
> Ola Hallengren
|||Thanks, John. As I understand it I have about these options.
1. Create an ADSI linked server and query against that with TSQL. This can
be packaged into a SQL Server function.
2. Use the VBScript at http://www.rlmueller.net/freecode3.htm from
xp_cmdshell. This can be packaged into a stored procedure.
3. Create some CLR code with the same logic as in the VBScript and use that
assambly in a SQL Server function.
Is this right?
/Ola
"John Bell" wrote:
[vbcol=seagreen]
> Hi
> You can create an ADSI linked server and query the active directory, but
> recursing through the groups not always that easy. There is a script to list
> all group members on http://www.rlmueller.net/freecode3.htm
> John
> "Ola Hallengren" wrote:
|||I think so, I would opt for either 2 or 3 but have never tried 3!.
John
"Ola Hallengren" <OlaHallengren@.discussions.microsoft.com> wrote in message
news:05FC89C1-3816-41B9-9123-5D9E4FFBEFC0@.microsoft.com...[vbcol=seagreen]
> Thanks, John. As I understand it I have about these options.
> 1. Create an ADSI linked server and query against that with TSQL. This can
> be packaged into a SQL Server function.
> 2. Use the VBScript at http://www.rlmueller.net/freecode3.htm from
> xp_cmdshell. This can be packaged into a stored procedure.
> 3. Create some CLR code with the same logic as in the VBScript and use
> that
> assambly in a SQL Server function.
> Is this right?
> /Ola
>
> "John Bell" wrote:

Function - List Windows Group Members

Hello!
I thinking of creating a SQL Server clr function that works like this.
Input paremeter: A Windows group, e.g. Domain1\Group1
Resultset: A list of all accounts that directely or indirectely belongs to
that Windows group.
e.g.
Domain1\Account1
Domain1\Account2
Domain1\Account3
The idea is to use this together with the security catalog views to get a
list of accounts that is for example sysadmins.
Could it be that a function like this exist in SQL Server?
I also thought about what AD permisssions that would be needed?
Best regards
Ola HallengrenHi
You can create an ADSI linked server and query the active directory, but
recursing through the groups not always that easy. There is a script to list
all group members on http://www.rlmueller.net/freecode3.htm
John
"Ola Hallengren" wrote:
> Hello!
> I thinking of creating a SQL Server clr function that works like this.
> Input paremeter: A Windows group, e.g. Domain1\Group1
> Resultset: A list of all accounts that directely or indirectely belongs to
> that Windows group.
> e.g.
> Domain1\Account1
> Domain1\Account2
> Domain1\Account3
> The idea is to use this together with the security catalog views to get a
> list of accounts that is for example sysadmins.
> Could it be that a function like this exist in SQL Server?
> I also thought about what AD permisssions that would be needed?
> Best regards
> Ola Hallengren|||Thanks, John. As I understand it I have about these options.
1. Create an ADSI linked server and query against that with TSQL. This can
be packaged into a SQL Server function.
2. Use the VBScript at http://www.rlmueller.net/freecode3.htm from
xp_cmdshell. This can be packaged into a stored procedure.
3. Create some CLR code with the same logic as in the VBScript and use that
assambly in a SQL Server function.
Is this right?
/Ola
"John Bell" wrote:
> Hi
> You can create an ADSI linked server and query the active directory, but
> recursing through the groups not always that easy. There is a script to list
> all group members on http://www.rlmueller.net/freecode3.htm
> John
> "Ola Hallengren" wrote:
> > Hello!
> >
> > I thinking of creating a SQL Server clr function that works like this.
> >
> > Input paremeter: A Windows group, e.g. Domain1\Group1
> >
> > Resultset: A list of all accounts that directely or indirectely belongs to
> > that Windows group.
> >
> > e.g.
> > Domain1\Account1
> > Domain1\Account2
> > Domain1\Account3
> >
> > The idea is to use this together with the security catalog views to get a
> > list of accounts that is for example sysadmins.
> >
> > Could it be that a function like this exist in SQL Server?
> >
> > I also thought about what AD permisssions that would be needed?
> >
> > Best regards
> >
> > Ola Hallengren|||I think so, I would opt for either 2 or 3 but have never tried 3!.
John
"Ola Hallengren" <OlaHallengren@.discussions.microsoft.com> wrote in message
news:05FC89C1-3816-41B9-9123-5D9E4FFBEFC0@.microsoft.com...
> Thanks, John. As I understand it I have about these options.
> 1. Create an ADSI linked server and query against that with TSQL. This can
> be packaged into a SQL Server function.
> 2. Use the VBScript at http://www.rlmueller.net/freecode3.htm from
> xp_cmdshell. This can be packaged into a stored procedure.
> 3. Create some CLR code with the same logic as in the VBScript and use
> that
> assambly in a SQL Server function.
> Is this right?
> /Ola
>
> "John Bell" wrote:
>> Hi
>> You can create an ADSI linked server and query the active directory, but
>> recursing through the groups not always that easy. There is a script to
>> list
>> all group members on http://www.rlmueller.net/freecode3.htm
>> John
>> "Ola Hallengren" wrote:
>> > Hello!
>> >
>> > I thinking of creating a SQL Server clr function that works like this.
>> >
>> > Input paremeter: A Windows group, e.g. Domain1\Group1
>> >
>> > Resultset: A list of all accounts that directely or indirectely belongs
>> > to
>> > that Windows group.
>> >
>> > e.g.
>> > Domain1\Account1
>> > Domain1\Account2
>> > Domain1\Account3
>> >
>> > The idea is to use this together with the security catalog views to get
>> > a
>> > list of accounts that is for example sysadmins.
>> >
>> > Could it be that a function like this exist in SQL Server?
>> >
>> > I also thought about what AD permisssions that would be needed?
>> >
>> > Best regards
>> >
>> > Ola Hallengren

Friday, February 24, 2012

Full-Text search row limit

Hi,
This question is for all group, but particulary to Hillary Cotter, in
relation to one article published in msdn
(http://msdn.microsoft.com/library/de...tml/sp04f9.asp).
Hillary, in this article, you explain that the new Full-text search engine,
in SQL Server 2005, can index up to 2 billion rows and the limit on SQL 2000
was between three and 30 million rows.
I think that I don't understand you. Is that the limit in SQL Server 2000?
30 millions rows?
Thanks in advanced to Hillary and all newsgroup.
qwalgrande
AFAIK there is no hard limit for SQL 2000, rather performance declines when
you start to get that high. I got the 30,000,000 statistic from MS, let me
try to dig it up and provide you with the link.
Hilary
"qwalgrande" <qwalgrande*nospam*@.yahoo.es> wrote in message
news:F424BC6C-CC52-4582-8D01-80C0908F1F6B@.microsoft.com...
> Hi,
> This question is for all group, but particulary to Hillary Cotter, in
> relation to one article published in msdn
> (http://msdn.microsoft.com/library/de...tml/sp04f9.asp).
> Hillary, in this article, you explain that the new Full-text search
> engine,
> in SQL Server 2005, can index up to 2 billion rows and the limit on SQL
> 2000
> was between three and 30 million rows.
> I think that I don't understand you. Is that the limit in SQL Server 2000?
> 30 millions rows?
> Thanks in advanced to Hillary and all newsgroup.
> --
> qwalgrande
|||qwalgrande,
Actually, no upper limit has ever been made public by Microsoft for SQL
Server 7.0 or SQL Server 2000 Full-text Indexing functionality. While at
Microsoft, I worked directly with a SQL Server 7.0 Full-text Indexing
customer who with my help, successfully FT Indexed a table with 64 million
rows. Although, SQL Server 2000 was tested to 20 million rows, I have worked
regularly with customers and clients who have successfully FT Indexed more
than 30 million rows table.
Regards,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
"qwalgrande" <qwalgrande*nospam*@.yahoo.es> wrote in message
news:F424BC6C-CC52-4582-8D01-80C0908F1F6B@.microsoft.com...
> Hi,
> This question is for all group, but particulary to Hillary Cotter, in
> relation to one article published in msdn
>
(http://msdn.microsoft.com/library/de...-us/dnsqlpro04
/html/sp04f9.asp).
> Hillary, in this article, you explain that the new Full-text search
engine,
> in SQL Server 2005, can index up to 2 billion rows and the limit on SQL
2000
> was between three and 30 million rows.
> I think that I don't understand you. Is that the limit in SQL Server 2000?
> 30 millions rows?
> Thanks in advanced to Hillary and all newsgroup.
> --
> qwalgrande
|||Thanks to both.
qwalgrande
"John Kane" wrote:

> qwalgrande,
> Actually, no upper limit has ever been made public by Microsoft for SQL
> Server 7.0 or SQL Server 2000 Full-text Indexing functionality. While at
> Microsoft, I worked directly with a SQL Server 7.0 Full-text Indexing
> customer who with my help, successfully FT Indexed a table with 64 million
> rows. Although, SQL Server 2000 was tested to 20 million rows, I have worked
> regularly with customers and clients who have successfully FT Indexed more
> than 30 million rows table.
> Regards,
> John
> --
> SQL Full Text Search Blog
> http://spaces.msn.com/members/jtkane/
>
> "qwalgrande" <qwalgrande*nospam*@.yahoo.es> wrote in message
> news:F424BC6C-CC52-4582-8D01-80C0908F1F6B@.microsoft.com...
> (http://msdn.microsoft.com/library/de...-us/dnsqlpro04
> /html/sp04f9.asp).
> engine,
> 2000
>
>