February 5th, 2009

Data Mining: An Overview

What is data mining?

"The mining data involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms and machine learning methods (algorithms that improve their performance automatically through experience, such as neural networks or decision trees). Consequently, data mining is more than collecting and data management, but also includes analysis and prediction. "

"Data mining can be performed on the data represented in quantitative terms, text, multimedia or shapes. Data mining applications can use a variety of parameters to examine the data. They include the association (Patterns where one event is connected to another event, such as purchasing a pen and purchasing paper), sequence or path analysis (the patterns in one event leads to another event, such as the birth of a child and purchasing diapers), classification (identification of new models such as the similarities between the purchases of duct tape and plastic sheeting purchases), clustering (finding and visually documenting groups of previously unknown facts, such as geographic location and brand preferences), and forecasting (discovering patterns that one can make reasonable predictions regarding future activities, such as the prediction that people who join an athletic club may take exercise classes). "

Reflecting this conceptualization of data mining, some observers consider data mining to be just one step in a larger process known as knowledge discovery in databases (KDD). Other steps in the KDD process, in progressive order, include data cleansing, data integration, data selection, data transformation, pattern assessment, and knowledge presentation.
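The KDD steps can be pictured as a chain of small functions, each feeding the next. The sketch below is only a toy under stated assumptions: the records, the field names, and every function name are hypothetical, and the "mining" step is a plain frequency count standing in for a real algorithm.

```python
from collections import Counter

# Two hypothetical source "databases" with made-up purchase records.
store_a = [{"customer": "ann", "item": "Pen "}, {"customer": "bob", "item": None}]
store_b = [{"customer": "ann", "item": "paper"}, {"customer": "cy", "item": "pen"}]


def clean(records):
    # Data cleansing: drop records with a missing item, normalize the text.
    return [{**r, "item": r["item"].strip().lower()} for r in records if r["item"]]


def integrate(*sources):
    # Data integration: combine several sources into one list.
    return [r for source in sources for r in source]


def select(records, fields):
    # Data selection: keep only the fields relevant to the question.
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    # Data transformation: reshape into the form the mining step expects.
    return [r["item"] for r in records]


def mine(items):
    # Data mining: here, just count how often each item occurs.
    return Counter(items)


def assess(patterns, min_count):
    # Pattern assessment: keep only patterns seen at least min_count times.
    return {item: count for item, count in patterns.items() if count >= min_count}


def present(patterns):
    # Knowledge presentation: format the surviving patterns for a reader.
    return [f"{item}: {count}" for item, count in sorted(patterns.items())]


records = integrate(clean(store_a), clean(store_b))
items = transform(select(records, ["item"]))
report = present(assess(mine(items), min_count=2))
print(report)
```

The point of the sketch is the ordering: the mining step sits in the middle, and most of the surrounding work is preparing the data and judging the results.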

A number of advances in technology and business processes have contributed to a growing interest in data mining in both the public and private sectors. These changes include the growth of computer networks, which can be used to connect databases; the development of enhanced search-related techniques, such as neural networks and advanced algorithms; the spread of the client/server computing model, which allows users to access centralized data resources from the desktop; and an increased ability to combine data from disparate sources into a single searchable source.

Data mining has become increasingly common in both the public and private sectors. Organizations use data mining as a tool to survey customer information, reduce fraud and waste, and assist in medical research. However, the proliferation of data mining has also raised some implementation and oversight issues. These include concerns about the quality of the data being analyzed, the interoperability of databases and software between agencies, and potential infringements on privacy.

Limitations of Data Mining

"While the data mining products can be very powerful tools, they are not alone applications. To be successful, requires skilled mining technical specialists analytical and can structure the analysis and interpretation of the output is created. Consequently, the limitations of data mining are primarily data or related personnel, rather than "related to technology.

"Although data mining can help reveal patterns and relationships, do not tell the user the value or importance of these patterns. Such determinations must be made by the user. Similarly, the validity of discovered patterns depends on how compared with the "real world" circumstances. For example, to assess the validity of a data mining application designed to identify possible suspects of terrorism in a large number of people, the user can test the model using data that includes information about known terrorists. However, while reaffirming possibly a certain profile, does not necessarily mean that the application to identify a suspect whose behavior significantly deviates from the original model. "

"Another limitation of data mining is that while it can identify connections between behaviors and / or variables, not necessarily identified a causal relationship. For example, an application that can identify a pattern of behavior, such as the propensity to buy airline tickets shortly before departure is scheduled for departure is related to characteristics such as income, education level and Internet use. However, this does not necessarily indicate that the ticket purchasing behavior is caused by one or more of these variables. In fact, the behavior of individual might be affected by any additional variable (s), such as occupation (The need to travel the short term), family status (a sick relative needing care), or a hobby (taking advantage of last minute discounts to visit new destinations).

Data Mining Applications

"Data mining is used for a variety of purposes, both in terms public and private. Industries such as banking, insurance, medicine, retail and frequently used data mining to reduce costs, enhance research, and increase sales. For example, insurance and banking using data mining applications to detect fraud and assist in the evaluation of risks (for example, the score credit.) From customer data collected over several years, companies can develop models that predict whether a customer is a good credit risk, or if a accident claim may be fraudulent and should be investigated more closely. The medical community sometimes uses data mining to help predict the efficacy of a procedure or medicine. Pharmaceutical companies use data mining of chemical compounds and genetic material to help guide research on new treatments for diseases. Retailers can use information collected through affinity programs (eg, cards buyers' club, frequent flyer points, contests) to assess the effectiveness of product selection and placement decisions, coupon offers, and what products are often purchased together. Companies such as telephone service providers and music clubs can use data mining to create a churn analysis "," to assess which customers are likely to remain as subscribers and which are likely to switch to a competitor. "

"In the public sector, data mining applications initially were used as a means to detect fraud and waste, but also grown to be used for purposes such as measuring and improving program performance. It has been reported that data mining has helped the federal government to recover million in fraudulent Medicare payments. The Justice Department has been able to use data mining to assess crime patterns and adjust allocations resources accordingly. Equally, the Department of Veterans Affairs has used data mining to help predict demographic changes in the electoral district that is for one to evaluate its budget needs. Another example is the Federal Aviation Administration, which uses data mining to review the data crash recognize common defects and recommend precautionary measures. "

Recently, data mining has been increasingly cited as an important tool for homeland security efforts. Some observers suggest that data mining should be used as a means to identify terrorist activities, such as money transfers and communications, and to identify and track individual terrorists themselves, such as through travel and immigration records. Two initiatives that have attracted considerable attention are the now-discontinued Terrorism Information Awareness (TIA) project conducted by the Defense Advanced Research Projects Agency (DARPA) and the now-canceled Computer-Assisted Passenger Prescreening System II (CAPPS II) that was being developed by the Transportation Security Administration (TSA). CAPPS II is being replaced by a new program called Secure Flight.

About the Author

Vineet Pandit
M.Tech (Software Systems)

