Term Paper on "Text Mining"

Term Paper 10 pages (3299 words) Sources: 10


Mining

The concept of text mining rests on the idea that the terms used in an unstructured text message or file are related to one another. The relationship may extend to other, similar files, and once established it can give businesses and researchers information in many areas that could change the way they do business or enhance knowledge.

The definition of text mining is very broad. In simple terms, 'text mining' refers to the process whereby information is retrieved in text form. It can also go deeper: a pattern is to be established in the textual data, which requires not only finding the proper text but also a theory for making the information useful. Many definitions of text mining basically assume a single need: to extract high-end information from a database or other unstructured text field and use it to arrive at some conclusion. Text analysis is also connected to the concept of data mining, with one difference: data mining works on a structured database from which the information is sought. The principal challenge in text mining is to achieve the same result with unstructured data. (Trujillo, 2010)

Text is the most widely used medium and data type. It is the means by which people exchange information, through e-mail, chat, digital libraries, reports, and books available on the internet and other communication media. Alongside the data generated by users of the net there are many journals, research materials, and other valuable reports, such as statistical reports and government-related work. These databases grow at astronomical rates and are distributed on a global scale. (Mitra; Acharya, 2003) Thus text mining has become an important tool set in many operations across the information spectrum.

Technical Details:

Text mining is a complex method with many steps and determinants that must all work for it to succeed. It begins with an algorithm that extracts the facts available in a textual source and converts them into a form that can be used to create "hypotheses that are further explored by traditional data mining and data analysis methods." (Maimon; Rokach, 2005) In text parsing, problems arise with hyponyms, that is, with the generalization of information: a contributor may be described generally as 'human' and more specifically by position, such as 'corporate executive', and such features may be incidental in one context but vital in another. General information of this kind is normally ignored because the spans and tokens the program works with do not take it into account, yet the same information may be vital when viewed from a different perspective. (Srivastava; Sahami, 2009)

To this end, the major operation in text mining is tagging. A component of a text mining program can tag the document using statistical or semantic tagging, and this is the basis for arriving at any new information. Managers often need to look at information from a new angle, and that angle is frequently found in customer responses, which need not be structured. Tagging therefore relies on a task-oriented preprocessing approach that creates a structured document from an unstructured one. Another method, called 'Text Mining and Information Extraction', is used to summarize a document. In any case, text mining operations form the base of tagging and thus create entities and relationships. (Maimon; Rokach, 2005) Much research is underway to create better algorithms. One study showed the possibilities of "implementation of information extraction and categorization in the text mining." (Mustafa; Akbar; Sultan, 2009)

The aim of text mining is to provide a method for knowledge management, analysis and decision-making. Numerous methods and functions go into parsing text, and a 'text mining' algorithm combines many of them. Mining activities include extracting information after a comprehensive search, categorizing the results, and then summarizing the extracted data set so that it can be used for monitoring and for answering questions as needed. The fundamental requirement of a text mining operation is to obtain an associative distribution for words and terms and to find a common significance that can serve research or business forecasting. (Mustafa; Akbar; Sultan, 2009)
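One simple way to picture an "associative distribution for words and terms" is to count how often pairs of terms appear in the same document. The sketch below is only an illustration under that assumption; the function names and the sample customer comments are hypothetical and do not come from the cited sources.

```python
import re
from collections import Counter
from itertools import combinations

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cooccurrence_counts(documents):
    """Count, for each unordered pair of distinct terms, the documents containing both."""
    pair_counts = Counter()
    for doc in documents:
        # Sorting the unique terms gives every pair one canonical order.
        terms = sorted(set(tokenize(doc)))
        pair_counts.update(combinations(terms, 2))
    return pair_counts

# Hypothetical unstructured customer comments.
docs = [
    "late delivery and damaged box",
    "delivery was late again",
    "box arrived damaged",
]
counts = cooccurrence_counts(docs)
```

Pairs with high counts, such as ("delivery", "late") here, are candidates for the kind of association that an analyst might then examine further.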

The most important part is information extraction. Words or feature terms are identified within a textual file and then processed through a 'layered model of the Text Mining Application.' (Mustafa; Akbar; Sultan, 2009) Text mining and data mining have the same analytical functions but differ in the use of natural language (NL) and information retrieval (IR) techniques. (Maimon; Rokach, 2005) The procedures that go into text mining are numerous and deserve a separate discussion, because the method of mining is common to all algorithms in data mining and text mining.

Processes:

The processes differ slightly between data mining and text mining because text mining is intended for unordered data, which changes the basis of the search. A typical process is stemming, that is, identifying the root of a word. Stemming techniques are of two types, inflectional and derivational. The stem is a very useful concept because roots abstract away singular, plural and other variations and help reduce the data to bare essentials. The size of the dictionary is thus kept to a minimum, and stems and tokens help preserve the accuracy of the information extracted. Together these two concepts support faster and shorter algorithms for deriving data from arbitrary text. Documents are then classified according to their threads or common contents, and this grouping, along with the use of identical roots or stems and tokens for other words connected with the root, helps in finding a feature. (Weiss, 2005)

Derivational stemming deals with cases where a new word is created from an existing root. Interesting as this is, the practical application lies with inflectional stemming. The algorithm used is Porter's algorithm, in which parsing works on the elements of the language and its grammar, such as plural, singular, present and past forms and other grammatical features. (Mustafa; Akbar; Sultan, 2009) Data mining contributes a method of extracting data. However, not all of the data is in a fixed form, and the methods devised so far parse only some of the text. No algorithm can anticipate all human communication because of its complexity. Text mining therefore has many problems that data mining does not, owing to differences between languages, in how a language is used, and in expression from one individual to another. Words may thus mean different things in different contexts. (Mitra; Acharya, 2003)
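To make inflectional stemming concrete, the following is a deliberately small suffix-stripping sketch. It is not Porter's algorithm, which applies several ordered rule phases with conditions on the stem; the rule list and the minimum stem length here are assumptions chosen only for illustration.

```python
def inflectional_stem(word):
    """Strip a few common English inflectional suffixes (toy rules only)."""
    word = word.lower()
    # Rules are tried in order; the first applicable suffix wins.
    rules = [
        ("sses", "ss"),  # classes -> class
        ("ies", "y"),    # studies -> study
        ("ing", ""),     # parsing -> pars
        ("ed", ""),      # parsed -> pars
        ("s", ""),       # documents -> document
    ]
    for suffix, replacement in rules:
        # Require at least 3 remaining letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            # Do not strip the final "s" of words ending in "ss" (e.g. "class").
            if suffix == "s" and word.endswith("ss"):
                continue
            return word[: len(word) - len(suffix)] + replacement
    return word

stems = {w: inflectional_stem(w) for w in ["parsed", "parsing", "documents", "class"]}
```

Because "parsed" and "parsing" reduce to the same stem, a dictionary keyed on stems needs one entry for both, which is the size reduction described above.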

Categorization pinpoints the category of the domain in use and, combined with a token, allocates the text to the best category; this is done using table-managing structures called hash tables. (Mustafa; Akbar; Sultan, 2009) These procedures are unique to text mining because it works with unstructured data using a domain dictionary, whose set of terms has to be exhaustive for the mining to be effective. Text data is stored in compressed form, and in the future accessing it will be a problem because decompression algorithms are needed along with the search. Text databases are compressed using Lempel-Ziv type algorithms, and similar algorithms are used in data mining and text mining to retrieve the matter efficiently. The greatest source of text is the web, so mining is also largely related to the web. (Mitra; Acharya, 2003)
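A rough sketch of dictionary-based categorization follows. It assumes the domain dictionary is a hash table mapping each term to a category (a Python `dict`) and that the best category is simply the one with the most matching tokens; the dictionary contents and the voting rule are hypothetical and are not taken from the cited work.

```python
import re
from collections import Counter

# Hypothetical domain dictionary: term -> category. A Python dict is a hash table.
DOMAIN_DICTIONARY = {
    "invoice": "billing",
    "refund": "billing",
    "payment": "billing",
    "delivery": "shipping",
    "courier": "shipping",
    "parcel": "shipping",
}

def categorize(text, dictionary=DOMAIN_DICTIONARY):
    """Return the category whose dictionary terms occur most often, or None."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Each token found in the dictionary casts one vote for its category.
    votes = Counter(dictionary[t] for t in tokens if t in dictionary)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

label = categorize("The courier lost my parcel and I want a refund")
```

The example also shows why the dictionary has to be exhaustive: a text whose terms are all missing from it receives no category at all.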

One proposed method of text mining is 'DISCOTEX (Discovery from Text EXtraction)', which uses an information extraction system and a 'standard rule induction module.' By extracting information it is possible to create a well-structured, searchable database that makes online text more easily accessible. Another algorithm that can be mentioned is APRIORI, a standard association rule mining algorithm; the two combined have been claimed to find interesting patterns in book descriptions. (Daelemans; Plessis; Snyman, Teck, 2005)
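The frequent-itemset step of Apriori can be sketched in a few lines. This is a minimal illustration of the general algorithm, not the DISCOTEX system; the book-description features and the support threshold are invented for the example, and rule generation from the frequent itemsets is left out.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any candidate
        # with an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical features extracted from four book descriptions.
books = [
    {"mystery", "paperback", "bestseller"},
    {"mystery", "paperback"},
    {"mystery", "hardcover"},
    {"romance", "paperback"},
]
result = apriori(books, min_support=2)
```

With these inputs the only frequent pair is {"mystery", "paperback"}, which an association rule miner would then turn into candidate rules.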

Not only single words but also strings can be mined: the analysis of similarities between whole strings also falls within the ambit of text mining. The overall aim of the exercise is information integration, which is achieved when an optimal correspondence can be established between variables such that some factor can be associated…
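The text does not say which string similarity measure is meant. As one common possibility, the sketch below uses Levenshtein edit distance and scales it to a score between 0 and 1; both the choice of measure and the scaling are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]

def similarity(a, b):
    """Scale edit distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

distance = edit_distance("kitten", "sitting")
```

A score near 1.0 suggests that two field values from different sources may refer to the same entity, which is the kind of correspondence information integration looks for.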

