By Petra Perner

This book constitutes the refereed proceedings of the 6th Industrial Conference on Data Mining, ICDM 2006, held in Leipzig, Germany, in July 2006. It presents 45 carefully reviewed and revised full papers, organized in topical sections on data mining in medicine, web mining and logfile analysis, theoretical aspects of data mining, data mining in marketing, mining signals and images, aspects of data mining, and applications such as intrusion detection.

**Extra resources for Advances in Data Mining: Applications in Medicine, Web Mining, Marketing, Image and Signal Mining: 6th Industrial Conference on Data Mining, ICDM 2006, Leipzig, Germany, July 2006, Proceedings**

**Example text**

Abstract. This paper presents a data mining approach to estimate multispecies gene entropy by using a self-organizing map (SOM) to mine a homologous gene set. The gene distribution function for each gene in the feature space is approximated by its probability distribution in the feature space.

The total sampled feature space is the feature data of the super-gene, which is the combination of a set of homologous genes. After the gene probability density function $p(x)$ in the sequence space $\Sigma$ is approximated, for each gene, by the gene distribution function $p'(x)$ on a SOM plane with $k$ neurons, we can estimate the entropy of a multispecies gene $x$ as

$$H(x) = -\sum_{i=1}^{k} p'(x_i) \log p'(x_i) \tag{9}$$

In the actual entropy estimation, we compute gene entropy values from the $p'(x)$ obtained by SOM mining, repeating the computation several times on SOM lattices of different sizes.
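A minimal sketch of Eq. (9), assuming the SOM has already been trained and is given as a list of $k$ prototype vectors (its codebook), that $p'(x_i)$ is estimated as the fraction of a gene's feature vectors whose best-matching unit is neuron $i$, and that the logarithm is base 2. The function names `best_matching_unit` and `som_gene_entropy` are hypothetical; SOM training and feature extraction are not shown.

```python
import math
from collections import Counter
from typing import Sequence

Vector = Sequence[float]


def best_matching_unit(x: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the neuron whose prototype is closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(x, codebook[j])),
    )


def som_gene_entropy(samples: Sequence[Vector], codebook: Sequence[Vector]) -> float:
    """Estimate H(x) = -sum_i p'(x_i) log2 p'(x_i) over the k SOM neurons.

    Assumption: p'(x_i) is the fraction of the gene's samples mapped to neuron i.
    Neurons that receive no samples are skipped, i.e. 0 * log 0 is taken as 0.
    """
    hits = Counter(best_matching_unit(x, codebook) for x in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in hits.values())


# Hypothetical 1-D lattice with k = 4 neurons; each sample hits a different neuron.
codebook = [[0.0], [1.0], [2.0], [3.0]]
print(som_gene_entropy([[0.1], [0.9], [2.1], [2.9]], codebook))  # 2.0
```

Calling `som_gene_entropy` with codebooks from lattices of different sizes yields the several entropy values the text mentions; how those values are then combined is not specified here.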

… $x_m$, $x_i \in \Gamma$, where $\Gamma$ is an alphabet and $|\Gamma|$ is the size of the alphabet, the Shannon entropy can be defined as

$$H(x) = -\sum_{i=1}^{|\Gamma|} p_i \log p_i \tag{1}$$

where $p_i$ is the probability of occurrence of the $i$th symbol in the alphabet. If the alphabet is the set of nucleotides, $\Gamma = \{A, T, C, G\}$, then $H(x)$ measures the randomness, or conversely the degree of order, of the information conveyed by a DNA sequence. Because single-character Shannon entropy is far from sufficient to capture the information conveyed by a DNA sequence [1,2], it is often generalized to a block entropy, which exposes more of the structural information embedded in the sequence; that is,

$$H_n(x) = -\sum_{i=1}^{|\Sigma|} p_i^{(n)} \log p_i^{(n)}$$
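The following Python sketch computes a plug-in estimate of the block entropy $H_n$, assuming that $p_i^{(n)}$ is the relative frequency of each distinct length-$n$ block among the overlapping windows of the sequence and that the logarithm is base 2. With $n = 1$ it reduces to the single-symbol Shannon entropy of Eq. (1). The function name `block_entropy` is hypothetical.

```python
import math
from collections import Counter


def block_entropy(seq: str, n: int = 1) -> float:
    """Plug-in estimate of H_n(x) = -sum_i p_i^(n) log2 p_i^(n).

    Assumption: p_i^(n) is the relative frequency of the i-th distinct
    length-n block among all overlapping windows of seq. Blocks that never
    occur are skipped (0 * log 0 is taken as 0). n = 1 gives Eq. (1).
    """
    if n < 1 or len(seq) < n:
        raise ValueError("need 1 <= n <= len(seq)")
    blocks = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(blocks.values())
    return -sum((c / total) * math.log2(c / total) for c in blocks.values())


print(block_entropy("ATCG"))             # 2.0
print(round(block_entropy("ATATAT", 2), 3))  # 0.971
```

Here `block_entropy("ATCG")` returns 2.0 bits, the maximum for the four-letter nucleotide alphabet, while `block_entropy("ATATAT", 2)` is about 0.971 bits because only the blocks AT (3 of 5 windows) and TA (2 of 5) occur. Using non-overlapping windows instead would be an equally plausible reading of $p_i^{(n)}$.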