Introduction
Entropy is a measure of disorder, or more precisely unpredictability. For example, a series of coin tosses with a fair coin has maximum entropy, since there is no way to predict what will come next. A string of coin tosses with a coin with two heads and no tails has zero entropy, since the coin will always come up heads. Most collections of data in the real world lie somewhere in between. It is important to realize the difference between the entropy of a set of possible outcomes, and the entropy of a particular outcome. A single toss of a fair coin has an entropy of one bit, but a particular result (e.g. "heads") has zero entropy, since it is entirely "predictable".
English text has fairly low entropy. In other words, it is fairly predictable. Even if we don't know exactly what is going to come next, we can be fairly certain that, for example, there will be many more e's than z's, that the combination 'qu' will be much more common than any other combination with a 'q' in it, and that the combination 'th' will be more common than any of them. Uncompressed English text is commonly estimated to carry only about one bit of entropy for each byte (eight bits) of message.
If a compression scheme is lossless (that is, you can always recover the entire original message by decompressing), then a compressed message has the same total entropy as the original, but in fewer bits. That is, it has more entropy per bit. This means a compressed message is more unpredictable, which is why messages are often compressed before being encrypted. Roughly speaking, Shannon's source coding theorem says that a lossless compression scheme cannot compress messages, on average, to have more than one bit of entropy per bit of message. The entropy of a message is, in a certain sense, a measure of how much information it really contains.
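The claim that compression raises the entropy per bit can be checked roughly in code. The sketch below is only an illustration under stated assumptions: it uses Python's standard `zlib` module as the lossless compressor, and it uses byte-frequency entropy (a crude order-0 estimate that ignores context between bytes) as a stand-in for the true entropy. The names `bits_per_byte` and `SAMPLE` are made up for this example.

```python
import math
import zlib
from collections import Counter

def bits_per_byte(data: bytes) -> float:
    """Order-0 (byte-frequency) entropy estimate, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A short English sample (assumption: any ordinary English prose would do).
SAMPLE = (
    "Entropy is a measure of disorder, or more precisely unpredictability. "
    "A series of coin tosses with a fair coin has maximum entropy, since "
    "there is no way to predict what will come next. A string of coin tosses "
    "with a coin with two heads and no tails has zero entropy, since the coin "
    "will always come up heads. Most collections of data in the real world "
    "lie somewhere in between. English text has fairly low entropy. In other "
    "words, it is fairly predictable."
).encode("ascii")

compressed = zlib.compress(SAMPLE, 9)

print("original  :", len(SAMPLE), "bytes,", round(bits_per_byte(SAMPLE), 2), "bits/byte")
print("compressed:", len(compressed), "bytes,", round(bits_per_byte(compressed), 2), "bits/byte")
```

On such a sample the compressed output should be shorter and its bytes should look much closer to uniformly distributed, which is the "more entropy per bit" effect described above. The exact figures depend on the sample and on the compressor.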
Shannon's theorem also implies that no lossless compression scheme can compress all messages. If some messages come out smaller, at least one must come out larger. In the real world, this is not a problem, because we are generally only interested in compressing certain messages, for example English documents as opposed to random bytes, or digital photographs rather than noise, and don't care if our compressor makes random messages larger.
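The statement that no lossless scheme can shrink every message can also be seen from a simple counting argument, sketched here as a complement to the appeal to Shannon's theorem. It assumes messages are bit strings and writes n for the message length.

```latex
\#\{\text{bit strings of length exactly } n\} = 2^{n},
\qquad
\#\{\text{bit strings of length less than } n\}
  = \sum_{k=0}^{n-1} 2^{k} = 2^{n} - 1 < 2^{n}.
```

There are fewer strictly shorter strings than n-bit messages, so any scheme that maps every n-bit message to a shorter string must send two different messages to the same output, and then it cannot be lossless.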
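Original definition
Claude E. Shannon, one of the founders of information theory, defined information (entropy) in terms of the probabilities with which discrete random events occur. Information entropy is a rather abstract mathematical concept; here it can loosely be understood as reflecting how probable a particular piece of information is.
For any random variable X, entropy behaves as follows: the greater the uncertainty of the variable, the greater its entropy, and the more information is needed to determine its value.
Information entropy is a concept used in information theory to measure the amount of information. The more ordered a system is, the lower its information entropy; conversely, the more disordered a system is, the higher its information entropy. Information entropy can therefore also be seen as a measure of how ordered a system is.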
Named after Boltzmann's H-theorem, Shannon denoted the entropy H of a discrete random variable X with possible values $\{x_1, \dots, x_n\}$ as
$$H(X) = E[I(X)].$$
Here E is the expected value, and I is the information content of X.
I(X) is itself a random variable. If p denotes the probability mass function of X, then the entropy can be written explicitly as
$$H(X) = \sum_{i=1}^{n} p(x_i)\, I(x_i) = -\sum_{i=1}^{n} p(x_i) \log_b p(x_i),$$
where b is the base of the logarithm used. Common values of b are 2, Euler's number e, and 10, and the unit of entropy is the bit for b = 2, the nat for b = e, and the dit (or digit) for b = 10. [3]
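To make the role of the base b concrete, here is a minimal sketch that evaluates the entropy formula for a given probability mass function and reports it in bits, nats, and dits. The function name `entropy` and its parameters are illustrative choices, not part of any library.

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy of a probability mass function, in units set by `base`."""
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be non-negative and sum to 1")
    # Terms with p == 0 are skipped, i.e. 0 * log(0) is treated as 0.
    return sum(-p * math.log(p, base) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
print("bits:", entropy(fair_coin, 2))        # base 2
print("nats:", entropy(fair_coin, math.e))   # base e
print("dits:", entropy(fair_coin, 10))       # base 10
print("two-headed coin:", entropy([1.0, 0.0], 2))
```

Changing the base only rescales the result by a constant factor: the value in nats is the value in bits multiplied by ln 2.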
In the case of $p_i = 0$ for some i, the value of the corresponding summand $0 \log_b 0$ is taken to be 0, which is consistent with the limit:
$$\lim_{p \to 0^+} p \log_b p = 0.$$
The proof of this limit can be quickly obtained by applying l'Hôpital's rule.
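For completeness, the l'Hôpital step can be written out as follows. This is a sketch of the standard argument, rewriting the product as a quotient and differentiating numerator and denominator with respect to p.

```latex
\lim_{p \to 0^+} p \log_b p
  = \lim_{p \to 0^+} \frac{\log_b p}{1/p}
  = \lim_{p \to 0^+} \frac{1/(p \ln b)}{-1/p^{2}}
  = \lim_{p \to 0^+} \left( -\frac{p}{\ln b} \right)
  = 0 .
```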
Formula
$$H(X) = E[I(x_i)] = E\left[\log \frac{1}{p(x_i)}\right] = -\sum_{i=1}^{n} p(x_i) \log p(x_i)$$
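As a quick check of the formula, consider the special case where all n outcomes are equally likely, so that each probability equals 1/n. The worked equation below assumes base-2 logarithms so that the result is in bits.

```latex
H(X) = -\sum_{i=1}^{n} \frac{1}{n} \log_2 \frac{1}{n}
     = \log_2 n ,
\qquad \text{for example } n = 32 \;\Rightarrow\; H(X) = \log_2 32 = 5 \text{ bits}.
```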
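Applications
Examples
1. Shannon pointed out that the exact amount of information is $-(p_1 \log p_1 + p_2 \log p_2 + \dots + p_{32} \log p_{32})$, where $p_1, p_2, \dots, p_{32}$ are the probabilities that each of the 32 teams wins the championship. Shannon called this quantity "information entropy" (Entropy); it is usually denoted H and measured in bits. Interested readers can work out that when all 32 teams are equally likely to win, the information entropy equals five bits. Readers with a mathematical background can also prove that the value of the formula above can never exceed five.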
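2. In many cases we do not know the probability distribution of some random events; all we have are the mean values of one or more random variables related to them. For example, suppose we only know that the exam scores of a class fall into three grades, 80, 90, and 100 points, and that the average score is 90 points. Clearly the probability distribution over the three grades is not unique in this case, because the known constraints p1*80 + p2*90 + p3*100 = 90 and p1 + p2 + p3 = 1 have infinitely many solutions. Which solution should we choose? In other words, how do we pick the "best", "most reasonable" distribution from all these compatible distributions? The selection criterion is the principle of maximum entropy.
According to the principle of maximum entropy, from all compatible distributions we choose the one that maximizes the information entropy under the given constraints (usually the given mean values of certain random variables). This principle was proposed by E. T. Jaynes. The justification is that the probability distribution at which the entropy attains its maximum is overwhelmingly the most likely to occur, which can be proved theoretically. Once we accept entropy as the most suitable yardstick for measuring uncertainty, we have essentially agreed to choose, under the given constraints, the distribution with the greatest uncertainty as the distribution of the random variable. This distribution is the most random one: it contains the least subjective input and makes the most generous estimate of what is uncertain.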
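The score example can be worked through numerically. The two constraints force p1 = p3, so every compatible distribution has the form (t, 1 - 2t, t) with t between 0 and 0.5. The sketch below searches this one-parameter family for the entropy maximum with a simple ternary search, which is an illustrative choice that relies on the entropy being concave in t. The names `distribution` and `maximize_entropy` are made up for this example.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits, treating 0 * log(0) as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def distribution(t):
    # p1 = p3 = t and p2 = 1 - 2t satisfy both constraints:
    # p1 + p2 + p3 = 1 and 80*p1 + 90*p2 + 100*p3 = 90.
    return (t, 1.0 - 2.0 * t, t)

def maximize_entropy(lo=0.0, hi=0.5, iterations=200):
    """Ternary search for the t that maximizes the entropy of distribution(t)."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if entropy_bits(distribution(m1)) < entropy_bits(distribution(m2)):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

best_t = maximize_entropy()
best = distribution(best_t)
mean_score = 80 * best[0] + 90 * best[1] + 100 * best[2]
print("maximum-entropy distribution:", best)
print("mean score:", mean_score)
print("entropy in bits:", entropy_bits(best))
```

The search settles close to t = 1/3, that is, equal weight on the three grades, which agrees with setting the derivative of the entropy with respect to t to zero by hand. With a different average score the constraint would no longer force p1 = p3, and the maximum-entropy solution would not be uniform.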
3 Data as a Markov process
A common way to define entropy for text is based on the Markov model of text. For an order-0 source (each character is selected independently of the preceding characters), the binary entropy is:
$$H(S) = -\sum_i p_i \log_2 p_i,$$
where $p_i$ is the probability of i. For a first-order Markov source (one in which the probability of selecting a character depends only on the immediately preceding character), the entropy rate is:
$$H(S) = -\sum_i p_i \sum_j p_i(j) \log_2 p_i(j),$$
where i is a state (certain preceding characters) and $p_i(j)$ is the probability of j given i as the previous character.
For a second-order Markov source, the entropy rate is
$$H(S) = -\sum_i p_i \sum_j p_i(j) \sum_k p_{i,j}(k) \log_2 p_{i,j}(k),$$
where $p_{i,j}(k)$ is the probability of k given that the two preceding characters were i and j.
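The order-0 and first-order quantities can be estimated from a sample of text by replacing the probabilities with observed frequencies. The sketch below is only a plug-in estimate under that assumption (it is biased for short samples), and the function names `order0_entropy` and `order1_entropy_rate` are illustrative.

```python
import math
from collections import Counter

def order0_entropy(text):
    """Estimate -sum_i p_i log2 p_i from single-character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def order1_entropy_rate(text):
    """Estimate -sum_i p_i sum_j p_i(j) log2 p_i(j) from adjacent pairs."""
    pair_counts = Counter(zip(text, text[1:]))   # (previous char i, next char j)
    context_counts = Counter(text[:-1])          # how often i occurs as a predecessor
    total = len(text) - 1
    h = 0.0
    for (i, j), c in pair_counts.items():
        p_i = context_counts[i] / total          # probability of state i
        p_j_given_i = c / context_counts[i]      # probability of j given previous i
        h -= p_i * p_j_given_i * math.log2(p_j_given_i)
    return h

sample = "abababab"
print("order-0 estimate:", order0_entropy(sample))
print("first-order estimate:", order1_entropy_rate(sample))
```

For the alternating string in the usage lines, each character on its own looks like a fair coin toss (one bit), but once the previous character is known the next one is fully determined, so the first-order estimate drops to zero.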
In general the b-ary entropy of a source $\mathcal{S} = (S, P)$ with source alphabet $S = \{a_1, \dots, a_n\}$ and discrete probability distribution $P = \{p_1, \dots, p_n\}$, where $p_i$ is the probability of $a_i$ (say $p_i = p(a_i)$), is defined by:
$$H_b(\mathcal{S}) = -\sum_{i=1}^{n} p_i \log_b p_i.$$
Note: the b in "b-ary entropy" is the number of different symbols of the "ideal alphabet" which is being used as the standard yardstick to measure source alphabets. In information theory, two symbols are necessary and sufficient for an alphabet to be able to encode information, therefore the default is to let b = 2 ("binary entropy"). Thus, the entropy of the source alphabet, with its given empirical probability distribution, is a number equal to the number (possibly fractional) of symbols of the "ideal alphabet", with an optimal probability distribution, necessary to encode each symbol of the source alphabet. Also note that "optimal probability distribution" here means a uniform distribution: a source alphabet with n symbols has the highest possible entropy (for an alphabet with n symbols) when the probability distribution of the alphabet is uniform. This optimal entropy turns out to be $\log_b n$.