The Secret of Love Shanghai's Chinese Word Segmentation: Three Principles You May Not Know

Love Shanghai's Chinese word segmentation principles are not fully disclosed. The algorithm is a trade secret that Love Shanghai is unlikely to reveal; if everyone knew it, wouldn't there be many more Love Shanghais?

Of course, the points below are only part of the picture.

A search engine must process tens of billions of pages within a short time, so it maintains a Chinese thesaurus (dictionary). For example, Love Shanghai's thesaurus currently holds about 90,000 Chinese words; with it, the search engine can analyze some one hundred billion pages and classify them according to the thesaurus.
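A classified thesaurus can be pictured as a mapping from words to categories. The minimal sketch below assumes a tiny hypothetical thesaurus (`THESAURUS`, with made-up categories) and a hypothetical helper `classify_page` that tallies categories by plain substring counting. It illustrates the idea only and is not Love Shanghai's method; note that substring counting can double-count words that share characters, which is one reason a real segmenter is needed.

```python
from collections import Counter

# Hypothetical classified thesaurus (word -> category). A real one would be far
# larger (the article cites about 90,000 words); these categories are assumed.
THESAURUS: dict[str, str] = {
    "大学": "education",  # university
    "学习": "education",  # learning
    "湖南": "place",      # Hunan
    "上海": "place",      # Shanghai
}


def classify_page(text: str, thesaurus: dict[str, str]) -> Counter:
    """Tally how often each category's words occur in `text`.

    Uses naive substring counting (str.count counts non-overlapping matches of
    one word at a time), so two thesaurus words sharing a character, as in
    "大学习", are both counted; segmentation avoids that.
    """
    counts: Counter = Counter()
    for word, category in thesaurus.items():
        hits = text.count(word)
        if hits:
            counts[category] += hits
    return counts


if __name__ == "__main__":
    # "Hunan University students study in Shanghai": education 2, place 2
    print(classify_page("湖南大学的学生在上海学习", THESAURUS))
```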

2. Statistics-based segmentation. Why are some words marked in red? Words marked in red in the results are usually keywords: if you search for "learning", Love Shanghai treats "learning" as a keyword, so "learning" is highlighted in red. Love Shanghai's segmentation relies on statistical word segmentation.
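One common statistical technique, shown here only as an illustration and not as Love Shanghai's actual algorithm, is unigram maximum-probability segmentation: among all ways to split a string into candidate words, pick the split whose word probabilities have the largest product, found with dynamic programming. The sketch assumes a tiny table of made-up frequencies (`FREQ`) and hypothetical function names. The test string is the article's "Hunan big school roof" (湖南大学屋顶); `highlight` then wraps the query's segmented keywords in a red tag, mimicking the red marking described above.

```python
import math
import re
from typing import Optional

# Made-up word frequencies for illustration; a real system estimates these
# from a large corpus.
FREQ = {
    "湖南": 30, "大学": 40, "屋顶": 10, "南大": 5,
    "湖": 3, "南": 8, "大": 30, "学": 10, "屋": 2, "顶": 4,
}
TOTAL = sum(FREQ.values())
MAX_LEN = max(len(w) for w in FREQ)


def word_logp(word: str) -> Optional[float]:
    """Log-probability of `word`, or None if it is not a candidate word."""
    if word in FREQ:
        return math.log(FREQ[word] / TOTAL)
    if len(word) == 1:
        return math.log(0.5 / TOTAL)  # small smoothing value for unknown characters
    return None


def segment(text: str) -> list[str]:
    """Return the split of `text` with the highest unigram probability."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in text[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_LEN), end):
            lp = word_logp(text[start:end])
            if lp is not None and best[start] + lp > best[end]:
                best[end] = best[start] + lp
                back[end] = start
    words, end = [], n
    while end > 0:  # follow back-pointers from the end of the string
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


def highlight(snippet: str, query: str) -> str:
    """Wrap every keyword of the segmented query found in `snippet` in a red tag."""
    keywords = sorted(set(segment(query)), key=len, reverse=True)
    if not keywords:
        return snippet
    pattern = re.compile("|".join(map(re.escape, keywords)))  # longest keywords first
    return pattern.sub(lambda m: f'<em class="red">{m.group(0)}</em>', snippet)


if __name__ == "__main__":
    print(segment("湖南大学屋顶"))
    print(highlight("湖南大学屋顶花园", "湖南大学"))
```

With these made-up counts, the competing split 湖 / 南大 / 学 / 屋顶 has a much lower probability product, so the program picks 湖南 / 大学 / 屋顶.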

1. Understanding-based segmentation. Put simply, Love Shanghai does not segment Chinese queries of three characters or fewer; for example, a search for "big school" (大学, "university") is kept whole.
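The short-query rule can be expressed as a guard in front of whatever segmenter is used. This minimal sketch takes the three-character threshold from the text; the function name `tokenize_query` and the pluggable `segmenter` argument are hypothetical, and `len` counts Unicode code points, which for common Chinese text equals the number of characters.

```python
from typing import Callable

# The article's rule: queries of three characters or fewer are not segmented.
MAX_UNSPLIT_LEN = 3


def tokenize_query(query: str, segmenter: Callable[[str], list[str]]) -> list[str]:
    """Keep short queries whole; hand longer ones to `segmenter`."""
    if len(query) <= MAX_UNSPLIT_LEN:
        return [query]
    return segmenter(query)


if __name__ == "__main__":
    # `list` is a stand-in segmenter that splits a string into single characters.
    print(tokenize_query("大学", list))          # kept whole
    print(tokenize_query("湖南大学屋顶", list))  # handed to the segmenter
```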

In addition, another segmentation principle: Love Shanghai keeps a proprietary thesaurus of indivisible terms, such as the names of prominent figures (e.g., Mao Zedong), celebrities (e.g., Andy Lau), and frequently searched phrases (e.g., "hard ticket").
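One way to honor an indivisible thesaurus is to locate its entries first and segment only the text between them. The sketch below is an assumption about how such a rule could work, not Love Shanghai's implementation; `PROTECTED` holds the article's examples Mao Zedong (毛泽东) and Andy Lau (刘德华), `segment_with_protected` is a hypothetical name, and the fallback `segmenter` is pluggable (here `list`, which splits into single characters).

```python
import re
from typing import Callable

# Indivisible terms from the article's examples: Mao Zedong, Andy Lau.
PROTECTED = {"毛泽东", "刘德华"}


def segment_with_protected(
    text: str,
    protected: set[str],
    segmenter: Callable[[str], list[str]],
) -> list[str]:
    """Keep protected terms whole and segment only the gaps between them."""
    if not protected:
        return segmenter(text)
    # Longest terms first, so a longer protected term wins over a shorter one.
    pattern = re.compile("|".join(map(re.escape, sorted(protected, key=len, reverse=True))))
    tokens: list[str] = []
    pos = 0
    for match in pattern.finditer(text):
        if match.start() > pos:
            tokens.extend(segmenter(text[pos:match.start()]))
        tokens.append(match.group(0))  # the protected term is never split
        pos = match.end()
    if pos < len(text):
        tokens.extend(segmenter(text[pos:]))
    return tokens


if __name__ == "__main__":
    # "I like Andy Lau's songs"
    print(segment_with_protected("我喜欢刘德华的歌", PROTECTED, list))
```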


We treat Love Shanghai's segmentation algorithm as a black box: by entering a number of keywords and examining the output, we infer how it segments text.

Maximum and minimum matching (maximum matching: match the longest possible word each time; minimum matching: stop as soon as a word matches, then start again from the next word, e.g. "love Shanghai start"). Example query: "Hunan big school roof".

Forward and reverse matching (forward: scan from front to back; reverse: scan from back to front), e.g. "Hunan academy roof". A second example is "Liu Qiang earth method": the forward and reverse methods can split it differently, and in the intended reading "the earth" is not a word.
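The matching strategies above can be sketched as dictionary-driven loops. This is a minimal illustration with hypothetical toy dictionaries and function names, not Love Shanghai's code. The minimum-matching variant assumes a "word" must be at least two characters (otherwise it would always stop at a single character) and falls back to one character when nothing matches. The first example is the article's "Hunan big school roof" (湖南大学屋顶); the second, 研究生命起源 ("research the origin of life"), is a standard textbook case where forward and reverse scans disagree.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Scan left to right, taking the longest dictionary word at each position."""
    max_len = max(1, max_len)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # a single character always matches
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Scan right to left, taking the longest dictionary word ending at each position."""
    max_len = max(1, max_len)
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    return words[::-1]  # words were collected from the end


def forward_min_match(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Scan left to right, stopping at the shortest (2+ character) dictionary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(2, min(max_len, len(text) - i) + 1):
            if text[i:i + size] in dictionary:
                break
        else:
            size = 1  # no multi-character word starts here
        words.append(text[i:i + size])
        i += size
    return words


if __name__ == "__main__":
    hunan_dict = {"湖南", "湖南大学", "大学", "屋顶"}
    print(forward_max_match("湖南大学屋顶", hunan_dict, 4))  # 湖南大学 / 屋顶
    print(forward_min_match("湖南大学屋顶", hunan_dict, 4))  # 湖南 / 大学 / 屋顶

    life_dict = {"研究", "研究生", "生命", "起源"}
    print(forward_max_match("研究生命起源", life_dict, 3))   # 研究生 / 命 / 起源
    print(backward_max_match("研究生命起源", life_dict, 3))  # 研究 / 生命 / 起源
```

The disagreement on 研究生命起源 is why implementations often run both directions and prefer the split with fewer words or fewer single characters; that tie-breaking rule is a common heuristic, not something the article attributes to Love Shanghai.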


Love Shanghai's word segmentation uses three basic methods.

3. String-based segmentation: Love Shanghai uses the maximum matching method.

Love Shanghai's Chinese segmentation algorithm is the algorithm the search engine uses to interpret users' queries and deliver the information they need.



