Channel: PayMoon贝明实验室

How to create a utf8 / utf8_general_ci database from the command line (Command to create MySQL database with Character set UTF-8)

If the database name contains non-alphanumeric characters, quote it with backticks (`):

    CREATE DATABASE `my-db` CHARACTER SET utf8 COLLATE utf8_general_ci;

When running it from a shell script, escape the backticks with "\":

    mysql -p -e "CREATE DATABASE \`my-db\` CHARACTER SET utf8 COLLATE utf8_general_ci;"

Question: what is utf8_general_ci?

From the official manual:

For any Unicode character set, operations performed using the xxx_general_ci collation are faster than those for the xxx_unicode_ci collation. For example, comparisons for the utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci. The reason for this is that utf8_unicode_ci supports mappings such as expansions; that is, when one character compares as equal to combinations of other characters. For example, in German and some other languages ß is equal to ss. utf8_unicode_ci also supports contractions and ignorable characters. utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.

To further illustrate, the following equalities hold in both utf8_general_ci and utf8_unicode_ci (for the effect this has in comparisons or when doing searches, see Section 11.1.8.7, “Examples of the Effect of Collation”):

    Ä = A
    Ö = O
    Ü = U

A difference between the collations is that this is true for utf8_general_ci:

    ß = s

Whereas this is true for utf8_unicode_ci, which supports the German DIN-1 ordering (also known as dictionary order):

    ß = ss

MySQL implements language-specific collations for the utf8 character set only if the ordering with utf8_unicode_ci does not work well for a language. For example, utf8_unicode_ci works fine for German dictionary order and French, so there is no need to create special utf8 collations. utf8_general_ci also is satisfactory for both German and French, except that ß is equal to s, and not to ss.
If this is acceptable for your application, you should use utf8_general_ci because it is faster. If this is not acceptable (for example, if you require German dictionary order), use utf8_unicode_ci because it is more accurate.

MySQL :: MySQL 5.7 Reference Manual :: 11.1.15.1 Unicode Character Sets
http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html

Other source 1:

utf8_general_ci is a legacy collation that does not support expansions; it can only compare characters one at a time. This means that comparisons under the utf8_general_ci collation are fast, but less correct than comparisons under utf8_unicode_ci.

That said, utf8_unicode_ci is more accurate while utf8_general_ci is faster. In most cases the accuracy of utf8_general_ci is good enough for our purposes. After reading the source code of many programs, I found that most of them also use utf8_general_ci, so choosing utf8_general_ci when creating a new database is usually fine.

Differences between utf8_bin, utf8_general_ci and utf8_general_cs in MySQL - huanleyan's column - Blog Channel - CSDN.NET
http://blog.csdn.net/chenghuan1990/article/details/10078931

Other source 2:

utf8_general_ci is a very simple (and, on Unicode, very broken) collation, one that gives incorrect results on general Unicode text. What it does is:
  • converts to Unicode normalization form D for canonical decomposition
  • removes any combining characters
  • converts to upper case
This does not work correctly on Unicode, because it does not understand Unicode casing. Unicode casing alone is much more complicated than an ASCII-minded approach can handle. For example:
  • The lowercase of “ẞ” is “ß”, but the uppercase of “ß” is “SS”.
  • There are two lowercase Greek sigmas, but only one uppercase one; consider “Σίσυφος”.
  • Letters like “ø” do not decompose to an “o” plus a diacritic, meaning that they won’t sort correctly.
There are many other subtleties.
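The three steps the quoted author attributes to utf8_general_ci, and the casing pitfalls listed after them, can be reproduced with Python's standard unicodedata module. This is a toy model of that description only: general_ci_key is a hypothetical name, and MySQL's real collation is driven by a weight table (which, per the manual quoted earlier, also makes ß equal to s, something this model does not do).

```python
import unicodedata


def general_ci_key(s: str) -> str:
    """Toy model of the three quoted steps (NOT MySQL's real weight table)."""
    # 1. Canonical decomposition (NFD): "Ä" becomes "A" + U+0308 COMBINING DIAERESIS.
    decomposed = unicodedata.normalize("NFD", s)
    # 2. Remove combining characters (non-zero canonical combining class).
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # 3. Upper-case one character at a time. A character whose upper case needs
    #    more than one character (e.g. "ß" -> "SS") is left unchanged, which keeps
    #    the mapping one-to-one.
    return "".join(c.upper() if len(c.upper()) == 1 else c for c in stripped)


# Accents fold away as intended...
print(general_ci_key("Ä"), general_ci_key("é"))  # A E
# ...but "ø" has no canonical decomposition, so it never meets "o".
print(general_ci_key("ø"), general_ci_key("o"))  # Ø O
# Case mapping is not one-to-one: ẞ lowercases to ß, yet ß uppercases to "SS".
print("\u1E9E".lower(), "ß".upper())  # ß SS
# Two lowercase sigmas share one uppercase letter; str.lower() picks the
# word-final form from context when going back down.
print("σ".upper(), "ς".upper(), "Σίσυφος".upper().lower())  # Σ Σ σίσυφος
```

Because step 3 works character by character, the model cannot make "ß" equal "SS"; that is exactly the kind of one-to-one limitation the manual describes.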
  1. utf8_unicode_ci uses the standard Unicode Collation Algorithm and supports so-called expansions and ligatures. For example, the German letter ß (U+00DF LATIN SMALL LETTER SHARP S) is sorted near "ss", and the letter Œ (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near "OE".
utf8_general_ci does not support expansions or ligatures; it sorts all these letters as single characters, and sometimes in the wrong order.
  2. utf8_unicode_ci is generally more accurate for all scripts. For example, in the Cyrillic block, utf8_unicode_ci works fine for all of these languages: Russian, Bulgarian, Belarusian, Macedonian, Serbian, and Ukrainian, while utf8_general_ci is fine only for the Russian and Bulgarian subset of Cyrillic. The extra letters used in Belarusian, Macedonian, Serbian, and Ukrainian are not sorted well.
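The difference between one-to-one mappings and expansions (item 1 above, and the manual's ß = s versus ß = ss example) can be illustrated with two hypothetical comparison keys. This is only a sketch: the tiny tables ONE_TO_ONE and EXPANSIONS are hand-written assumptions, whereas utf8_unicode_ci actually derives its behaviour from the Unicode Collation Algorithm's weights.

```python
import unicodedata

# Hypothetical, hand-written tables for illustration only (not MySQL's data).
# The manual says utf8_general_ci treats ß as s, while utf8_unicode_ci treats
# ß as ss; the Œ/œ -> "oe" entries mimic the ligature handling described above.
ONE_TO_ONE = {"ß": "s", "\u1E9E": "s"}
EXPANSIONS = {"ß": "ss", "\u1E9E": "ss", "Œ": "oe", "œ": "oe"}


def _fold(s: str, table: dict) -> str:
    # Drop accents (NFD, then remove combining marks), then map each character
    # through the table, falling back to its lower-case form.
    base = "".join(
        c for c in unicodedata.normalize("NFD", s) if not unicodedata.combining(c)
    )
    return "".join(table.get(c, c.lower()) for c in base)


def one_to_one_key(s: str) -> str:
    """Each character maps to a single character (toy utf8_general_ci)."""
    return _fold(s, ONE_TO_ONE)


def expanding_key(s: str) -> str:
    """Some characters expand to two (toy utf8_unicode_ci)."""
    return _fold(s, EXPANSIONS)


print(one_to_one_key("Straße") == one_to_one_key("Strase"))  # True
print(expanding_key("Straße") == expanding_key("STRASSE"))  # True
print(one_to_one_key("Œuvre"), expanding_key("Œuvre"))  # œuvre oeuvre
```

Using such keys with sorted(..., key=expanding_key) places "Straße" next to "Strasse", which is the dictionary-order behaviour the comparison above attributes to utf8_unicode_ci.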
The cost of utf8_unicode_ci is that it is a little bit slower than utf8_general_ci. But that’s the price you pay for correctness. Either you can have a fast answer that’s wrong, or a very slightly slower answer that’s right. Your choice. It is very difficult to ever justify giving wrong answers, so it’s best to assume that utf8_general_ci doesn’t exist and to always use utf8_unicode_ci. Well, unless you want wrong answers. Source: http://forums.mysql.com/read.php?103,187048,188748#msg-188748  
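The command at the top of the post can also be generated from a script, with the collation passed in so you can pick utf8_general_ci or utf8_unicode_ci according to the trade-off above. A minimal Python sketch; the helper names quote_identifier, create_database_sql and mysql_argv are hypothetical. It backtick-quotes the database name and doubles any backtick inside it (MySQL's escaping rule for quoted identifiers), and it only builds the command rather than running it.

```python
import re
import shlex
from typing import List

# Character set and collation names are inserted unquoted, so this
# (hypothetical) check only allows letters, digits and underscores.
_PLAIN_NAME = re.compile(r"[A-Za-z0-9_]+")


def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def create_database_sql(name: str, charset: str, collation: str) -> str:
    for value in (charset, collation):
        if not _PLAIN_NAME.fullmatch(value):
            raise ValueError(f"unexpected charset/collation name: {value!r}")
    return (
        f"CREATE DATABASE {quote_identifier(name)} "
        f"CHARACTER SET {charset} COLLATE {collation};"
    )


def mysql_argv(sql: str) -> List[str]:
    # Same -p / -e flags as the shell command at the top of the post.
    return ["mysql", "-p", "-e", sql]


sql = create_database_sql("my-db", "utf8", "utf8_unicode_ci")
print(sql)
print(shlex.join(mysql_argv(sql)))
```

Passing mysql_argv(sql) to subprocess.run bypasses the shell, so the backticks need no "\" escaping at all. shlex.join is only there to print a copy-pasteable command; it wraps the statement in single quotes, inside which a POSIX shell does not interpret backticks.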
