「云华」CloudSino

网络一隅/Net`Corner


(Fake) Complete Chinese Pinyin Converter

This is a follow-up to my earlier article, "Writing of Chinese Pinyin - From Mastery to Not Being Able to Type It Out".

Chinese Pinyin converters are relatively mature, but some obscure rules are still poorly supported (so I just made one myself).

Currently the best user experience comes from Google Translate: it capitalizes the first letter of each sentence, segments words, and is free with no ads.

This article adds a second conversion pass on top of its output.


All spellings below follow the current "Chinese Pinyin Scheme" (PDF).


Special characters used in the shortened pinyin spellings:


  • zh → ẑ / Zh → Ẑ

  • ch → ĉ / Ch → Ĉ

  • sh → ŝ / Sh → Ŝ

  • ng → ŋ / (NG → Ŋ) (see footnote 1)
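
As a rough sketch (not necessarily how the finished tool does it), the three circumflex digraphs in this list can be driven by a small lookup table. The names `DIGRAPHS` and `shortenDigraphs` are made up for this illustration, and "ng" is deliberately left out because it needs to look at the following letter.

```javascript
// Hypothetical lookup table for the short forms of zh / ch / sh.
const DIGRAPHS = {
  zh: 'ẑ', Zh: 'Ẑ',
  ch: 'ĉ', Ch: 'Ĉ',
  sh: 'ŝ', Sh: 'Ŝ',
};

// Replace every "zh", "ch", "sh" (lower case or capitalized)
// with the matching short letter from the table.
function shortenDigraphs(text) {
  return text.replace(/[ZzCcSs]h/g, (match) => DIGRAPHS[match]);
}

console.log(shortenDigraphs('Zhè shì chá')); // prints "Ẑè ŝì ĉá"
```

All-caps input such as "ZH" is not matched here; whether it should be is left open.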

The digraphs "ẑ, ĉ, ŝ" are easy to handle, but "ŋ" is a bit more troublesome. Compare:

  • 相安 "xiang'an"
  • 线杆 "xiangan"


This is because the syllable-dividing apostrophe is added only before a syllable that starts with one of the finals "a/o/e" (to prevent confusion), and never before a syllable that starts with an initial.

When there is no initial before "i/u/ü", add "y/w" as the initial letter.
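
To make the apostrophe rule concrete, here is a minimal sketch that assumes the word has already been split into syllables. `A_O_E` and `joinSyllables` are names invented for this example, and the "y/w" spelling rule is not covered.

```javascript
// Letters (with their tone-marked forms) that can begin a syllable
// written without an initial: a, o, e.
const A_O_E = 'aāáǎàoōóǒòeēéěè';

// Join syllables into one word, adding an apostrophe before any
// non-first syllable that starts with a, o or e.
function joinSyllables(syllables) {
  return syllables
    .map((syl, i) => (i > 0 && A_O_E.includes(syl[0]) ? "'" + syl : syl))
    .join('');
}

console.log(joinSyllables(['xiang', 'an'])); // prints "xiang'an"
console.log(joinSyllables(['xian', 'gan'])); // prints "xiangan"
```

The second word gets no apostrophe because its second syllable starts with the initial "g", which is exactly why "ng" in the output can be ambiguous.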

So the task is to distinguish a syllable ending in "ng" from a syllable ending in "n" that is followed by a syllable starting with "g".

"ng" is converted to "ŋ" only when it is not followed by a vowel [āáǎàaēéěèeōóǒòoīíǐìiūúǔùuüǖǘǚǜ].

Alternatively, it could be converted only when it is followed by an initial [bpmfdtnlgkhjqxzcsr].
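
The two candidate rules can be compared side by side. This is only a sketch: `ngUnlessVowel` and `ngBeforeInitial` are names I made up, and the character lists are copied from the two rules above.

```javascript
const VOWELS = 'āáǎàaēéěèeōóǒòoīíǐìiūúǔùuüǖǘǚǜ';
const INITIALS = 'bpmfdtnlgkhjqxzcsr';

// Rule A: convert "ng" unless a vowel follows (negative lookahead).
const ngUnlessVowel = (text) =>
  text.replace(new RegExp(`ng(?![${VOWELS}])`, 'g'), 'ŋ');

// Rule B: convert "ng" only when an initial follows (positive lookahead).
const ngBeforeInitial = (text) =>
  text.replace(new RegExp(`ng(?=[${INITIALS}])`, 'g'), 'ŋ');

for (const word of ["xiang'an", 'xiangan', 'Zhōngguó', 'shang']) {
  console.log(word, '→', ngUnlessVowel(word), '/', ngBeforeInitial(word));
}
```

Both rules leave "xiangan" alone and both convert "Zhōngguó", but rule B does not touch an "ng" at the end of a word or before an apostrophe, so rule A covers more cases.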


Erhua


The "er" of erhua is generally written in the neutral tone, that is, without a tone mark.
I checked Han Dian myself and did not find any other character read as a neutral-tone "er".

So let's simply replace every occurrence:

  • " er" (with a leading space) → "r"
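
A minimal sketch of this replacement, assuming the input has tone marks so that a full syllable such as "ér" never matches a plain "er". The function name `mergeErhua` and the `(?![a-z])` guard are my own additions for illustration; the guard keeps a longer unmarked word that merely starts with " er" from being changed.

```javascript
// Merge a detached, toneless "er" into the preceding syllable as "r".
// Assumption: (?![a-z]) is an extra guard, not part of the basic rule,
// so that " er" followed by more plain letters is left alone.
function mergeErhua(text) {
  return text.replace(/ er(?![a-z])/g, 'r');
}

console.log(mergeErhua('yīdiǎn er')); // prints "yīdiǎnr"
console.log(mergeErhua('nǚ ér'));     // prints "nǚ ér" (toned "ér" is untouched)
```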

Code section:

As you can see, this is not hard to implement: write a few regular expressions and replace everything.

HTML
  
<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Real-time Preview</title>
    <style>
        body {
            margin: 0;
            padding: 0;
            font-family: 'SimHei', sans-serif;
            display: flex;
            height: 100vh;
            background-color: #f5f5f5;
        }
        .container {
            display: flex;
            width: 100%;
        }
        .textarea-container {
            width: 50%;
            height: 100%;
        }
        textarea {
            width: 100%;
            height: 100%;
            border: none;
            padding: 10px;
            font-size: 16px;
            resize: none;
            box-sizing: border-box;
            outline: none;
            font-family: inherit;
        }
    </style>
</head>
<body>
    <div class="container">
        <div class="textarea-container">
            <textarea id="input" placeholder="Please enter pinyin here..."></textarea>
        </div>
        <div class="textarea-container">
            <textarea id="output" placeholder="Display converted content..." readonly></textarea>
        </div>
    </div>

    <script>
        const input = document.getElementById('input');
        const output = document.getElementById('output');

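        // Convert standard pinyin into the short forms described above.
        function transformText(text) {
            // Digraphs zh/ch/sh (capitalized and lower case), then erhua:
            // a detached, toneless " er" is merged into the previous syllable.
            text = text.replace(/Zh/g, 'Ẑ')
                       .replace(/zh/g, 'ẑ')
                       .replace(/Ch/g, 'Ĉ')
                       .replace(/ch/g, 'ĉ')
                       .replace(/Sh/g, 'Ŝ')
                       .replace(/sh/g, 'ŝ')
                       .replace(/ er/g, 'r');

            // "ng" becomes "ŋ" unless a vowel follows; in that case the "n"
            // ends one syllable and the "g" starts the next one.
            text = text.replace(/ng(?![āáǎàaēéěèeōóǒòoīíǐìiūúǔùuüǖǘǚǜ])/g, 'ŋ');

            return text;
        }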

        input.addEventListener('input', () => {
            const transformed = transformText(input.value);
            output.value = transformed;
        });
    </script>
</body>
</html>

https://wikidot.eu.org/tool.html
The single-page tool is hosted here.
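
To try the replacements without a browser, the same logic can also be run under Node.js. The function below is a copy of `transformText` from the page above; the sample words are my own picks, not an official test set.

```javascript
// Copy of the page's transformText, runnable outside the browser.
function transformText(text) {
  text = text.replace(/Zh/g, 'Ẑ')
             .replace(/zh/g, 'ẑ')
             .replace(/Ch/g, 'Ĉ')
             .replace(/ch/g, 'ĉ')
             .replace(/Sh/g, 'Ŝ')
             .replace(/sh/g, 'ŝ')
             .replace(/ er/g, 'r');
  return text.replace(/ng(?![āáǎàaēéěèeōóǒòoīíǐìiūúǔùuüǖǘǚǜ])/g, 'ŋ');
}

// Print each sample next to its converted form.
const samples = ['Zhōngguó', "xiāng'ān", 'xiàngān', 'shàngchǎng', 'yīdiǎn er'];
for (const s of samples) {
  console.log(`${s} → ${transformText(s)}`);
}
```

"xiāng'ān" (相安) gets a "ŋ", while "xiàngān" (线杆) keeps its "n" and "g" as separate letters.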


Not used:

"ê" and the nasal letters are used mostly in spoken language and rarely in writing, so please allow me to skip them.

Footnotes

  1. Google Translate automatically capitalizes the first letter of a sentence, but "ng" is generally used only as a final and so never starts a sentence; the uppercase rule is therefore not applied.
