ML System Data Extraction and Training

For data extraction and learning the model parameters, I followed the same approach as described in the Roman-Indic Transliteration post. WX serves as a bridge between scripts: the source script is first converted to WX, and WX is then converted to the target script. WX conversions for the various Indic scripts are available in Indic-WX-Converter.

Internally, the converter maps the letters of each Indic script to a common representation in ISCII (refer to pages 15-17 of iscii91.pdf) and then maps this ISCII to ASCII, which is what we call WX. There is a single ISCII-to-ASCII table for the WX conversion and a separate Unicode-to-ISCII table for each Indic script. For example, Hyderabad in Telugu (హైదరాబాద్), Malayalam (ഹൈദരാബാദ്) and Kannada (ಹೈದರಾಬಾದ್) all map to the common representation hExarAbAx in WX. This is not the case for all Indic scripts, however: Hyderabad in Tamil (ஹைதெராபாத்) maps to hEweVrApAw in WX.
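To make the two-table idea concrete, here is a minimal Python sketch. It is not the Indic-WX-Converter API, and the function name `to_wx` and the tiny tables are hypothetical. Instead of real per-script Unicode-to-ISCII tables, it assumes that the Unicode blocks for these scripts share the ISCII-derived layout, so a letter's offset within its 128-code-point block can stand in for the common ISCII representation. The table covers only the letters needed for the Hyderabad examples.

```python
# Toy sketch only: a hypothetical stand-in for the Unicode -> ISCII -> WX pipeline.
# Assumption: the offset of a letter inside its 128-code-point Unicode block
# plays the role of the script-independent ISCII code.

INDIC_START, INDIC_END = 0x0900, 0x0D7F  # Devanagari through Malayalam blocks
BLOCK_MASK = 0x7F                        # each block starts at a multiple of 0x80

# Shared "ISCII-like offset" -> WX table (only the letters used in the examples).
CONSONANTS = {0x24: "w", 0x26: "x", 0x2A: "p", 0x2C: "b", 0x30: "r", 0x39: "h"}
VOWEL_SIGNS = {0x3E: "A", 0x46: "eV", 0x48: "E"}
VIRAMA = 0x4D  # suppresses the inherent vowel "a"


def to_wx(text: str) -> str:
    """Convert the supported Indic letters to WX; pass everything else through."""
    out = []
    pending_a = False  # True while the last consonant still owes its inherent "a"
    for ch in text:
        cp = ord(ch)
        offset = cp & BLOCK_MASK if INDIC_START <= cp <= INDIC_END else None
        if offset in CONSONANTS:
            if pending_a:
                out.append("a")
            out.append(CONSONANTS[offset])
            pending_a = True
        elif offset in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[offset])  # the sign replaces the inherent "a"
            pending_a = False
        elif offset == VIRAMA:
            pending_a = False                # bare consonant, no vowel
        else:
            if pending_a:
                out.append("a")
            pending_a = False
            out.append(ch)                   # letter outside the toy table
    if pending_a:
        out.append("a")
    return "".join(out)


for word in ["హైదరాబాద్", "ಹೈದರಾಬಾದ್", "ഹൈദരാബാദ്", "ஹைதெராபாத்"]:
    print(word, to_wx(word))
```

Because the table is keyed on the block offset rather than on the script, the Telugu, Kannada and Malayalam spellings go through the same entries, while the Tamil spelling uses different letters (and a short e sign) and so yields a different WX string. A real converter would need the full tables and would handle independent vowels, nukta and other signs, which this sketch ignores.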