Awesome Worlds: languages

Tuesday, January 24, 2017

Fantasy word generator and generated novels

In yesterday's post, I mentioned stumbling across a small fantasy map generator. There's also a word generator associated with it, used to create place names. The approach in that word generator is based upon patterns (e.g. CVC for consonant-vowel-consonant) and partially overlapping sets of letter/sound types (consonants, vowels, sibilants, endings). This differs from the approach of the word generator I mentioned in a couple of posts last fall, which used statistical patterns derived from a corpus of words supplied as input.
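A rough sketch of the pattern idea in Python follows. It is not the code from that generator: the letter sets, the class symbols (C, V, S, F) and the function name make_word are all made up for illustration, and the real tool may well combine its patterns differently.

```python
import random

# Hypothetical letter classes. The sets overlap on purpose ("s" is both a
# consonant and a sibilant), mirroring the partially overlapping sets above.
LETTER_SETS = {
    "C": "bdgklmnprst",  # consonants
    "V": "aeiou",        # vowels
    "S": "sz",           # sibilants
    "F": "nrs",          # word endings
}

def make_word(pattern, rng=random):
    """Replace each class symbol in the pattern with a random member of its set."""
    return "".join(rng.choice(LETTER_SETS[symbol]) for symbol in pattern)

if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed so repeated runs give the same names
    for pattern in ("CVC", "CVCVF", "SVCV"):
        print(pattern, make_word(pattern, rng))
```

Changing the sets or the list of patterns changes the "feel" of the names, which is presumably why this approach suits place names for a single invented region.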

Apparently, both the fantasy map and word generators are ports from Python to JavaScript of code used in NaNoGenMo 2015, the National Novel Generation Month. That particular effort was known as The Deserts of the West and produced not so much a novel as an atlas and travel guide, along the lines of a gazetteer. The concept of NaNoGenMo appears to be analogous to NaNoWriMo, the National Novel Writing Month. NaNoWriMo promotes writing a novel in a month; NaNoGenMo promotes writing code within a month that can generate a novel.

When I discovered NaNoGenMo yesterday evening, I investigated a little, and the novels produced by NaNoGenMo aren't particularly interesting to me in a literary sense. (A few were better than some of what I was subjected to in college English classes, though.) I do not think actual authors have anything to fear from the output of these programs. However, they were of some technical interest. They did produce proper paragraphs and sentences. Many of them were even fairly coherent. I just didn't find any particularly interesting as reading material.

On the other hand, I find the gazetteer approach of The Deserts of the West fascinating. That is, both the concept and the output intrigue me. I may investigate them more thoroughly at some point. In a way, they seem to be nothing more than an expansion on some of the tools and tables used in role-playing games, be they pen and paper or computer.

OK, that's it for my lunchtime post. Time to finish eating and get back to work.

Sunday, September 18, 2016

Sunday night stuff

Much of the outstanding work on the star system generator that I discussed in yesterday's post is now finished. Some data is simply not generated for gas worlds - you'll see NaN (not a number) showing up in those cases.

The moon count is still low for the gas giants, but that's because the method by which planetesimals become moons in the simulation makes it unlikely for large numbers to accrue around any planet. I may need to add some code to specifically generate additional moons for the larger gas worlds. A few bits and pieces remain to be taken care of - there's a bug that can result in negative mass(!?) for small planets around very massive stars.

There's more work to be done. The orbital data for the moons is incorrect/incomplete. The GUI and the simulation core need to be separated out into an application and a library, as I have every intention of reusing the simulation core with the planet program in the future, and with other things. The GUI needs an export capability. It also needs an option to restrict the star selection to a smaller selection of spectral types; currently it will randomly select from any main sequence type. Another option should be to keep creating star systems until one is found with a habitable world - that'll make it easier for authors, game masters, game designers, etc. who might try to use this to generate a star system with a habitable world for a story or game.

Code cleanup will probably require a couple days - the original examples I ported from and/or used as a template were in multiple computer languages with different naming conventions, indent styles, bracketing styles, comment styles, etc. Where I copy-and-pasted I need to go back and achieve consistency.

On other matters, I ran across an interesting agent-based language evolution simulator. It's implemented in JavaScript and runs in the browser, so just visit the link if you want to take it for a spin. I've included a screenshot below. At first glance, it appears to add words from language A to language B when individuals from A and B interact, with possible mutations. In and of itself, merely interesting, but when combined with some other techniques it could be useful. I'll have to take a more in-depth look again when I jump back to language synthesis at some point. On a related note, I should probably finish cleanup of my C# port of LangGen and put that up on GitHub sometime soon.

Last night I found out that a new novel in Alma T. C. Boykin's Colplatschki Chronicles series had been released. I purchased Forcing the Spring from Amazon and read it on my aging Nook. I quite enjoyed it, and will likely write up a review at some point in the near future. I eagerly await the next book in the series.

Currently I am reading Paul Lendvai's The Hungarians: A Thousand Years of Victory in Defeat. I ran across this book in downtown Cincinnati late Friday afternoon, when I stopped in at the main branch of the Public Library of Cincinnati and Hamilton County. Alas, I had arrived via streetcar, and had to cart the book and two others I checked out through much of downtown Cincinnati, including a stop at crowded Findlay Market, before I could finally drop them at my vehicle.

There was a bizarre bit of convergence after I left the library and hopped the streetcar. I was reading the book on the streetcar, and from early on Lendvai makes mention of German settlements within Hungary. I looked up from my reading at one of the stops, and there were people in old-timey German attire boarding. I was puzzled for a few seconds, and then remembered - Oktoberfest! Cincinnati's Oktoberfest, the largest in America, was starting in an hour or so, and people were boarding the streetcar to reach it.

Between the weather on Friday and Saturday, the crowds, not being a big beer drinker, and the fact that most food booths were from area restaurants where I can get the same food anytime, I wasn't really inspired to go to Oktoberfest. It was still rather neat seeing so many people getting into it and dressing up for the event.

And that's it for this Sunday. I'll probably have more to write about come tomorrow.

Saturday, September 3, 2016

Language generation - 2

In my previous post about language generation I mentioned that sample languages using characters beyond just a-z weren't being processed properly. Investigation showed that two portions of the code had problems and needed to be altered. The first was that the encoding being used for reading the sample files was stripping some characters; explicitly reading the sample files in using the Windows-1252 encoding resolved this problem. The second was in generating the permutations used for searching the sample for potential syllables. Only a-z were being used in creating the permutations, but characters in the set ôõöøùúûüýþÿáâãäåæçèéêëìíîïðñÞ all needed to be considered as well, just to allow the Norse samples to be processed properly. Altering the code in those two portions rapidly improved the generated words.
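The two fixes can be illustrated roughly as follows. This is a Python sketch rather than the C# port itself; the names read_sample, EXTRA_LETTERS and ALPHABET are hypothetical, and the one-word-per-line file layout is an assumption. The extra characters are copied from the list above.

```python
import string

# Extra characters listed above, used in addition to a-z.
EXTRA_LETTERS = "ôõöøùúûüýþÿáâãäåæçèéêëìíîïðñÞ"

# Alphabet from which candidate syllables would be built.
ALPHABET = string.ascii_lowercase + EXTRA_LETTERS

def read_sample(path, encoding="cp1252"):
    """Read one word per line with an explicit encoding.

    "cp1252" is Python's codec name for Windows-1252. Blank lines are skipped.
    """
    with open(path, encoding=encoding) as handle:
        return [line.strip() for line in handle if line.strip()]
```

Naming the encoding explicitly avoids depending on whatever default the platform picks, which is the kind of mismatch that can silently drop or mangle accented characters.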

Another couple of measures were added. If insufficient vowels are present, the generated word is rejected and a new one is generated instead. If the number of consecutive consonants exceeds five, the word is also rejected. An example of output generated from a Norse sample is below.

gufr
ingdr
ardrrn
gudr
ásrn
sighindr
sigídr
gudídr
guríörg
sighinn
vandrarr
indr
sigrg
arnn
eildr
inídís
thgirr
siarr
bjöísrg
infr
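The two rejection rules described above could be sketched like this in Python. The vowel set, the minimum vowel count of one, and the function names are assumptions for illustration; the post only states that words with insufficient vowels or more than five consecutive consonants are rejected.

```python
# Assumed vowel set, including accented forms; "y" is treated as a vowel here.
VOWELS = set("aeiouyáâãäåæèéêëìíîïôõöøùúûüýÿ")

def longest_consonant_run(word):
    """Length of the longest run of characters that are not in VOWELS."""
    longest = current = 0
    for ch in word.lower():
        if ch in VOWELS:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest

def is_acceptable(word, min_vowels=1, max_consonant_run=5):
    """Reject words with too few vowels or too long a consonant run."""
    vowel_count = sum(1 for ch in word.lower() if ch in VOWELS)
    return vowel_count >= min_vowels and longest_consonant_run(word) <= max_consonant_run

def generate_accepted(generate, **limits):
    """Call generate() repeatedly until it returns a word that passes the checks."""
    while True:
        word = generate()
        if is_acceptable(word, **limits):
            return word
```

Here generate stands for any zero-argument function that returns a candidate word, so the filter stays independent of how the words are produced.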

Next up to add is automatic elimination of one character if a run of characters in the word involves the same letter repeating three consecutive times (e.g. if a word contains "rrr", it gets changed to "rr"). Once that's in place I will likely perform a brief cleanup and commit it to GitHub. (And I need to check the license on the original langgen. As this is a direct port, I need to release it under that license. I remember it was relatively permissive but don't recall if it was BSD, MIT, LGPL, etc.) There will be an additional post here when that's all done.
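The triple-letter rule is small enough to sketch with a regular expression. This is a Python illustration, not the planned C# code, and collapse_triples is a hypothetical name. One assumption goes beyond the description: runs longer than three are also cut down to two.

```python
import re

# The same character three or more times in a row, e.g. "rrr".
_TRIPLE = re.compile(r"(.)\1{2,}")

def collapse_triples(word):
    """Shorten any run of three or more identical characters to two."""
    return _TRIPLE.sub(r"\1\1", word)

if __name__ == "__main__":
    for sample in ("inirrr", "eirrra", "gudr"):
        print(sample, "->", collapse_triples(sample))
```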

Friday, September 2, 2016

Language generation

For a few different things I'm working on, I need (or at least want) automated generation of names/words that seem to be from a common language. There are programs that will do this; for example, I found that Federico Tomassetti produced a language generator using Python. However, Python is not the easiest language to integrate with some of the software I'm planning to use the language generator for. As a result, I've nearly completed a port of Tomassetti's work to C#.

How does it work? There are two distinct aspects, analysis and generation. The first aspect, analysis, involves the program reading a source sample and calculating statistics from it. The second aspect, generation, requires the statistical results from the analysis.

The analysis aspect has three basic steps. First, it reads in a word list from a source language. Next it produces the permutations of potential syllables based upon the letters from the alphabet (up to a limited length). Finally, it analyzes the word list to determine the frequency of the potential syllables, including the frequency they occur as starts or ends of words. This provides the statistical data that is needed for generation.
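The analysis steps can be sketched in Python along these lines. This is a simplified illustration, not Tomassetti's code or the C# port: the function names, the maximum syllable length of three, and the use of every letter sequence (with repeated letters allowed) as a candidate are assumptions.

```python
from collections import Counter
from itertools import product

def candidate_syllables(alphabet, max_len=3):
    """Yield every letter sequence of length 1..max_len over the alphabet."""
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

def analyze(words, alphabet, max_len=3):
    """Count candidate syllables anywhere in words, at word starts, and at word ends."""
    anywhere, starts, ends = Counter(), Counter(), Counter()
    for syllable in candidate_syllables(alphabet, max_len):
        for word in words:
            hits = word.count(syllable)  # non-overlapping occurrences
            if hits:
                anywhere[syllable] += hits
            if word.startswith(syllable):
                starts[syllable] += 1
            if word.endswith(syllable):
                ends[syllable] += 1
    return anywhere, starts, ends
```

Candidates that never occur in the sample are simply absent from the counters, so they can never be selected later. The candidate count grows quickly with alphabet size and maximum length, which is presumably why the length is limited.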

The generation aspect requires the statistical data from the analysis aspect. It has five basic steps. The first is to decide upon the number of syllables for the new word. The second is to make a probabilistic choice of a starting syllable. In the third step, if the number of syllables is greater than two, a loop runs that makes probabilistic selection(s) of additional syllable(s). In the fourth step, a probabilistic selection of an ending syllable is made. Finally, the selected syllables are combined in order to form a word.
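The five generation steps might look roughly like this in Python. It is a sketch under assumptions: the three frequency tables are plain dictionaries mapping a syllable to its count, every word has at least two syllables, the syllable count is drawn uniformly, and the names weighted_pick and generate_word are hypothetical.

```python
import random

def weighted_pick(counts, rng):
    """Pick a key with probability proportional to its count."""
    keys = list(counts)
    weights = [counts[key] for key in keys]
    return rng.choices(keys, weights=weights, k=1)[0]

def generate_word(starts, middles, ends, rng=None, max_syllables=4):
    """Build one word from start, middle and end syllable frequency tables."""
    if rng is None:
        rng = random.Random()
    syllable_count = rng.randint(2, max_syllables)   # step 1: number of syllables
    parts = [weighted_pick(starts, rng)]             # step 2: starting syllable
    for _ in range(syllable_count - 2):              # step 3: runs only if count > 2
        parts.append(weighted_pick(middles, rng))
    parts.append(weighted_pick(ends, rng))           # step 4: ending syllable
    return "".join(parts)                            # step 5: combine in order
```

Passing a seeded random.Random instance as rng makes a run repeatable, which helps when comparing output before and after a change to the analysis.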

There are a number of additional features I plan to add in the future, either as part of this language generator or as something layered on top. Nevertheless, the port of langgen will serve as a good starting point. With a set of names from Celtic mythology it produced the list below, which I felt was acceptable.

suis
mata
aniois
brius
belona
belus
matiais
esveus
nerus
artirus
camviis
leio
aria
matiaus
caes
camvina
artios
caona
maos
canuus

Testing it, I think it needs some additional work to handle characters featuring umlauts, cedillas, accent symbols, etc. Why do I think this? Because when I use a source word list from a language featuring such characters (e.g. Norse, Polish), I get less desirable output, like so:

stdr
rarg
ragvall
inirrr
gudrndr
rgdr
gudrnn
guir
bjldrir
hieirn
raldr
arrr
ragarr
eirrra
frri
ridr
eildr
siglfrr
ragrr
rall

So adding support for those types of characters is my next priority. At some point I'll get it into GitHub and make it available.