Technically, Totem evolves a lexicon and grammar by taking audio recordings of its environment, running them through speech recognition, tagging the parts of speech in the recognized phrases, and storing the resulting grammatical structures.

After each recording, the audio file is sent to Sphinx4 (Java) for speech recognition, and SoX cuts the recording into chunks based on the word timings. A map from each recording of a spoken word to its interpretation is saved. The recognized words are then sent to NLTK (Python) for part-of-speech tagging, and maps of the words, their assigned parts of speech, and the larger chains of grammatical structure are saved.
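The bookkeeping described above might be sketched as follows. This is a minimal illustration rather than Totem's actual code: the `Lexicon` class, the `sox_trim_command` helper, the clip naming scheme, and the sample timings and tags are all hypothetical. The recognizer and tagger outputs are hard-coded stand-ins for what Sphinx4 and NLTK would return, and the SoX command is only built, not executed.

```python
from collections import Counter, defaultdict

# Hypothetical recognizer output: (word, start_seconds, end_seconds).
timings = [("the", 0.00, 0.21), ("bird", 0.21, 0.65), ("sings", 0.65, 1.20)]
# Hypothetical tagger output for the same words (Penn Treebank style tags).
tagged = [("the", "DT"), ("bird", "NN"), ("sings", "VBZ")]


def sox_trim_command(recording, word, index, start, end):
    """Build (but do not run) a SoX command that cuts one word out of a recording.

    SoX's trim effect takes a start position and a duration.
    """
    clip = f"{index:03d}_{word}.wav"  # assumed naming scheme
    duration = round(end - start, 3)
    return clip, ["sox", recording, clip, "trim", str(start), str(duration)]


class Lexicon:
    """Assumed storage layout for the maps mentioned in the text."""

    def __init__(self):
        self.clips = defaultdict(list)         # word -> audio clips of that word
        self.word_tags = defaultdict(Counter)  # word -> counts of assigned tags
        self.tag_words = defaultdict(Counter)  # tag -> counts of words seen with it
        self.structures = Counter()            # tag sequence -> times observed

    def learn(self, recording, timings, tagged):
        # Map each spoken word to the clip that would hold its sound.
        for index, (word, start, end) in enumerate(timings):
            clip, _command = sox_trim_command(recording, word, index, start, end)
            self.clips[word].append(clip)
        # Record which part of speech each word was assigned.
        for word, tag in tagged:
            self.word_tags[word][tag] += 1
            self.tag_words[tag][word] += 1
        # Record the whole phrase as one chain of grammatical structure.
        self.structures[tuple(tag for _, tag in tagged)] += 1


lexicon = Lexicon()
lexicon.learn("recording.wav", timings, tagged)
```

Counting tags and tag sequences, rather than storing them once, is what would let a later step ask which words and structures are popular.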

Meanwhile, in a parallel process, Totem begins to speak. Reflecting on what has been said and on which grammatical structures and words are popular, Totem attempts to make new sentences by mapping parsed grammars to words that might fit, then speaks the result using fragments of overheard sound.
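One way to picture this step is frequency-weighted sampling: pick a grammatical structure in proportion to how often it has been heard, then fill each slot with a word that has carried that part of speech. The sketch below assumes that approach; the source does not say how Totem weighs popularity, and the function names, counts, and clip file names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical counts of the kind accumulated while listening.
structures = Counter({("DT", "NN", "VBZ"): 3, ("DT", "JJ", "NN"): 1})
tag_words = {
    "DT": Counter({"the": 4, "a": 1}),
    "NN": Counter({"bird": 2, "tree": 1}),
    "VBZ": Counter({"sings": 2}),
    "JJ": Counter({"quiet": 1}),
}
# Hypothetical overheard fragments available for each word.
clips = {
    "the": ["000_the.wav", "007_the.wav"],
    "a": ["003_a.wav"],
    "bird": ["001_bird.wav"],
    "tree": ["005_tree.wav"],
    "sings": ["002_sings.wav"],
    "quiet": ["004_quiet.wav"],
}


def weighted_choice(counter, rng):
    """Pick one key, with probability proportional to its count."""
    items = list(counter.keys())
    weights = list(counter.values())
    return rng.choices(items, weights=weights, k=1)[0]


def compose(structures, tag_words, rng):
    """Choose a popular structure and fill each slot with a fitting word."""
    structure = weighted_choice(structures, rng)
    return [(tag, weighted_choice(tag_words[tag], rng)) for tag in structure]


def playlist(sentence, clips, rng):
    """Pick one overheard clip per word, in sentence order."""
    return [rng.choice(clips[word]) for _, word in sentence]


rng = random.Random(0)  # seeded only to make the sketch repeatable
sentence = compose(structures, tag_words, rng)
fragments = playlist(sentence, clips, rng)
print(" ".join(word for _, word in sentence), fragments)
```

Playing the chosen fragments back in order would produce a sentence stitched together from the voices Totem has overheard.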

When the audio process is complete, Totem once again listens in, and the cycle repeats.

Flowchart of Totem's learning process

Graphical depiction of Totem's part of speech and grammatical structure networks