The person behind the pseudonym Elena Ferrante
In 2018, a study was published in which Italian researchers used language technology methods in an attempt to identify the person behind the pseudonym Elena Ferrante. In connection with this, I appeared as the on-call language technologist on PP3 to talk about automatic methods for authorship attribution. The idea behind authorship attribution (författarskapsbestämning in Swedish) is that every person has their own unique way of expressing themselves, shaped by, for example, where we live, our level of education, gender, cultural influences, and more. If we can pin down a person's individual language, we can use that knowledge to determine whether that person is behind a given work.
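To make the idea a bit more concrete, here is a minimal sketch of one common stylometric approach: comparing how often different authors use a handful of function words. This is not the method used in the Italian study; the word list, the function names and the cosine comparison are all my own illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny list of Swedish function words; a real study would use many more.
FUNCTION_WORDS = ["och", "att", "det", "som", "en", "på", "är", "av", "för", "med"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(a, b):
    """Cosine similarity between two equally long frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_author(disputed_text, candidates):
    """Pick the candidate whose known writing is closest to the disputed text.

    `candidates` maps an author name to a sample of that author's known writing.
    """
    target = profile(disputed_text)
    return max(candidates, key=lambda name: cosine(target, profile(candidates[name])))
```

Function words are a popular choice because they are frequent and largely independent of topic, but a closest match is only an indication, never proof of authorship.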
Fake news at SLTC 2018
The seventh Swedish Language Technology Conference (SLTC) recently took place in Stockholm at Stockholm University. I was the moderator of a panel discussion on fake news and troll detection. Members of the panel were Maria Liakata from the University of Warwick, Staffan Truvé of Recorded Future, Leon Strømberg-Derczynski from the IT University of Copenhagen and Georgi Karadzhov from SiteGround Hosting.
Finding new words with trie data structure
Like the unworldly linguist that I am, I spent the election of 2018 making a bot that announces to the world every time a new word occurs in Dagens Nyheter. You can find it on Twitter under the handle @nya_ord_i_dn. In addition to the word (which is tweeted as it occurs in the article), the bot provides a short concordance and a URL to the article in which it occurs.
teknikpopulist
— Nya ord i DN (@nya_ord_i_dn) October 5, 2018
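For readers unfamiliar with tries, here is a minimal sketch of how one can be used to spot previously unseen words. It is not the bot's actual code: the class names, the `find_new_words` helper and the lowercasing of lookup keys are assumptions made for illustration.

```python
class TrieNode:
    """One node per character; `is_word` marks the end of a stored word."""

    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


def find_new_words(tokens, known):
    """Return tokens not yet in the trie, adding each so it is reported only once."""
    new_words = []
    for token in tokens:
        key = token.lower()  # assumption: lookups are case-insensitive
        if key not in known:
            new_words.append(token)  # keep the form used in the article
            known.insert(key)
    return new_words


known = Trie()
for word in ["teknik", "populist", "politik"]:
    known.insert(word)

print(find_new_words(["Politik", "teknikpopulist", "teknikpopulist"], known))
```

Because words that share a prefix share nodes, a large vocabulary of inflected and compounded Swedish words can be stored fairly compactly, and a lookup takes time proportional to the length of the word.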
Political Pessimism
This summer, I did a study on political pessimism in Swedish political discourse for Dagens Nyheter. It was a collaboration between Jussi Karlgren at Gavagai, Lovisa Bergström at Dagens Nyheter and me, in which we attempted to quantify tonality in political speeches.
The Written Work Corpus
The written work corpus is a manually created data set of named occurrences of written works (books) in Swedish news texts. Its intended use is for named-entity recognition (NER) tasks. The data set consists of 175 articles from the culture section of Dagens Nyheter (DN Kultur).