Archive

Here you will find the posters presented in all the previous Language @ Edinburgh lunches:

55th lunch: Thursday, 1st December 2016

54th lunch: Thursday, 20th October 2016

49th lunch: Thursday, 22nd October 2015

48th lunch: Thursday, 11th June 2015

47th lunch: Thursday, 30th April 2015

46th lunch: Thursday, 19th February 2015

45th lunch: Monday, 24th November 2014

44th lunch: Monday, 13th October 2014

43rd lunch: Wednesday, 18th June 2014

42nd lunch: Thursday, 20th March 2014

41st lunch: Friday, 7th February 2014

Josef Fruehwald
PPLS
Richard Stöckle-Schobel
PPLS
Alessandra Cervone
ILCC

40th lunch: Thursday, 28th November 2013

39th lunch: Thursday, 17th October 2013

38th lunch: Thursday, 13th June 2013

37th lunch: Thursday, 9th May 2013

36th lunch: Friday, 8th February 2013

35th lunch: Thursday, 29th November 2012

34th lunch: Friday, 5th October 2012

33rd lunch: Friday, 8th June 2012

32nd lunch: Thursday, 12th April 2012

31st lunch: Friday, 10th February 2012

30th lunch: Thursday, 1st December 2011

29th lunch: Friday, 7th October 2011

28th lunch: Friday, 10th June 2011

27th lunch: Wednesday, 6th April 2011

26th lunch: Friday, 4th February 2011

Richard J. et al. (2007) report that VOT differs between male and female speakers of English. In English, females produce longer VOTs than males for voiceless consonants (Koenig, 2000; Swartz, 1992; Robb, Gilbert, & Lerman, 2005; Wadnerker et al., 2006; Whiteside et al., 2004; Whiteside & Irving, 1997; Whiteside & Marshall, 2001). In addition, Kang (2013) found that male and female speech differs across phrases and that this difference is affected by intonation. We therefore assume that intonation, as an important independent variable, also affects the length of VOT, and we hypothesise that in Mandarin Chinese men and women differ in this voice feature. Fifteen men and fifteen women aged 18 to 30 participated in the experiment. In the CV structure of Mandarin Chinese, all combinations of the stop consonants (/p/, /b/, /t/, /d/, /k/, and /g/) with the vowels /a/ and /u/ are legal, whereas some combinations with other vowels are phonetically illegal. The word list therefore consists of words beginning with lingual stop consonants followed by the vowels /a/ and /u/, and every combination is presented with four intonations. As expected, the results show that females have longer VOTs than males for both aspirated and unaspirated voiceless stops, because female vocal folds are relatively stiffer than male vocal folds.
How linguistic variation is limited is one of the basic questions of syntactic theory. The prevalent paradigms concerning the amplitude of possibility were delineated in the 20th century when Chomsky challenged the structuralist view that "languages can differ from each other without limit and in unpredictable ways", as articulated by Joos. According to Chomsky, if the infinite logical space of grammars was in no way constrained, language learning could not take place. Chomsky's solution was that a genetically encoded Universal Grammar must set the bounds for variation in human languages. This research re-examines the question of linguistic variation by making a 2-dimensional map of the space of grammars using a mathematical approach. Such a model reveals clear-cut edges that restrict the space of possibilities. This suggests that, while the structuralist view is logically untenable, Chomsky's idea is at odds with Occam's razor since logical constraints already give a sufficient account of the bounds of variation.
There is well-documented, cross-linguistic evidence that /s/ variation can index sexual orientation and non-normative masculinity in both production and perception. The present study expands on previous research on monolingual speech perception with a cross-linguistic matched-guise study. We examine the extent to which English listeners associate /s/ variation with sexual orientation and gender typicality in languages of which they may have little to no knowledge. Raters heard speech in English, French, German, and Estonian, all manipulated across three /s/ levels and three pitch levels. Our findings show that English listeners associate higher pitch and more fronted /s/ with gayness and femininity not only in English but also in languages of which they have limited or no knowledge. These findings raise the question of how indexical meaning is associated with grammatical knowledge.
Building on Li & Thompson (1976), this project analyses the topic phenomenon in Archaic Chinese, a typical topic-prominent (Tp) language, and Old English, a subject-prominent (Sp) language with topic-prominent features. The data come from Ælfric’s Lives of Saints (996-997 AD) and The Records of the Grand Historian (also known as Shiji, around 104 BC). Archaic Chinese uses topics in a larger proportion of its sentences than Old English does. Comparing these two languages and identifying the similarities between them is a new contribution. The dissertation also gives a new definition of topic, which draws mainly on the strengths of existing research in the area.
Romance languages show a substantial degree of variation in prenominal versus postnominal adjective placement. In particular, the semantic differences between prenominal and postnominal adjectives have been studied extensively, both in theoretical and computational linguistics (Cinque, 1994; Bouchard, 1998; Alexiadou, 2001; Laenzlinger, 2005; Boleda, 2007; Vecchi, 2013). This previous work has focused on the relative order of a single noun and one or several adjectival modifiers. By contrast, in this work we investigate the distribution of adjectives in complex noun phrases which include additional postnominal modifiers such as a prepositional phrase (PP). In such noun phrases, in principle, three word orders are possible, as illustrated by the following examples in Italian:
(1) Adj N PP: [ un importante_A compito_N [PP di matematica] ]
(2) N Adj PP: [ un compito_N importante_A [PP di matematica] ]
(3) N PP Adj: [ un compito_N [PP di matematica] importante_A ]
'an important_A math homework_N'
First, we quantitatively describe the distribution of these orders in Italian, based on data extracted from a syntactically annotated corpus. The statistical analysis of these new data reveals that the prenominal adjective position is more frequent in complex noun phrases than in simple noun phrases. We investigate this phenomenon in more depth for the case of Italian noun phrases with PP complements introduced by the preposition ‘di’. To this end, we collect a large number of cases of adjective variation from the Wikipedia corpus of Italian. Our initial findings suggest that the preference for prenominal adjective position is induced by the lexico-statistical properties of the N-PP phrases.
Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums. We want to test whether a representation of informal words fulfills the promise of eliding explicit text normalization as a preprocessing step. One possible evaluation metric for such domains is the proximity of spelling variants. We propose how such a metric might be computed and how a spelling variant dataset can be collected using UrbanDictionary.
Social class has traditionally been a major topic of interest in the fields of sociolinguistics and dialectology, in which socioeconomic status is regarded as a major source of language variation and change. However, studies of class mobility in variationist sociolinguistics are relatively sparse, the focus remaining largely on contrasts between static social class groups. The present study explores the relationship between class mobility and sociophonetic variation with an auditory analysis of two phonetic variables that are reported to be socially stratified in the context of urban speech in Scotland. These variables are (1) the glottal replacement of /t/ in coda or non-foot-initial onset positions, e.g. bu[ʔ], bu[ʔ]er, moun[ʔ]ain (cf. Speitel & Johnstone 1983; Johnston 1997; Stuart-Smith 1999), and (2) the phonemic distinction of /w/ and /ʍ/, where sounds represented orthographically with wh (e.g. white, somewhere) are realised as [ʍ] (Stuart-Smith 2004: 61). Spontaneous speech is analysed from native speakers of Scottish English born in Edinburgh, aged 57-69 years, from three socioeconomic groups: Working Class (WC), Established Middle Class (EMC) and New Middle Class (NMC), the third category consisting of speakers who have experienced upward mobility over their lifetime. Patterns of realisation across the three socioeconomic groups indicate widespread glottalisation and the merging of /ʍ/ with /w/ in WC speech, while EMC speakers in comparison show a higher rate of the prestigious alveolar [tʰ] realisation of /t/ and variable retention of the [ʍ] realisation. Most striking in the results is that the upwardly mobile NMC group shows the highest production rate of the [tʰ] and [ʍ] variants. Thus, despite their arguably intermediate socioeconomic status, speakers from the NMC group exceed the proportion of overtly prestigious variants observed for EMC speech. This result mirrors previous findings by Dickson and Hall-Lew (2015) of an NMC cross-over pattern in the realisation of non-prevocalic /r/ in Edinburgh. It is argued that this distinct pattern among upwardly mobile speakers reflects an ideology of linguistic prestige distinct from that of speakers from a stable socioeconomic background. These results extend previous findings of unique patterns of phonetic variation among upwardly mobile individuals, offering greater insight into the linguistic representation of evolving class identities.
Distributions over strings and trees can be represented by probabilistic regular languages, and this representation characterizes many models in natural language processing. Recently, several datasets have become available which represent compositional semantics as graphs, so it is natural to seek the equivalent of probabilistic regular languages for graphs. To this end, we survey three families of graph languages: Hyperedge Replacement Languages (HRL), which can be made probabilistic; Monadic Second Order Languages (MSOL), which support crucial closure properties of regular languages such as intersection; and Regular Graph Languages (RGL; Courcelle, 1991), a subfamily of both HRL and MSOL which inherits the desirable properties of each, and has not been widely studied or previously applied to NLP. Focusing on RGL, we give a new inclusion proof, provide the first concrete algorithm for grammar intersection and parsing, and demonstrate that RGL is expressive enough to represent some common semantic phenomena.
The generation of comprehension-induced predictions affects both the timing and articulatory realization of spoken output (e.g., Drake, Schaeffler, & Corley, 2014). The current study investigates whether these effects are predicated on the phonological relationship between a predicted word and a picture-name. We elicited lexical predictions by acoustically presenting sentence-stems. Pictures were named in 4 conditions: match (picture-name fully matched the lexical prediction), onset-overlap (e.g., can-CAP), rime-overlap (e.g., can-TAN), and a control condition (acontextual picture naming). Articulation was captured via ultrasound tongue imaging. Articulatory patterns during the response latency period differed according to whether the picture-name matched the lexical prediction or not, but not according to the phonological relationship between the picture-name and the lexical prediction (i.e., onset-overlap did not differ from rime-overlap). This suggests that the speech-motor consequences of comprehension-elicited predictions may reflect generalized mismatch monitoring processes rather than the activation of fully-specified predictions within the speech production system.
It is often the case that two constructions in Language A appear to correspond to a single construction in Language B. How do bilingual speakers of these languages represent such possibilities? Evidence from structural priming suggests that bilinguals can share syntactic representations across languages (Hartsuiker et al., 2004). To investigate this, we used a syntactic priming paradigm in which participants listened to a description of a picture in Gaelic (the prime) and then stated whether it matched the picture on screen. They then described a new picture in English. Prime sentences were manipulated to be an active, a baseline (noun phrase), or one of two types of Gaelic passive construction. Our results revealed a significant effect of prime type, with participants more likely to produce a passive description following a passive prime than following a baseline. We found no significant difference in priming effects between the two passive prime types. The results therefore suggest that our participants had shared representations across the English passive and both forms of the Gaelic passive. We interpret these results in terms of the theory posited by Bernolet et al. (2013), who claim that more proficient bilinguals are more likely to incorporate constructions into a single language-independent representation.
This paper introduces a novel form of parametric synthesis that uses context embeddings produced by the bottleneck layer of a deep neural network to guide the selection of models in a rich-context HMM-based synthesiser. Rich-context synthesis – in which Gaussian distributions estimated from single linguistic contexts seen in the training data are used for synthesis, rather than more conventional decision tree-tied models – was originally proposed to address over-smoothing due to averaging across contexts. Our previous investigations have confirmed experimentally that averaging across different contexts is indeed one of the largest factors contributing to the limited quality of statistical parametric speech synthesis. However, a possible weakness of the rich context approach as previously formulated is that a conventional tied model is still used to guide selection of Gaussians at synthesis time. Our proposed approach replaces this with context embeddings derived from a neural network.
We study the task of movie script summarization, which we argue could enhance script browsing, give readers a rough idea of the script's plotline, and speed up reading time. We formalize the process of generating a shorter version of a screenplay as the task of finding an optimal chain of scenes. We develop a graph-based model that selects a chain by jointly optimizing its logical progression, diversity, and importance. Human evaluation based on a question-answering task shows that our model produces summaries which are more informative compared to competitive baselines.
The success of supervised deep neural networks (DNNs) in speech recognition cannot be transferred to zero-resource languages where the requisite transcriptions are unavailable. We investigate unsupervised neural network based methods for learning frame-level representations. Good frame representations eliminate differences in accent, gender, channel characteristics, and other factors to model subword units for within- and across-speaker phonetic discrimination. We enhance the correspondence autoencoder (cAE) and show that it can transform Mel Frequency Cepstral Coefficients (MFCCs) into more effective frame representations given a set of matched word pairs from an unsupervised term discovery (UTD) system. The cAE combines the feature extraction power of autoencoders with the weak supervision signal from UTD pairs to better approximate the extrinsic task's objective during training. We use the Zero Resource Speech Challenge's minimal triphone pair ABX discrimination task to evaluate our methods. Optimizing a cAE architecture on English and applying it to a zero-resource language, Xitsonga, we obtain a relative error rate reduction of 35% compared to the original MFCCs. We also show that Xitsonga frame representations extracted from the bottleneck layer of a supervised DNN trained on English can be further enhanced by the cAE, yielding a relative error rate reduction of 39%.
Hashing has witnessed an increase in popularity over the past few years due to the promise of compact encoding and fast query time. In order to be effective, hashing methods must maximally preserve the similarity between the data points in the underlying binary representation. The current best performing hashing techniques have utilised supervision. In this paper we propose a two-step iterative scheme, Graph Regularised Hashing (GRH), for incrementally adjusting the positioning of the hashing hypersurfaces to better conform to the supervisory signal. In the first step the binary bits are regularised using a data similarity graph so that similar data points receive similar bits. In the second step the regularised hashcodes form targets for a set of binary classifiers which shift the position of each hypersurface so as to separate opposite bits with maximum margin. GRH exhibits superior retrieval accuracy to competing hashing methods.
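As a rough illustration of what "similar data points receive similar bits" means, the sketch below hashes points with a few fixed hyperplanes and compares the resulting codes by Hamming distance. It is not the GRH method itself: the hyperplanes are hand-picked rather than learned, and the function names, hyperplanes, and toy points are all hypothetical.

```python
def hash_code(point, hyperplanes, biases):
    """Return one bit per hyperplane: 1 if the point lies on its positive side."""
    return [
        1 if sum(w * x for w, x in zip(normal, point)) + bias > 0 else 0
        for normal, bias in zip(hyperplanes, biases)
    ]


def hamming_distance(code_a, code_b):
    """Count the positions at which two equal-length binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


# Hypothetical toy setup: three hyperplanes through the origin in 2-D.
hyperplanes = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
biases = [0.0, 0.0, 0.0]

near_a = hash_code((1.0, 1.0), hyperplanes, biases)
near_b = hash_code((1.2, 0.9), hyperplanes, biases)
far = hash_code((-1.0, -1.0), hyperplanes, biases)

print(hamming_distance(near_a, near_b))  # nearby points share all bits
print(hamming_distance(near_a, far))     # distant points differ in every bit
```

In a learned scheme such as the one the abstract describes, the hyperplane positions would be adjusted so that these distances agree with the supervisory signal; here they are fixed only to show how the codes are read.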
We present the first truly streaming cross-document coreference resolution (CDC) system. Processing infinite streams of mentions forces us to use a constant amount of memory, and so we maintain a representative, fixed-size sample at all times. For the sample to be representative it should represent a large number of entities whilst taking into account both temporal recency and distant references. We introduce new sampling techniques that take into account a notion of streaming discourse (current mentions depend on previous mentions). Using the proposed sampling techniques we are able to get a CEAFe score within 5% of a non-streaming system while using only 30% of the memory.
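The idea of keeping a fixed-size sample of an unbounded stream in constant memory can be sketched with classic uniform reservoir sampling (Algorithm R). This is a standard baseline and not the recency-aware sampling techniques the abstract proposes; the function name and the stand-in stream of integers are assumptions made for illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length.

    Memory use is O(k) regardless of how many items the stream yields.
    """
    rng = rng or random.Random()
    sample = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a stored item with probability k / (index + 1).
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < k:
                sample[slot] = item
    return sample


# Hypothetical stream of mention identifiers, sampled with a fixed seed.
mentions = range(1000)
kept = reservoir_sample(mentions, 10, random.Random(0))
print(len(kept))
```

A sampler that favours recent mentions, as the abstract describes, would change the replacement probability; the uniform version above only shows the constant-memory structure.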
We investigated whether an intensive one-week language course would influence cognitive functions. We tested auditory attention in 31 participants at the beginning and end of a one-week intensive Gaelic course and compared the results to 34 matched controls, who either followed their usual routines (passive controls, n=18) or participated in a course of comparable duration and intensity but not involving foreign language learning (active controls, n=16). There was no difference between the groups in any of the measures at the beginning of the course. At the end of the course, the language but not the control group showed a significant improvement in attention switching, independent of the age of participants (age range 18-75 years). The improvement was biggest in the beginners. Our results suggest that even a short period of intensive language learning can modulate attentional functions and that all age groups can benefit from this effect.
Semantic diversity in language has been found to increase processing costs on both a behavioural (RT) and neural basis, reflecting diversity within the mental representation required to process a concept or narrative event. Semantic diversity refers to the range of associations and contexts of occurrence an event can be linked to (Coll-Florit & Gennari, 2011). Concept imageability has been linked to semantic diversity, with neuroimaging studies revealing differential activations across left frontal and temporal lobe regions for varying degrees of imageability in sentence comprehension (Rodríguez-Ferreiro, Gennari, Davies and Cuetos, 2011). Another aspect that has demonstrated similar behavioural effects due to semantic diversity is narrative event duration. Event duration in such narratives affects retrieval time in behavioural tasks and this has been attributed to greater semantic diversity for events of longer durations. However, little is known regarding the neural basis of representations for narrative event duration. This exploratory fMRI study aimed to investigate the neural mechanisms underlying semantic processing across the domains of imageability and event duration, and how semantic diversity may affect such processing across these domains.
Dictionaries almost universally list "said" as an adjective in cases like "a true copy of (the) said document", and the fact that "the" is optional or even redundant for some speakers is broadly overlooked. Based on syntactic and semantic evidence, I argue that "the said" should be considered a complex determiner and that "said" qualifies as a determiner in its own right, which is unexpected from a word starting life as a past participle. Interestingly, the same phenomenon can be found in Spanish and German among other European languages. I show that calquing from French and Latin texts in the 14th century is almost certainly the source of the English innovation. I tentatively conclude that new members of the so-called "closed classes" can be added directly through language contact in the written channel.
In this work I present evidence that speech produced spontaneously in a conversation is considered more natural than read prompts. I also explore the relationship between participants’ expectations of the speech style under evaluation and their actual ratings. In successive listening tests subjects rated the naturalness of either spontaneously produced, read aloud or written sentences, with instructions toward either conversational, reading or general naturalness. It was found that, when presented with spontaneous or read aloud speech, participants consistently rated spontaneous speech as more natural, even when asked to rate naturalness in the reading case. Presented with only text, participants generally preferred transcriptions of spontaneous utterances, except when asked to evaluate naturalness in terms of reading aloud. This has implications for the application of MOS-scale naturalness ratings in Speech Synthesis, and potentially for the type of data suitable for use in general TTS, in dialogue systems and specifically in Conversational TTS, in which the goal is to reproduce speech as it is produced in a spontaneous conversational setting.
Effective verbal comprehension requires representations of word meanings and executive processes that regulate access to this knowledge in a context-appropriate manner. Neuropsychological studies indicate that these two elements depend on different brain regions and can be impaired independently. We investigated the neural basis of these functions using distortion-corrected fMRI. Nineteen healthy subjects were scanned while completing a synonym-judgement comprehension task with concrete and abstract words. Each judgement was preceded by a sentence cue that manipulated the executive control demands of the semantic judgement. On some trials, the cue was irrelevant to the judgement, placing maximum demands on executive control processes. On others, the cue placed the target word in a specific linguistic context, reducing the executive demands of selecting the context-appropriate meaning. A network of regions was involved in the task, including inferior frontal gyrus, inferior parietal cortex, posterior temporal regions and superior and ventral areas within the anterior temporal lobe. Further analysis revealed a triple dissociation within this semantic network: (1) inferior prefrontal cortex was most active when irrelevant cues were provided, indicating involvement in executive regulation of meaning and suppression of irrelevant information; (2) superior anterior temporal lobe was most active when cues were contextually relevant, suggesting a role in integrating word meaning with preceding context; (3) ventral anterior temporal lobe was strongly active for both types of cue, consistent with its role in context-invariant representations of meaning. These differing responses to contextual constraints align with neuropsychological and TMS data and indicate functional specialisation within the semantic network.
When do listeners form pragmatic interpretations about a speech signal when processing linguistic input? We use an exploratory eye- and mouse-tracking paradigm to investigate listener comprehension in a reliability judgement task. Participants viewed visual displays while listening to a speaker tell them which object to click on in order to gain a prize. The speaker was presented as being sometimes dishonest, and critical utterances were fluent (The treasure is behind the…) or disfluent (Um, the treasure is behind the…). Time course data showed that listeners were more likely to direct their gaze and move the mouse towards the distractor object during disfluent utterances, with effects emerging soon after the point of disambiguation. The results indicate that listeners can and do make rapid global judgements about a speaker’s reliability depending on how the linguistic information is conveyed.
The success of supervised deep neural networks (DNNs) in speech recognition cannot be transferred to zero-resource languages where the requisite transcriptions are unavailable. We investigate unsupervised neural-network-based methods for learning frame-level representations. Good frame representations eliminate differences in accent, gender, channel characteristics, and other factors to model subword units for within- and across-speaker phonetic discrimination. We enhance the correspondence autoencoder (cAE) and show that it can transform Mel Frequency Cepstral Coefficients (MFCCs) into more effective frame representations given a set of matched word pairs from an unsupervised term discovery (UTD) system. The cAE combines the feature extraction power of autoencoders with the weak supervision signal from UTD pairs to better approximate the extrinsic task's objective during training. We use the Zero Resource Speech Challenge's minimal triphone pair ABX discrimination task to evaluate our methods. Optimizing a cAE architecture on English and applying it to a zero-resource language, Xitsonga, we obtain a relative error rate reduction of 35% compared to the original MFCCs. We also show that Xitsonga frame representations extracted from the bottleneck layer of a supervised DNN trained on English can be further enhanced by the cAE, yielding a relative error rate reduction of 39%.
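For readers unfamiliar with the metric, a relative error rate reduction such as the 35% quoted above is conventionally the drop in error rate divided by the baseline error rate. The following is a minimal sketch of that arithmetic only; the function name is hypothetical and the error rates in the usage note are invented for illustration, not results from the poster.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction by which the error rate falls relative to the baseline.

    Both arguments are error rates on the same scale (e.g. percent).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error
```

With an invented baseline error of 20.0 and a new error of 13.0, `relative_error_reduction(20.0, 13.0)` evaluates to approximately 0.35, i.e. a 35% relative reduction.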
HMM synthesis: we present a framework that separates the effects of modelling, isolating each in turn so that its independent contribution can be observed.
This paper applies a dynamic sinusoidal synthesis model to statistical parametric speech synthesis (HTS). For this, we utilise regularised cepstral coefficients to represent both the static amplitude and dynamic slope of selected sinusoids for statistical modelling. During synthesis, a dynamic sinusoidal model is used to reconstruct speech. A preference test is conducted to compare the selection of different sinusoids for cepstral representation. Our results show that when integrated with HTS, a relatively small number of sinusoids selected according to a perceptual criterion can produce quality comparable to using all harmonics. A Mean Opinion Score (MOS) test shows that our proposed statistical system is preferred to one using mel-cepstra from pitch synchronous spectral analysis.
When comprehending sentences, we can sometimes predict what is likely to be mentioned next. Such prediction is thought to entail pre-activation of features of predictable words, but it is unclear which features are pre-activated and which are not. A visual-world eye-tracking study investigated whether listeners pre-activate phonological information when listening to highly constraining sentences. Participants heard sentences like "The man was gathering honey, when he was stung by a bee ...", and eye movements to pictures of the predictable word "bee", a phonological onset competitor "bean", and an unrelated object "tiger" were compared. We predicted participants would be more likely to look at phonological competitors (bean) compared to unrelated pictures (tiger) if they pre-activated phonology of the predictable words. The results show a trend that partly fits the prediction, but some results are hard to explain. I hope to hear what you think may be happening.
People in conversation tend to align on the way they speak. Previous research suggests that the tendency to imitate each other's behaviour plays a crucial role in establishing successful interactions and bonding with other people. There is now evidence that linguistic alignment results in increased group cohesiveness, relationship stability and more positive attitudes towards the conversational partner. In this study, we investigated whether interacting with an imitative partner leads to more positive ratings of the interaction and of the partner, and whether it increases the tendency to cooperate. We found that people who were imitated rated their interaction as smoother than did the group that was counter-imitated.
In this study we test how introversion-extroversion affects language and gesture use depending on whether the interlocutor is visible to the speaker. Adults described arrays of objects, half the time with a screen occluding their interlocutor and half the time with the interlocutor visible. When participants could not see their listener, they used more words, particularly concrete words, and tended to gesture more. This difference was moderated by extroversion for gestures (i.e., extroverts gestured more when their interlocutor was occluded) but not for speech. We argue that visibility of a listener may influence task difficulty differentially according to extroversion, and may also impact how speakers rely on gestures in accessing the specific and concrete language they think listeners need when they can’t be seen.
We propose a model for Chinese poem generation based on recurrent neural networks which we argue is ideally suited to capturing poetic content and form. Our generator jointly performs content selection (“what to say”) and surface realization (“how to say”) by learning representations of individual characters, and their combinations into one or more lines as well as how these mutually reinforce and constrain each other. Poem lines are generated incrementally by taking into account the entire history of what has been generated so far rather than the limited horizon imposed by the previous line or lexical n-grams. Experimental results show that our model outperforms competitive Chinese poetry generation systems using both automatic and manual evaluation methods.
We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation. Our approach generalizes a standard chunking representation to encode MWEs containing gaps, thereby enabling efficient sequence tagging algorithms for feature-rich discriminative models. Experiments on a new dataset of English web text offer the first linguistically-driven evaluation of MWE identification with truly heterogeneous expression types. Our statistical sequence model greatly outperforms a lookup-based segmentation procedure, achieving nearly 60% F1 for MWE identification.
When people want to identify the causes of an event, assign credit or blame, or learn from their mistakes, they often reflect on how things could have gone differently. In this kind of reasoning, one considers a counterfactual world in which some events are different from their real-world counterparts and considers what else would have changed. Researchers have recently proposed several probabilistic models that aim to capture how people do (or should) reason about counterfactuals. We present a new model and show that it accounts better for human inferences than several alternative models. Our model builds on the work of Pearl (2000), and extends his approach in a way that accommodates backtracking inferences and that acknowledges the difference between counterfactual interventions and counterfactual observations. We present six new experiments and analyze data from four experiments carried out by Rips (2010), and the results suggest that the new model provides an accurate account of both mean human judgments and the judgments of individuals.
Languages spoken by larger groups are claimed to be less morphologically complex than those of smaller populations (Lupyan & Dale 2010), although the mechanism by which group size could have this effect is yet to be convincingly identified (Nettle 2012). One proposed candidate mechanism is the differing degrees of input homogeneity: in larger groups, the linguistic input is thought to be provided by a greater number of speakers, and this may hamper the acquisition, and hence cross-generational transfer, of more complex morphological features (Hay & Bauer 2007, Nettle 2012). I describe two experiments which aimed to rigorously assess this candidate mechanism, and ultimately find no evidence to support it. In the first experiment, 60 participants were trained on a morphologically complex miniature language of Hungarian sentences, and their acquisition of its case system was assessed. Two conditions were considered: one in which the aural input was provided by a single native speaker, and one in which it was provided by three speakers. No statistically significant difference in the participants' acquisition of the case system was found. There was, however, some suggestion that more limited acquisition was due to the learners finding the training strings difficult to segment, and that this was more prevalent in the multiple-speaker condition. The second experiment therefore aimed to assess whether speech-stream segmentation was more difficult when the input was provided by three speakers compared to one (n=48), extending the work of Saffran et al. (1996), which demonstrated that adult learners can use distributional cues to determine word boundaries in continuous speech. Again, no evidence was found to support the proposal that an increase in the number of speakers who provide a learner's linguistic input affects their ability to acquire complex morphology.
We present ParCor, a parallel corpus of texts in which pronoun coreference (reduced coreference in which pronouns are used as referring expressions) has been annotated. The corpus is intended to be used both as a resource from which to learn systematic differences in pronoun use between languages and ultimately for developing and testing informed SMT systems to address the problem of pronoun coreference in translation. At present, the corpus consists of a collection of parallel English-German documents from two different text genres – TED talks and EU Bookshop publications. All documents in the corpus have been manually annotated with respect to the type and location of each pronoun and, where relevant, its antecedent. Construction of the corpus is ongoing, with plans for additional genres and languages in the future.
Deferred uses of demonstratives have generally been thought of as non-standard or figurative cases of language use. Allyson Mount (2008) has argued instead that all demonstrative reference is determined by some object being the cognitive focus of all conversational participants. In this paper I argue that while this uniform theory of reference is able to account for cases of deferred demonstration where the demonstrative is used to directly refer to an object, it fails for cases of demonstrative use which resemble attributive descriptions. I propose that one way to accommodate these attributive uses of demonstratives into Mount’s uniform theory of reference is to consider them to be anaphoric on descriptive phrases which have been made contextually salient, and which link the demonstrated object to a class of possible referents.
Soames (1987, 2008) has provided one of the most influential arguments against unstructured propositions, i.e. propositions as sets of truth-supporting circumstances. He claims that the assumption of unstructured propositions in combination with the direct reference thesis (and some further innocent assumptions) leads to absurd conclusions. The aim of this paper is to show that Soames makes a mistake in his reductio by conflating assertoric content and semantic value. I suggest that this distinction leads to two distinct theses/assumptions with regard to direct reference and that neither of these theses can support Soames' argument. Finally, I will suggest that it might be worthwhile to try to formulate his argument with the assumption of rigidity instead of direct reference.
The tongue moves silently in preparation for speech. We analyse Ultrasound Tongue Imaging (UTI) data of these pre-speech to speech phases from five speakers, whose native languages (L1) are English (n = 3), German, and Finnish. Single words in the subjects' respective L1 were elicited by a standard picture naming task. Our focus is to automate the detection of speech preparation through the analysis of raw UTI probe-return data, here captured at 201 fps. We analyse these movements with a pixel difference method, which yields an estimate of the rate of change on a frame by frame basis. We describe typical time dependent pixel difference contours and report grand average contours for each of the speakers.
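As a rough illustration of the pixel difference idea described above, the sketch below computes one value per pair of consecutive frames. It assumes the mean absolute difference as the distance measure and represents frames as nested lists of intensities; the abstract does not specify either choice, and the function names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # rows of pixel intensities (assumed layout)


def pixel_difference(frames: Sequence[Frame]) -> List[float]:
    """Mean absolute pixel change between each pair of consecutive frames."""
    contour = []
    for prev, curr in zip(frames, frames[1:]):
        total = 0.0
        count = 0
        for row_prev, row_curr in zip(prev, curr):
            for a, b in zip(row_prev, row_curr):
                total += abs(b - a)
                count += 1
        contour.append(total / count if count else 0.0)
    return contour


def difference_times(n_frames: int, fps: float = 201.0) -> List[float]:
    """Time in seconds of each difference value, placed at the later frame."""
    return [i / fps for i in range(1, n_frames)]
```

A flat contour then corresponds to a still tongue, and a rise in the contour before acoustic onset would be a candidate marker of speech preparation.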
Pitch prominence is highly variable in the four types of (dis)agreement identified by Pomerantz (1984): agreement, same assessment, downgrading and disagreement. Even though sociolinguists and discourse analysts have studied the (pitch) prominence of disagreement and negation in some detail, they have not looked at its variation in the three other (dis)agreement types. From a 5 h 30 min corpus of casual conversation between several MSc students, I extracted all (dis)agreements according to Pomerantz’s criteria and analysed them in terms of pitch prominence (vowel length and f0 excursion) to see what conditions and what complicates the variation. The conditioning factor does not appear to be the type of (dis)agreement but rather discourse function, which suggests f0 excursion as a potential stylistic variable indexing expressivity (cf. Podesva, 2007, on phonation type as a stylistic variable).
This study investigated what can be learned about synaesthesia from natural language processing and vice versa. In our study, we asked how synaesthetes experience colours for compound words (e.g., keyhole) and in doing this, we also tested how compound words might be processed in normal language use. Using an online colour selection task, 19 grapheme-colour synaesthetes could provide zero, one, or two colours for compound words. We varied the lexical frequency and semantic transparency of these compounds. High-frequency compounds were significantly more likely than low-frequency compounds to have only one colour, rather than two colours. This suggests that there are two different psycholinguistic strategies for processing compound words: high-frequency words are stored as wholes but low-frequency words are broken down into constituents (Kubitza, unpublished; Schreuder & Baayen, 1995). However, there was no effect of semantic transparency. Reports from participants also revealed greater complexity in synaesthetic word colouring than previous research on grapheme-colour synaesthesia has been able to capture. Our results show that synaesthetic colours vary meaningfully with linguistic measures and can be used to understand the nature of both synaesthesia itself and natural language processing in the general population.
We introduce a sparse kernel learning framework for the Continuous Relevance Model (CRM). State-of-the-art image annotation models linearly combine evidence from several different feature types to improve image annotation accuracy. While previous authors have focused on learning the linear combination weights for these features, there has been no work examining the optimal combination of kernels. We address this gap by formulating a sparse kernel learning framework for the CRM, dubbed the SKL-CRM, that greedily selects an optimal combination of kernels. Our kernel learning framework rapidly converges to an annotation accuracy that substantially outperforms a host of state-of-the-art annotation models. We draw two surprising conclusions: firstly, if the kernels are chosen correctly, only a very small number of features are required to achieve superior performance over models that utilise a full suite of feature types; and secondly, the standard default selection of kernels commonly used in the literature is sub-optimal, and it is much better to adapt the kernel choice based on the feature type and image dataset.
Creating a language-independent meaning representation would benefit many crosslingual NLP tasks. We introduce the first unsupervised approach to this problem, learning clusters of semantically equivalent English and French relations between referring expressions, based on their named-entity arguments in large monolingual corpora. The clusters can be used as language-independent semantic relations, by mapping clustered expressions in different languages onto the same relation. Our approach needs no parallel text for training, but outperforms a baseline that uses machine translation on a cross-lingual question answering task. We also show how to use the semantics to improve the accuracy of machine translation, by using it in a simple reranker.
The conventional wisdom regarding phonologization is that it progresses as a sequence of gradual reanalyses: natural acoustic, physiological and perceptual phenomena are reanalyzed as gradient coarticulatory processes, which are then reanalyzed as categorical phonological processes (Ohala, 1981; Bermudez-Otero, 2007). I argue that this model of gradual and gradient reanalyses is not well supported by available data on sound change in progress. In fact, based on analyses of the rate of change of multiple vowel variants, and on investigations of mismatches between the predictions made on phonetic versus phonological grounds, it appears that new phonological processes enter the grammar at the onset of phonetic changes, rather than as later-stage reanalyses of phonetic changes in progress.
One part of the long debate about the nature of concepts has been dominated by the disputes between Conceptual Atomists and Conceptual Holists. A third, middle-ground position, Molecularism, has neither been debated as much nor been thoroughly defined yet. I will present two possible ways of construing Molecularism about concepts and I will argue that both are variations of the more commonly held views. To support this view, I will offer two metaphor-based reconstructions of Molecularism – Chemical Molecularism (CheM) and Cluster Molecularism (CluM). CheM is the view that some concepts are constructed from more primitive concepts, which, by virtue of their individual meanings and their combination, provide the meaning of the 'molecular' concept. This view relies on Atomist premises and faces some of the same problems as Conceptual Atomism. CluM, on the other hand, is a weak kind of Holism that is based on the idea that there are clusters of concepts that have strong relations (e.g. inferential relations, thematic groupings, or family resemblances), which are connected by more general concepts or by weaker links between clusters. CluM still has to answer to some worries Holism faces, such as the problem of Communication. I will end by proposing that CluM is preferable, based on a speculative idea about the relation between concepts and webs of belief.
Previous work addressing the automatic detection of opinion and quotation Attribution Relations (ARs) has looked at the cue, the lexical anchor connecting the attributed text to its source, as the central element to the task. Most Attribution Extraction approaches are built upon lists of verb cues that are thought to be sufficiently exhaustive and reliable in signalling ARs in a text. The purpose of this project is to test how reliable such lists are once we move away from the news genre they have mostly been applied to. In order to investigate this, I have compared data from a news corpus annotating attribution cues to a small corpus of thread summaries I have compiled for the purpose. The comparison shows not only that cues are highly genre, register and domain specific, but also that attribution cue analysis should not be restricted to verbs. Thus, basing an analysis on pre-established lists of generally valid cues, or even attempting to compile new lists from annotated cues, proves to be a highly impracticable solution.
The challenge set by the new field of Attribution Relation Extraction is to connect quotations, opinions and other third-party information to their rightful sources. The work done so far on detecting Attribution Relations (ARs) has dealt only with written text (news and literary genres). Detecting attribution in speech would also be crucially important, as ARs can represent a source of confusion for speaker identification. However, the crucial role played by punctuation in written texts is replaced in speech by prosody. In order to detect ARs in speech automatically, we should thus consider both the linguistic and the acoustic levels. By analyzing a corpus of informal telephone conversations, this study tries to answer the following questions. On the acoustic level: Are there identifiable prosodic clues of attribution in speech? If yes, which are they and what is their role in marking the presence of reported speech? On the linguistic level: Are there any differences between ARs in written and oral texts? And how do ARs change if we switch to an informal register and the dialogue genre?
Verbal irony is a very complex figure of speech that belongs to the pragmatic level of language. Until now, however, computational approaches to the detection of irony have only tried to find linguistic clues that could indicate its presence without considering pragmatic factors. In this work, I suggest that an important feature for detecting irony in online texts, such as comments on newspaper articles or reviews, is the attribution of the comment to a specific source. I present the design of an experiment aimed at evaluating whether the interpretation of an utterance as ironic or not relies on the expectations that the hearer has about the ironic attitude of the source. In order to do so, I will recreate the context of an online newspaper, with news and comments by different users. The hypothesis under test is that the same sentence is perceived as more or less ironic depending on whether it is attributed to a commentator who is often ironic or to a commentator who uses irony more rarely.
I argue that sophisticated embodied robots will employ conceptual schemes that are radically different to our own, resulting in what might be described as "alien intelligence". Here I introduce embodied robotics and conceptual relativity, and consider the implications of their combination for the future of artificial intelligence. This argument is intended as a practical demonstration of a broader point: that our interaction with the world is fundamentally mediated by the conceptual frameworks with which we carve it up.
This study examines bilingualism and executive functions from the perspective of dynamical systems. It compares the performance of multilingual children evaluated in 2008 with that of the same participants four years later on tasks involving executive function, specifically inhibitory control and attention. When a bilingual switches from one language to another (code-switching), the control required to inhibit the language that is not in use during that part of the linguistic interaction can improve performance in various tasks requiring executive control. The gain in executive function acquired through code-switching may therefore also help with inhibitory control over stimuli during nonverbal tasks. The aim of this study is to observe whether multilinguals keep the same ability in inhibitory control and attention from childhood into their teenage years. The participants were 20 multilingual individuals, first tested when they were about 8-10 years old and retested four years later. The languages spoken by the participants were Pomeranian (L1), German (L1) and BP (L2). To test executive function, the Simon task was used, replicating the study by Bialystok (2003). The Simon task involves executive functions, namely inhibitory control and attention. In the task, stimuli are presented with different target features and in different positions. Participants are instructed to respond only to target features (for example, by pressing the right or the left key of a computer or serial box according to whether the stimulus is a red or a blue square) and to ignore the position of the stimulus on the screen. Accuracy and reaction times were measured, as well as the Simon effect. The reaction time and accuracy results suggest that multilinguals keep their ability to perform the task.
The poster presents results from my experiments and corpus analysis concerning two research questions: (1) *What is the nature of the mechanisms used in metaphor comprehension?* Are they (a) specific to language (Relevance Theory, Graded Salience, Grice) or (b) not specific to language (Conceptual Metaphor Theory)? Prediction: If the Conceptual Metaphor Theory is correct, cultures in close linguistic contact should have similar inventories of conceptual mappings used in metaphor comprehension, and instantiations of them should be intelligible cross-linguistically. (2) The inferential process is contextual through-and-through, but *do bilinguals make use of context in the same way that monolinguals do?* Prediction: If given contextual information is relevant - i.e. it yields cognitive effects / is informative / contributes some new piece of information - then all speakers should make use of it.
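The Simon effect mentioned above is conventionally computed as the mean reaction time on incongruent trials (stimulus position conflicts with the response side) minus the mean on congruent trials. The sketch below illustrates that calculation; the trial field names are hypothetical, and restricting the means to correct responses is an assumption rather than a detail reported in the abstract.

```python
from statistics import mean
from typing import Dict, List


def simon_effect(trials: List[Dict]) -> float:
    """Mean incongruent RT minus mean congruent RT, in milliseconds.

    Each trial is a dict with hypothetical keys:
      'rt_ms' (float), 'congruent' (bool), 'correct' (bool).
    Only correct responses are included (an assumption of this sketch).
    """
    congruent = [t["rt_ms"] for t in trials if t["correct"] and t["congruent"]]
    incongruent = [t["rt_ms"] for t in trials if t["correct"] and not t["congruent"]]
    if not congruent or not incongruent:
        raise ValueError("need at least one correct trial of each type")
    return mean(incongruent) - mean(congruent)
```

A larger positive value indicates more interference from the irrelevant stimulus position, so comparing this value across the two testing sessions is one way to ask whether inhibitory control has been maintained.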
Objectives: In recent studies where naive participants were asked to convey information about simple events using gesture and no speech, it was found that participants bypass the rules of their native language when structuring their gesture strings. Consequently, these studies can tell us something about natural dispositions for sequencing information that might have played a role in the emergence of language (Goldin-Meadow et al., 2008). Schouwstra (2012) has shown that the structuring principles that play a role in this process are semantic in nature: semantic organization possibly predated syntactic rules. Moreover, the lab results can be related to the semantic patterns observed in natural communication systems that arise in the absence of linguistic conventions: restricted linguistic systems (RLSs). Examples of such systems are home sign and Basic Variety, the language of unsupervised adult second language learners. My goal is to replicate one of the semantic patterns observed in RLSs in the lab: that of temporal displacement. In existing languages, tense/aspect information is complex, and generally expressed through inflection on the verb. In RLSs, the expression of temporal displacement is relatively simple: the information that an event takes place at some other time than now is communicated by placing a temporal adverbial before an utterance. Methods: In a gesture production task, I asked participants to convey information about events (shown in pictures) that do not take place now. Results: A gesture production study with sixteen Dutch participants revealed that they use the same strategy as that observed in RLSs. Moreover, simple propositional information is never interrupted by temporal information. 
Conclusions: These results strengthen the conceptual connection between RLSs and the gesture production task, and suggest a semantics-governed picture of the emergence of language, in which complex information was initially conveyed by adding information to the periphery of simple utterances.
Verbal irony is a very complex figure of speech that belongs to the pragmatic level of language. Until now, however, computational approaches to the detection of irony have only tried to find linguistic clues that could indicate its presence, without considering pragmatic factors. In this work, I suggest that an important feature for detecting irony in online texts, such as reviews or comments on newspaper articles, is the attribution of the comment to a specific source. I present the design of an experiment aimed at evaluating whether the interpretation of an utterance as ironic or not relies on the expectations that the hearer has about the ironic attitude of the source. To do so, I will recreate the context of an online newspaper, with news and comments by different users. The hypothesis under test is that the same sentence is perceived as more or less ironic depending on whether it is attributed to a commentator who is often ironic or to one who uses irony more rarely.
We introduce a scheme for optimally allocating a variable number of bits per LSH hyperplane. Previous approaches assign a constant number of bits per hyperplane. This neglects the fact that a subset of hyperplanes may be more informative than others. Our method, dubbed Variable Bit Quantisation (VBQ), provides a data driven non-uniform bit allocation across hyperplanes. Despite only using a fraction of the available hyperplanes, VBQ outperforms uniform quantisation by up to 168% for retrieval across standard text and image datasets.
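As a rough illustration of what a non-uniform allocation of bits across hyperplanes can look like, the sketch below projects a vector onto each hyperplane and quantises the projection with a per-hyperplane list of thresholds. The function names, the thresholds and the hand-picked allocation are all assumptions made for illustration; in particular, this is not the data-driven allocation procedure of VBQ, which the abstract does not spell out.

```python
def project(vec, hyperplane):
    """Dot product of a data vector with a hyperplane normal."""
    return sum(v * h for v, h in zip(vec, hyperplane))


def quantise(value, thresholds):
    """Index of the region the projection falls into.

    One threshold gives 2 regions (1 bit), three thresholds give
    4 regions (2 bits), and an empty list gives a single region,
    i.e. the hyperplane receives 0 bits and is effectively unused.
    """
    return sum(value > t for t in thresholds)


def encode(vec, hyperplanes, thresholds_per_plane):
    """Encode a vector as one region index per hyperplane."""
    return [
        quantise(project(vec, h), ts)
        for h, ts in zip(hyperplanes, thresholds_per_plane)
    ]


# Hypothetical hyperplanes and a hand-picked (not learned) allocation:
# 1 bit for the first, 2 bits for the second, 0 bits for the third.
HYPERPLANES = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ALLOCATION = [[0.0], [-1.0, 0.0, 1.0], []]

code = encode((0.5, 0.5), HYPERPLANES, ALLOCATION)
```

A uniform scheme corresponds to giving every hyperplane a threshold list of the same length; the variable scheme differs only in letting those lengths vary per hyperplane.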
Almost entirely ignored in the linguistic theorising on names and descriptions is a hybrid form of expression which, like definite descriptions, begins with `the' but which, like proper names, is capitalised and seems to lack descriptive content. These are expressions such as `the Holy Roman Empire', `The Mississippi River', `the Space Needle', etc. These capitalised descriptions are ubiquitous in natural language. But to which syntactic and semantic categories do capitalised descriptions belong? Are they proper names but with vestigial articles? Or are they genuine definite noun phrases but with unique orthography? Or are they something else entirely? This paper addresses this neglected set of questions. The primary goal is to lay the groundwork for a linguistic analysis of capitalised descriptions. Yet, the hope is that clearing the ground on capitalised descriptions may reveal useful insights for the ongoing research into the semantics and syntax of their lower-case or `the'-less relatives. In the end, we are left with a puzzle concerning capitalised descriptions: it seems that neither an assimilation to names nor an assimilation to descriptions is tenable. According to the traditional taxonomy, there is an important linguistic distinction between proper names and definite descriptions, but the analysis of capitalised descriptions suggests that this distinction is a philosophical myth that does not hold up under sustained scrutiny.
The familiar pragmatic description of reference in dialogue as a collaborative process invites an important semantic question: are terms of reference that are constructed in a collaborative manner (per Clark & Wilkes-Gibbs, 1986 and similar studies) semantically isolated to the context of their use (the conversation and/or its participants), or are they instead fed directly into a greater domain? If the latter is true, collaborative reference is also a process of far-reaching, dynamic semantic revision; and presently, I offer some early data suggesting terms generated in a collaborative setting may freely and immediately 'intermingle' with terms that were not.
Speaker Diarization involves segmenting audio into speaker-homogeneous regions and labelling regions from each individual speaker with a single label. Knowing both who spoke and when has many useful applications and can form part of a rich transcription of speech. The task is challenging because it is generally performed without any a priori knowledge about the speakers present or even how many speakers there are. We present a study on the contributions to Diarization Error Rate by the various components of a state-of-the-art speaker diarization system. Following on from an earlier study by Huijbregts and Wooters, we extend into more areas and draw somewhat different conclusions. From a series of experiments combining real, oracle and ideal system components, we are able to conclude that the primary cause of error in diarization is the training of speaker models on impure data, something that is in fact done in every current system. We conclude by suggesting ways to improve future systems, including a focus on training the speaker models from smaller quantities of pure data instead of all the data, as is currently done.
I present two new techniques in the field of corpus analysis. The first technique allows us to compare frequency counts across different corpora and even languages by using an approach similar to z-scores. The second technique allows us to determine for any phenomenon in a corpus how much of its frequency variation is due to true linguistic variation and how much is due to observational deficiencies (measurement accuracy and the Fourier/Heisenberg/Schrödinger/Gabor uncertainty principle).
Information Structure fits syntax like a glove in Old English. There are at least four positions for subjects, five for objects and three for adverbials, which greatly facilitate the information flow from "given" to "new". The change from OV to VO order (ca. 1200) and the loss of a verb-second-like movement rule (15th C) greatly restricted the positions of subjects, objects and adverbials, with information status increasingly aligned with a syntactic function: subjects came to be the default expression of "given" information, and objects of "new" information. Investigating such shifts requires annotating historical texts with information structural categories. As information structure is a relatively new field, pilot schemes to enrich corpora with labels such as "topic" or "focus" tend to have poor inter-rater agreement. This is why we have opted to add referential information only. NPs are semi-automatically marked up with referential information, with the addition of a label specifying the type of referential link, based on Prince's (1981) givenness categories ("identity", "inferred", "assumed", "inert" and "new"). In my talk I will use this corpus to investigate the hypothesis that the loss of verb-second entailed the loss of clause-initial adverbials as unmarked discourse links.
The study of the origins of language and music is an exciting interdisciplinary and multidisciplinary research area. By presenting some of my ongoing research on apes and monkeys, I will suggest how the biology of cognition can contribute to the ongoing debate and give us some insights on how we ended up being chatty, musical hominids.
We investigate the use of cross-lingual acoustic data to initialise deep neural network (DNN) acoustic models by means of unsupervised restricted Boltzmann machine (RBM) pretraining. DNNs for German are pretrained using one or all of German, Portuguese, Spanish and Swedish. The DNNs are used in a tandem configuration, where the network outputs are used as features for a hidden Markov model (HMM) whose emission densities are modeled by Gaussian mixture models (GMMs), as well as in a hybrid configuration, where the network outputs are used as the HMM state likelihoods. The experiments show that unsupervised pretraining is more crucial for the hybrid setups, particularly with limited amounts of transcribed training data. More importantly, unsupervised pretraining is shown to be language-independent. Additionally, we show that finetuning the hidden layers of the DNNs using data from multiple languages improves the recognition accuracy compared to a monolingual DNN-HMM hybrid system.
Relevance Theory (Sperber & Wilson 1986, 1995), Graded Salience (Giora 2003), and Grice (1975, 1978) argue that the mechanisms we use to comprehend metaphors are linguistic in nature. Conceptual Metaphor Theory, on the other hand, claims they are not specialised to language. In this poster I present experimental results comparing monolinguals' and bilinguals' comprehension of metaphors that are idiomatic in English only, in German only, or in both English and German. These results support the view that the mechanisms are linguistic rather than general.
One of the most contentious debates in studies of dialogue concerns the explanatory role assigned to speakers' intentions. To address this issue, this poster reports a computer-mediated variant of the maze task (Pickering & Garrod, 2004), which manipulates the dialogue by inserting artificial clarification requests that appear, to participants, as if they originate from each other. Two kinds of clarification were introduced: (1) Artificial "Why?" questions that query the plan, (2) Fragment clarification requests that query the constituent elements of the referring expressions. As coordination develops, "Why?" clarification requests become progressively easier to respond to, while for fragment clarification requests the converse is the case. We argue that this differential pattern is not arrived at via explicit negotiation of intentions, but via tacit turn-by-turn feedback.
Although picture-naming studies have contributed significantly to understandings of speech production, particularly of item-specific effects such as age-of-acquisition, the findings of studies of context-specific effects (i.e., where context is manipulated via the use of distractor items) have been less conclusive. This has led to widespread recognition that the nature of information-flow through the speech production system is vulnerable to subtle variations in task, to the extent that similar paradigms can produce apparently contradictory findings (e.g., Madebach et al., 2011).
Recent research in the philosophy of language has sought to give an account of the nature of assertion. Some define assertion in terms of the norm that governs it, others in terms of its causes and effects, and others in terms of the commitments that go along with making an assertion. All parties assume that there is such a thing as assertion, but this assumption has been challenged by Herman Cappelen (2011). We strengthen Cappelen's challenge by arguing that the category of "assertions" is too broad to admit of analysis and that the theoretical value of the notion of assertion is unclear. Moreover, without a clear conception of what is at stake in the debate over the nature of assertion, criteria of adequacy for theories of assertion are obscure.
In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.
While much research has focused on code switching (CS), interference in bilingual speech has hardly been addressed. Interference is a language phenomenon that occurs as a result of language contact in bilingual speakers. The main aim of the present study is to investigate interference phenomena. To describe the language processing of interference in bilingual language production, I propose the Lexical Semantic Interference Model (LSIM). The term interference is used to denote different language contact phenomena, and the LSIM addresses certain kinds of interference. This study is based on naturalistic data from the speech of Mazandarani-Persian bilingual speakers. The research sheds light on language processing in bilingual speakers.
Text genre classification can enhance Information Retrieval and Natural Language Processing applications. Classifying genres across languages can bring these benefits to the target language without the costs of manual annotation. This poster presents the first approach to this task. It exploits text features which separate genre classes in similar ways across languages, as well as iterative re-labeling in the target language. Experiments show that this method performs as well as or better than full text translation combined with monolingual classification, while requiring fewer resources.
We present preliminary work focusing on the problem of combining social interaction with task-based action in a dynamic, multiagent bartending domain, using an embodied robot. We show how the users' spoken input is interpreted, discuss how social states are inferred from the parsed speech together with low-level information from the vision system, and present a planning approach that models task, dialogue, and social actions in a simple bartending scenario. This approach allows us to build interesting plans, which have been evaluated in a real-world study, using a general purpose, off-the-shelf planner, as an alternative to more mainstream methods of interaction management.
The International English Language Testing System (IELTS) English proficiency test is widely taken by non-native speakers of English for reasons like higher education, immigration and registration with professional bodies in western countries. In response to the growing demand for the IELTS exam, a wide range of private institutes have opened in Pakistan for the preparation of IELTS candidates, catering to different income brackets. The private institutes offer short General English courses and preparation for international proficiency tests like IELTS. The present impact study of IELTS in Pakistan (following global impact studies of IELTS such as Hawkey 2006) attempts to explore the effectiveness of these IELTS preparation courses in improving the proficiency level of their clients. The study examines the IELTS preparation course at two institutes in Pakistan, BERLITZ - an international institute and PACC - a locally owned institute. A pre- and post-test for IELTS was taken by 20 students enrolled in both institutes. The test data was supplemented by class observation, questionnaires and informal interviews with teachers and students. The study found that unsurprisingly the two cohorts at the two institutes differ clearly at the starting level as found by the pre-test (Students at BERLITZ are more proficient than PACC students). Yet the progress for these two cohorts suggests that there is not much difference in the improvement level between BERLITZ and PACC students, despite superior teaching and resources at BERLITZ. The results can be explained to an extent by the brevity of the courses (both 8 weeks) but they also point to distortions in the market for IELTS preparation in Pakistan where IELTS has assumed a significance that goes beyond the opportunity that a relatively small number of Pakistanis have to study and/or work abroad.
This is part of ongoing PhD research aiming to quantify how anticipatory pharyngealisation in Arabic varies as a function of prosodic boundary level (syllable vs. word vs. phrase vs. intonation phrase). Pharyngealisation is manifested in F2 lowering in emphatic compared to plain contexts. F2 was measured at offset, mid and onset points of both vowels in [V2 b V1 # Emphatic trigger] sequences, where the strength of the # was varied syntactically. The duration of the final vowel V1 was also measured to assess how pharyngealisation was affected by temporal distance from the trigger. Six Libyans produced two repetitions of 62 minimal pairs in all boundary conditions. Linear mixed effects results show (1) that pharyngealisation on both vowels across a syllable boundary is stable, (2) effects of pharyngealisation on the final vowel (V1) across word and phrase boundaries, and (3) no evidence of pharyngealisation across an IP boundary. An examination of V1 + pause durations suggests that the lack of coarticulatory effects on the final vowel (V1) across an IP boundary may be due to the temporal distance from the trigger: all tokens in this condition had a pre-trigger pause. These results are consistent with the view that anticipatory coarticulation is qualitatively different within as compared to across word boundaries. They suggest that pharyngealisation within words may be phonological, whereas across word boundaries it is primarily a phonetic process, conditioned by the temporal proximity of the trigger. Implications for speech production models, speaker variability, and prosodic constituency structure are considered.
People with autism spectrum disorders (ASDs) use less efficient strategies than typically-developing participants on measures of verbal problem-solving such as the Twenty Questions Task (TQT; Minshew et al., 1994). While this can be explained with reference to autism-specific cognitive deficits, the problem-solving of deaf participants suggests a contributory role of atypical language development. Like participants with ASD, deaf participants have been reported to ask over-specific questions in their problem-solving on the TQT, even when they possess good language skills (Marschark & Everhart, 1999). It is thought that this reflects atypical organization of semantic networks (Marschark et al., 2004). However, previous research on this profile has not controlled for verbal and non-verbal IQ differences between deaf and hearing participants, so it is unclear how similar deaf problem-solving is to ASD. Moreover, the link between problem-solving and semantic organization has not been demonstrated empirically. Preliminary results suggest that the problem-solving profile of deaf participants on the TQT is a) less efficient than hearing counterparts and b) very similar to ASD performance. Semantic decision performance in deaf children also indicates links between basic-superordinate category associations and questioning efficiency in problem-solving. Overlaps in deaf and ASD problem-solving are important in understanding the long-term effects of atypical language development on cognitive skills.
The accuracy of speaker diarisation in meetings relies heavily on determining the correct number of speakers. In this paper we present a novel algorithm based on time difference of arrival (TDOA) features that aims to find the correct number of active speakers in a meeting and thus aid the speaker segmentation and clustering process. With our proposed method the microphone array TDOA values and known geometry of the array are used to calculate a speaker matrix from which we determine the correct number of active speakers with the aid of the Bayesian information criterion (BIC). In addition, we analyse several well-known voice activity detection (VAD) algorithms and verify their fitness for meeting recordings. Experiments were performed using the NIST RT06, RT07 and RT09 data sets, and resulted in reduced error rates compared with BIC-based approaches.
Computational simulations of the emergence and evolution of phonological systems have shown that, given sufficient time, organizations of the articulatory space emerge in which phonemes are optimally distinctive (e.g. Steels, 1997; de Boer, 2000; Oudeyer, 2005; de Boer & Zuidema, 2010). However, there has been little investigation into the typological description of articulatory optimization across the world's languages. In this poster I introduce a methodology for measuring the optimization of vowel systems, which proceeds in four steps: first, we measure the formant frequencies of a language's monophthongs; second, we plot the vowels in a perceptual vowel space; third, following Liljencrants and Lindblom (1972), we calculate the potential energy in the system using the inverse-square law from Theoretical Physics; finally, we use Monte Carlo techniques to measure the non-randomness of the system. Using recordings from the UCLA Phonetics Lab Archive (Ladefoged & Blankenship, 2007), this method has been applied to 100 languages. The results suggest that there is a high level of variation in the optimization of vowel systems. I also explore the potential for cross-linguistic correlational studies using this measure, which could reveal whether external social pressures affect the emergent state of vowel systems.
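The last two steps of the methodology described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the poster's implementation: the function names, the rectangular bounds of the vowel space, the uniform sampling of random systems and the number of trials are all hypothetical, and the vowel coordinates are assumed to be already expressed in some perceptual scale.

```python
import random


def dispersion_energy(vowels):
    """Inverse-square 'potential energy' of a vowel system.

    Sums 1 / d^2 over all vowel pairs, where d is the Euclidean
    distance between two vowels. Lower energy means a more dispersed
    system. Two identical points would raise ZeroDivisionError.
    """
    energy = 0.0
    for i in range(len(vowels)):
        for j in range(i + 1, len(vowels)):
            d2 = sum((a - b) ** 2 for a, b in zip(vowels[i], vowels[j]))
            energy += 1.0 / d2
    return energy


def monte_carlo_p(vowels, bounds, trials=2000, seed=0):
    """Share of random systems that are at least as dispersed.

    Draws `trials` systems with the same number of vowels, uniformly
    inside `bounds` (one (low, high) pair per dimension; an assumed
    simplification of the real vowel space), and returns the fraction
    whose energy is no higher than the observed one. A small value
    suggests the observed system is non-randomly well dispersed.
    """
    rng = random.Random(seed)
    observed = dispersion_energy(vowels)
    hits = 0
    for _ in range(trials):
        sample = [
            tuple(rng.uniform(low, high) for low, high in bounds)
            for _ in vowels
        ]
        if dispersion_energy(sample) <= observed:
            hits += 1
    return hits / trials


# Toy systems in a unit square (made-up coordinates, not real formants).
UNIT_SQUARE = [(0.0, 1.0), (0.0, 1.0)]
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
crowded = [(0.50, 0.50), (0.51, 0.50), (0.50, 0.51), (0.51, 0.51)]

p_corners = monte_carlo_p(corners, UNIT_SQUARE)
p_crowded = monte_carlo_p(crowded, UNIT_SQUARE)
```

In this toy setup the four-corner system should receive a much smaller value than the crowded one, since almost no random system is as spread out as the corners while almost every random system is more spread out than four vowels packed together.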
We present preliminary work focusing on the problem of combining social interaction with task-based action in a dynamic, multiagent bartending domain, using an embodied robot. We show how the users' spoken input is interpreted, discuss how social states are inferred from the parsed speech together with low-level information from the vision system, and present a planning approach that models task, dialogue, and social actions in a simple bartending scenario. This approach allows us to build interesting plans, which have been evaluated in a real-world study, using a general purpose, off-the-shelf planner, as an alternative to more mainstream methods of interaction management.
The International English Language Testing System (IELTS) English proficiency test is widely taken by non-native speakers of English for reasons like higher education, immigration and registration with professional bodies in western countries. In response to the growing demand for the IELTS exam, a wide range of private institutes have opened in Pakistan for the preparation of IELTS candidates, catering to different income brackets. The private institutes offer short General English courses and preparation for international proficiency tests like IELTS. The present impact study of IELTS in Pakistan (following global impact studies of IELTS such as Hawkey 2006) attempts to explore the effectiveness of these IELTS preparation courses in improving the proficiency level of their clients. The study examines the IELTS preparation course at two institutes in Pakistan, BERLITZ - an international institute and PACC - a locally owned institute. A pre- and post-test for IELTS was taken by 20 students enrolled in both institutes. The test data was supplemented by class observation, questionnaires and informal interviews with teachers and students. The study found that unsurprisingly the two cohorts at the two institutes differ clearly at the starting level as found by the pre-test (Students at BERLITZ are more proficient than PACC students). Yet the progress for these two cohorts suggests that there is not much difference in the improvement level between BERLITZ and PACC students, despite superior teaching and resources at BERLITZ. The results can be explained to an extent by the brevity of the courses (both 8 weeks) but they also point to distortions in the market for IELTS preparation in Pakistan where IELTS has assumed a significance that goes beyond the opportunity that a relatively small number of Pakistanis have to study and/or work abroad.
User simulation is a vital component in training and evaluating statistical dialog systems. However, while the dialog managers they inform have become ever more sophisticated, user simulators have not generated the same level of research or experienced the same advancements: typical simulators require some hand-crafting, they are difficult to evaluate without the managers they are intended to train (leading to circularity of argument) and their influence on the resulting policies of dialog managers remains unexplored. In this paper, we propose a fully generative user simulator that is fully induced (without hand-crafting or goal annotation). The simulator is both stochastic (its behaviour is probabilistic) and consistent (user utterances are generated conditional on a goal, which remains fixed for the dialog). Goals are represented in the model as latent variables, and it incorporates a topic model which clusters together utterances with similar semantics or confusable phonetics, but different surface renderings. Because it is a fully fledged generative model, we are able to evaluate in terms of held-out probability, breaking the circularity common in many previous evaluations of user simulators. Our results demonstrate substantial improvement over a simple fully-generative bigram model, as well as an upper bound on models treating goals as string literals.
BACKGROUND Single cases of aphasia associated with Motor Neuron Disease (MND) have been documented since the late 19th century, but until recently they received little attention, most of it focused on semantics and syntax. In contrast, disturbances in articulation and phonology have commonly been attributed solely to dysarthria. However, recent evidence of spelling errors (Ichikawa et al. 2010) and apraxia of speech (Duffy et al. 2007) suggests that some errors cannot be explained by motor impairment alone, raising the possibility of more central processing deficits. METHODS We examined 20 MND patients with changes in speech and/or language, using a comprehensive assessment of spelling and repetition as well as syntactic, phonological and orthographical awareness. Patients were screened for levels of hearing impairment, dysarthria, ideomotor apraxia and non-verbal cognitive deficits. RESULTS Out of 20 patients with varying severity of dysarthria, 13 showed deficits not confined to dysarthria. Spelling errors suggest impairment at the level of the graphemic buffer, with word length effects and evidence of omission, substitution, transposition and insertion errors in both elicited and spontaneous writing. Moreover, while 6 patients demonstrated impairment in both receptive and expressive modalities, 2 demonstrated dissociation between impaired comprehension of syntax and orthography, and preserved naming and spelling. We conclude that MND is associated with multiple deficits in spoken and written language and discuss our findings in the context of subvocal rehearsal and an interaction between language and motor functions.
In natural conversations, people sometimes complete each other's utterances. How do they manage to do so? One possibility is that they make predictions about their partner's utterances, just as they can make predictions about their own utterances (Pickering & Garrod, 2009; Gambi & Pickering, 2011). We tested whether participants predict the complexity of their partners' utterances in a joint picture description task. In a pretest, we asked participants to describe pictures with either a short (e.g., "the soldier follows the swimmer") or a long (e.g., "the soldier follows the swimmer with the vase and the cane") sentence. Participants took longer to produce the beginning of the sentence (i.e., "the soldier follows") when it was followed by a long ending compared to a short ending (mean difference=0.218 ms, pMCMC<0.001). We interpret this as evidence that people build predictions concerning what they are going to say next. We expect that similar predictions are built for the upcoming utterances of other speakers. We tested three conditions. After a participant produced the beginning of the description (e.g., "the soldier follows"), either the same participant (SELF), her partner (OTHER), or nobody (NO) had to complete the description with a long or short ending. We expected to find complexity effects in the SELF and the OTHER conditions, whereas we did not expect to find any effects in the NO condition (where there is no upcoming utterance). The duration of the sentence beginning was longer before long than before short endings in SELF (p < 0.001) and OTHER (p = 0.03), but not in the NO condition. However, we did not find a significant length x condition interaction between OTHER and NO. Therefore, we cannot definitively conclude that the effect in OTHER is due to representing the other's utterance.
Questions about whether the police acted politely can have a bearing on whether specific police powers are deemed to have been used legitimately (Nadler and Trout, forthcoming); whether or not the police are perceived as treating people with politeness and respect is a core component in overall public confidence in the police (Bradford and Jackson, 2009). However, the question of what actually constitutes politeness in a police context has been rarely discussed. The position of the police institution is more complex than it may at first appear - although generally perceived as a powerful institution authorised by the state to use force, it may also be positioned as an institution providing public service. The challenges for the police in negotiating these positions of authority and service in relation to politeness can be seen in police constructions of apologies. This poster looks at data from the Scottish police, in the form of responses to complaints from members of the public, focusing on how apologies are constructed. My analysis suggests two different types of apologies - one close to a traditional act of an apology 'for' an offence, and a second type, seeking to negotiate around public expectations of service from the police institution. The construction of this second act may be part of the Scottish police trying to develop a type of apology act that responds to both their own and public perception of what their police service should be. References: Bradford, Ben and Jonathan Jackson (2009) "Public Trust in Criminal Justice: A Review of the Research Literature in the United States." Nadler, Janice and J. D. Trout. (forthcoming). "The Language of Consent in Police Encounters." In Oxford Handbook on Linguistics and Law, eds. L. Solan and P. Tiersma: Oxford University Press.
A key problem for models of dialogue is to explain how conventions are established and sustained. Existing accounts emphasize the importance of interaction, demonstrating how collaborative feedback leads to representations that are more concise (Krauss and Weinheimer 1966; Clark 1996), abstract (Schwartz 1995), systematized (Healey 1997; Mills and Healey 2006), stable and arbitrary (Garrod et al. 2007). Despite these studies' very different approaches, a common methodological choice is their study of how interlocutors co-ordinate on the content of referring expressions. However, co-ordination in dialogue also requires procedural co-ordination (Schegloff 2007). To investigate procedural co-ordination we report a collaborative task which presents participants with the recurrent co-ordination problem of ordering their actions and utterances into a single coherent sequence. Pairs of participants communicate via a text-based chat-tool (Healey and Mills 2006). Each participant's computer also displays a task window containing a list of randomly generated words. Solving the task requires participants to combine their lists of words into a single alphabetically ordered list. To select a word, participants type the word preceded with "/". To ensure collaboration, participants can only select words displayed on the other participant's screen and vice versa. Note that this task is trivial for an individual participant. However, for pairs of participants, this task presents the coordination problem of interleaving their selections correctly: participants cannot select each other's words, words can't be selected twice, and the words need to be selected in the correct order. (See Mills 2011 for a more detailed description.) Despite the task only permitting a single logical solution (and being referentially transparent - the words are the referents), participants develop group-specific routines for co-ordinating their turns into a coherent sequence.
Importantly, we show how this development does not occur through explicit negotiation: in the initial trials, participants' attempts to explicitly negotiate these routines more often than not prove unsuccessful (cf. Pickering and Garrod 2004, who observed similar patterns in a series of maze game experiments). Instead, we demonstrate how these routines emerge via tacit negotiation as a consequence of interlocutors' collaborative attempts to deal with miscommunication. Drawing on how interlocutors engage in resolving these misunderstandings in the test phase, we argue that these collaborative routines operate normatively, having become conventionalized by the interlocutors. References: Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press. Garrod, S., Fay, N., Lee, J., Oberlander, J., & MacLeod, T. (2007). Foundations of Representation: Where Might Graphical Symbol Systems Come From? Cognitive Science, 31(6), 961-987. Healey, P. G. T. (1997). Expertise or expert-ese: The emergence of task-oriented sub-languages. In Proceedings of the 19th Annual Conference of the Cognitive Science Society. Stanford University. Healey, P. G. T., & Mills, G. (2006). Participation, precedence and co-ordination. In Proceedings of CogSci. Krauss, R. M., & Weinheimer, S. (1966). Concurrent feedback, confirmation and the encoding of referents in verbal communication. Journal of Personality and Social Psychology, 4(3), 343-346. Mills, G. J. (2011). The emergence of procedural conventions in dialogue. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society. Boston, USA. Pickering, M. J., & Garrod, S. (2004). Towards a mechanistic psychology of dialogue. Behavioural and Brain Sciences, 27(2), 169-190. Schegloff, E. A. (2007). Sequence Organization in Interaction: Vol 1. Cambridge University Press.
Efforts to extract attribution relations have multiplied in recent years, due to their relevance in particular for Opinion Analysis and Information Extraction applications. Being able to correctly identify the source (either a specific entity, e.g. President Obama, or a class thereof, e.g. experts, official sources, rumours) of a piece of information or an opinion would be extremely beneficial. This would in fact enhance opinion-oriented applications of Language Technology and revolutionise the way we can select information, e.g. on the basis of source expertise and reliability. However, current approaches to the automatic extraction of attribution relations remain limited in scope and precision and are therefore not adequate to support the development of reliable applications. Moreover, there has been little or no attempt to identify relevant features of attribution (e.g. different types of sources, authorial stance) that affect the perception and interpretation of the attributed material. This study addresses several of the attribution strategies identified in Italian and English news corpora, employed to build a broad-coverage annotated resource, in order to develop a more comprehensive system for the extraction of attributions and their relevant features from news texts.
The current study presents a novel experiment which aims to bridge the gap between theoretical approaches and observed trends in language typology and evolution. Lupyan & Dale (2010) found that the bigger the population using a language, the more that language will encode functional items using lexical strategies. These correlations are hypothesised to be the result of larger language populations having more adult second language learners with different learning biases from first language learners, which may include preferring lexical over morphological strategies (Lupyan & Dale, 2010). Experimental work on the differences between adult and child learning, however, has shown contradictory results (Hudson Kam & Newport, 2005; Hudson Kam & Newport, 2009). The current study seeks to demonstrate that foreigner-directed speech should be considered when explaining the typological correlations discussed above. The experiment investigated whether interacting with a perceived foreigner would influence an interlocutor to adopt lexical over morphological strategies. Participants were trained on an artificial language. The language offered two ways of describing the scenes used in the experiment, using either a lexical or a morphological strategy. Participants were in one of two conditions, either the esoteric or exoteric condition, where they perceived their interlocutor as either an insider or outsider respectively. The frequency of lexical or morphological strategies used in a communication task was recorded. The results show that lexical strategies are adopted more by participants in the exoteric condition, but only if the first speaker in an interaction initially uses a lexical strategy. It is concluded that foreigner-directed speech should be considered as a factor in the cultural evolution of language when seeking to explain trends in language typology. References: Hudson Kam, C., & Newport, E. L. (2005).
Regularizing unpredictable variation: The roles of adult and child learners in language formation and change. Language Learning and Development, 1, 151-195. Hudson Kam, C., & Newport, E. L. (2009). Getting it right by getting it wrong: When learners change languages. Cognitive Psychology. Lupyan, G., & Dale, R. (2010). Language structure is partly determined by social structure. PLoS ONE 5(1): e8559.
Machine Translation is a well-established field, yet the majority of current systems translate sentences in isolation, losing valuable contextual information from previously translated sentences in the discourse. One important type of contextual information concerns who or what it is that a coreferring pronoun corefers to (i.e., its antecedent). Languages differ significantly in how they achieve coreference, and awareness of antecedents is important in making the right choice. Disregarding a pronoun's antecedent in translation can lead to inappropriate coreferring forms in the target text, degrading a reader's ability to understand it. This work focusses on the translation of coreferring pronouns in English-Czech Statistical Machine Translation (SMT). I present an assessment of the effectiveness of source-language annotation for this purpose and highlight limitations with respect to currently available evaluation methods and resources.
The treebank is a new resource for researchers working on the intersection between vision and language. It is intended to be a freely-available corpus of images and corresponding text for the development and evaluation of natural language generation, image annotation, and structure induction. It differs from existing datasets because it contains syntactic representations of the data, which makes it applicable to a wider range of tasks. The images are provided in their surface form, as a set of gold-standard object annotations, and as gold-standard visual dependency graphs derived from the annotations. The annotations are made with respect to the corresponding text, which means they cover a wide range of object classes and are directly related to the image description. The visual dependency graphs are generated using a geometric dependency grammar, which defines how relations between pairs of objects can be generated. The text is provided in its surface form and as a syntactic dependency tree, which is produced by a state-of-the-art parser. The treebank currently contains several hundred completely annotated pairs of data.
It is now widely accepted that language comprehension involves prediction. Upon hearing eat in the sentence "the boy will eat the cake", listeners are more likely to look toward an edible object than upon hearing a verb that does not impose this semantic restriction upon its theme, such as move (Altmann & Kamide, 1999). Using the visual world paradigm, we investigated the ability of listeners to predict phonological features of themes and to subsequently combine these with the predictions made from the semantic restrictions of verbs. Participants were faster to initiate saccades towards the target when sentences contained a restrictive verb, and independently they looked toward the target more quickly when the sentence contained a phonologically restrictive determiner (a) than when it did not (his). Our findings demonstrate that listeners' predictions can be driven by integrated information from multiple linguistic domains.
There is little research on language involvement in MS, though the studies that do exist indicate a wide variety of language impairments. No comprehensive study has yet been done investigating MS abilities in all the major language domains. This study used one receptive and one expressive test for each of syntax, semantics, phonology and written language to assess patterns in MS language skills. The results of the group analysis indicate that MS patients are significantly impaired in receptive syntax, word-finding, reading, non-word repetition and spelling. The patients were then divided according to MS subtype. Analysis by subtype revealed that secondary progressive patients are linguistically preserved compared to primary progressive and relapsing-remitting groups on reading, and compared to the primary progressive group on word-finding. An individual analysis showed that subsets of individuals are significantly impaired on most of the language tests used, but there are different individuals in each subset. Syntax was impaired in exactly half of each MS subtype, and double dissociations were seen between syntax and semantics.
The status of universals in the study of language is under increasing scrutiny (Evans & Levinson, 2009). The materialist position on universals contrasts abstract and concrete universals. The former is the universal conventionally recognized by psychologists and linguists; it has an important but limited role to play. The latter is a real entity that is a universal by virtue of its pervasive influence in the domain. The schwa sound is suggested as a viable concrete universal in the study of language use.
We evaluate several popular models of local discourse coherence for domain and task generality by applying them to chat disentanglement. Using experiments on synthetic multiparty conversations, we show that most models transfer well from text to dialogue. Coherence models improve results overall when good parses and topic models are available, and on a constrained task for real chat data.
Recent evidence suggests that people who stutter (PWS) may display different eye movement behaviour in silent reading compared with people who do not stutter (Corcoran & Frisson, 2011). We used the moving window paradigm (McConkie & Rayner, 1975) to examine whether the size of the perceptual span (the range of effective vision used in reading) differed between people who stutter and age- and education-matched controls. Participants read sentences in which information was available from (a) the currently fixated word only, (b) the currently fixated word plus one word to the right, (c) the currently fixated word plus two words to the right, or (d) the whole sentence. Results showed that people who stutter had a smaller rightward perceptual span compared with controls. People who stutter also showed longer reading times in all conditions, more fixations, and more regressive saccades than controls.
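The four viewing conditions can be illustrated with a small sketch that builds the text a reader would see for a given fixation. Replacing hidden words with strings of "x" is an assumption made here for illustration, as are the function and parameter names; the abstract does not say how unavailable text was displayed.

```python
def moving_window(sentence, fixated_index, words_right=None, mask_char="x"):
    """Return the display for one fixation in a word-based moving window.

    words_right is the number of words visible to the right of the fixated
    word (0, 1 or 2 for conditions a to c); None shows the whole sentence
    (condition d). Hidden words keep their length and word boundaries.
    """
    words = sentence.split()
    if words_right is None:
        return " ".join(words)
    last_visible = fixated_index + words_right
    display = []
    for i, word in enumerate(words):
        if fixated_index <= i <= last_visible:
            display.append(word)
        else:
            display.append(mask_char * len(word))
    return " ".join(display)


# Hypothetical example sentence, with the reader fixating the third word.
example = "the cat sat on the mat"
condition_a = moving_window(example, fixated_index=2, words_right=0)
condition_c = moving_window(example, fixated_index=2, words_right=2)
```

In this sketch the window moves with the eyes simply by calling the function again with a new `fixated_index` on each fixation.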
Learning to group words into phrases without supervision is a hard task for NLP systems, but infants routinely accomplish it. We hypothesize that infants use acoustic cues to prosodic structure or syntactic probability, which NLP systems typically ignore. To evaluate the utility of word duration information for phrase discovery, we present an HMM-based unsupervised chunker that learns from only transcribed words and either ToBI annotation or raw word duration measures. Unlike previous work on unsupervised parsing and chunking, we use neither gold standard part-of-speech tags nor punctuation in the input. Evaluated on the Switchboard corpus, our model outperforms baselines that exploit either lexical, acoustic, or prosodic information alone, and, despite producing a flat structure, performs competitively with a state-of-the-art unsupervised lexicalized parser. Our results support the hypothesis that acoustic-prosodic cues provide useful evidence about syntactic phrases for language-learning infants. Additionally, our results suggest that predictability effects are more useful than prosodic constituency for bootstrapping basic syntax.
Autism Spectrum Disorders (ASDs) is a term coined to cover a group of neurodevelopmental disorders associated with noticeable impairments in three domains: social interaction (inability to develop age-appropriate relationships), communication (difficulties with language) and imagination ("restricted repetitive and stereotyped patterns of behaviors, interests and activities" [1]). There is no cure for ASDs, but there is strong evidence that early interventions can help children with ASD to become more independent and to acquire social and communication skills. Experts report that interventions using social stories in relation to social communication skills are effective for the treatment of children with ASD [2]. Research shows also that humour might be a very helpful tool in educational interventions for children with ASD [3]. This project aims to explore the social communication skills (specifically sharing attention, sharing emotion, reciprocity) in children with ASD by combining social stories and humour.
Laryngeal air sacs are a product of convergent evolution in many different species of primates, cervids, bats, and other mammals. In the case of Homo sapiens, their presence has been lost. This has been argued to have happened before Homo heidelbergensis, due to a loss of the bulla in the hyoid bone from Australopithecus afarensis (Martinez, 2008), at a range of 500kya to 3.3mya (de Boer, to appear). Justifications for the loss of laryngeal air sacs include infection, the ability to modify breathing patterns and reduce need for an anti-hyperventilating device (Hewitt et al., 2002), and the selection against air sacs as they are disadvantageous for subtle, timed, and distinct sounds (de Boer, to appear). Further, it has been suggested that the loss goes against the significant correlation of air sac retention to evolutionary growth in body mass (Hewitt et al., 2002). I argue that the loss of air sacs may have occurred more recently (less than 500kya), as the loss of the bulla in the hyoid does not exclude the possibility of air sacs, as in cervids, where laryngeal air sacs can project between two muscles (Frey et al., 2007). Further, I argue that using the weight measurements of living species to justify a loss of air sacs despite a gain in body mass is unfounded given archaeological evidence, which suggests that the laryngeal air sacs may have been lost only after size reduction in Homo sapiens from Homo heidelbergensis. Finally, I suggest two further justifications for loss of the laryngeal air sacs in Homo sapiens. First, the linguistic niche of hunting in the environment in which early hominin hunters have been posited to exist - the savannah - would have been better suited to higher frequency, directional calls as opposed to lower frequency, multidirectional calls.
The loss of air sacs would have then been directly advantageous, as lower frequencies produced by air sac vocalisations over bare ground have been shown to favour multidirectional over targeted utterances (Frey and Gebler, 2003). Secondly, the reuse of air stored in air sacs could have possibly been disadvantageous toward sustained, regular heavy breathing, as would occur again in a hunting environment.
Hidden authorship refers to those scenarios in which an informative act is produced, but in which the communicative intent behind it is hidden. Suppose, for example, that a dinner guest wishes for some more wine, but recognises that, for whatever reason, it would be somewhat impolite to ask for this directly. Instead, she places her empty glass in a conspicuous location where it is likely to be noticed by the host, but does not explicitly bring attention to the fact that the glass is empty. Hidden authorship is categorically different to other varieties of intentional communication; see the table, right.
Building data sets is problematic in Natural Language Processing (NLP) because it is costly and time-consuming to develop them by hand. This motivates the use of existing data, but these data are often in the wrong form. Although widely used in other areas of computer science, linear algebra is rarely applied to the data creation aspect of NLP and speech synthesis problems. We present two examples of using matrix representations to facilitate the building of data sets from existing resources. The first demonstrates the smaller case of merging data sets for use in the evaluation of synthetic speech systems. The second, a larger and more abstract example, uses a similar, but inverted, version of this method to gather training data from existing, web-based resources to build exams. This matrix-based approach increases access to, and the utility of, such data and better directs the development of good training sets.
We present a novel probabilistic classifier that scales well to problems that involve a large number of classes and require training on large datasets. A prominent example of such a problem is language modeling. Our classifier is based on the assumption that each feature is associated with a predictive strength, which quantifies how well the feature can predict the class by itself. The predictions of individual features can then be combined according to their predictive strength, resulting in a model whose parameters can be reliably and efficiently estimated. We show that a generative language model based on our classifier consistently matches modified Kneser-Ney smoothing and can outperform it if sufficiently rich features are incorporated.
The International English Language Testing System (IELTS) English proficiency test is widely taken by non-native speakers of English for reasons such as higher education, immigration and registration with professional bodies in western countries. In response to the growing demand for the IELTS exam, a wide range of private institutes have opened in Pakistan for the preparation of IELTS candidates, catering to different income brackets. The private institutes offer short General English courses and preparation for international proficiency tests like IELTS. The present impact study of IELTS in Pakistan (following global impact studies of IELTS such as Hawkey 2006) attempts to explore the effectiveness of these IELTS preparation courses in improving the proficiency level of their clients. The study examines the IELTS preparation course at two institutes in Pakistan: BERLITZ, an international institute, and PACC, a locally owned institute. A pre- and post-test for IELTS was taken by 20 students enrolled in both institutes. The test data was supplemented by class observation, questionnaires and informal interviews with teachers and students. The study found that, unsurprisingly, the two cohorts at the two institutes differ clearly at the starting level, as shown by the pre-test (students at BERLITZ are more proficient than PACC students). Yet the progress of these two cohorts suggests that there is not much difference in the improvement level between BERLITZ and PACC students, despite superior teaching and resources at BERLITZ. The results can be explained to an extent by the brevity of the courses (both 8 weeks), but they also point to distortions in the market for IELTS preparation in Pakistan, where IELTS has assumed a significance that goes beyond the opportunity that a relatively small number of Pakistanis have to study and/or work abroad.
We present a systematic comparison and combination of two orthogonal techniques for efficient parsing of Combinatory Categorial Grammar (CCG). First we consider adaptive supertagging, a widely used approximate search technique that prunes most lexical categories from the parser's search space using a separate sequence model. Next we consider several variants on A*, a classic exact search technique which to our knowledge has not been applied to more expressive grammar formalisms like CCG. In addition to standard hardware-independent measures of parser effort we also present what we believe is the first evaluation of A* parsing on the more realistic but more stringent metric of CPU time. By itself, A* substantially reduces parser effort as measured by the number of edges considered during parsing, but we show that for CCG this does not always correspond to improvements in CPU time over a CKY baseline. Combining A* with adaptive supertagging decreases CPU time by 15% for our best model.
The Gettier problem in epistemology is based on our reluctance to attribute knowledge to people in certain kinds of situations ("Gettier cases"), despite their believing the truth with justification. This has led most people to conclude that the "tripartite analysis" of knowledge (on which knowledge = justified, true belief) is false. I defend the tripartite analysis by arguing that our reluctance to attribute knowledge in Gettier cases is explained by the fact that to do so would generate misleading conversational implicatures.
This paper presents an experiment on the role of working memory capacity (WMC) among Japanese-English bilinguals when performing a bilingual dichotic listening (BDL) task. Previous studies have demonstrated a significant role for WMC in maintaining attention to the relevant stimuli and inhibiting irrelevant stimuli in monolingual dichotic listening. In the BDL task, bilingual participants heard a different concurrent text in each ear; the two texts were semantically related or unrelated, and were in the same language or in two different languages. They also completed a test of WMC. The results showed that domain-general WMC predicted the ability to suppress the unattended language, regardless of its semantic relatedness to the language in the attended channel, especially when the attended language was English and the texts were read in different languages. This would demonstrate executive control of WMC in speech comprehension when bilinguals are required to maintain attention to one language and inhibit another.
Twitter is a very popular way for people to share information on a bewildering multitude of topics. Tweets are propagated using a variety of channels: by following users or lists, by searching, or by retweeting. Of these vectors, retweeting is arguably the most effective, as it potentially can reach the most people, given its viral nature. A key task is predicting if a tweet will be retweeted, and solving this problem furthers our understanding of message propagation within large user communities. A human experiment on the task of deciding whether a tweet will be retweeted shows that the task is possible, as human performance levels are much above chance. We present a machine learning approach, based on the passive-aggressive algorithm, that is able to predict retweets as well as humans. Analyzing the learned model, we find that performance is dominated by social features, but that tweet features add a substantial boost.
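For readers unfamiliar with the learner named above, the following is a minimal sketch of the basic (hard-margin) passive-aggressive update for binary classification. It is an illustration only and is not the system described in the abstract: the feature names, the toy data and the function names pa_train and pa_predict are all hypothetical, and the abstract does not say which variant of the algorithm or which features were used.

```python
# Minimal binary passive-aggressive learner (illustrative sketch only).
# Feature names and data below are hypothetical, not taken from the poster.

def pa_score(weights, features):
    """Dot product between a sparse weight dict and a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def pa_train(examples, epochs=10):
    """Train on (features, label) pairs, where label is +1 or -1."""
    weights = {}
    for _ in range(epochs):
        for features, label in examples:
            # Hinge loss: zero when the example is classified with margin >= 1.
            loss = max(0.0, 1.0 - label * pa_score(weights, features))
            if loss == 0.0:
                continue  # "passive": leave the weights unchanged
            norm_sq = sum(value * value for value in features.values())
            if norm_sq == 0.0:
                continue  # an all-zero example cannot change the score
            # "aggressive": smallest step that brings the loss on this example to zero.
            tau = loss / norm_sq
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) + tau * label * value
    return weights


def pa_predict(weights, features):
    """Return +1 (predicted retweet) or -1 (predicted no retweet)."""
    return 1 if pa_score(weights, features) >= 0 else -1


# Hypothetical toy data: one social feature and one tweet feature, plus a bias.
TOY_TWEETS = [
    ({"bias": 1.0, "many_followers": 1.0, "has_url": 1.0}, 1),
    ({"bias": 1.0, "many_followers": 1.0}, 1),
    ({"bias": 1.0, "has_url": 1.0}, -1),
    ({"bias": 1.0}, -1),
]

if __name__ == "__main__":
    weights = pa_train(TOY_TWEETS)
    for features, label in TOY_TWEETS:
        print(sorted(features), "gold:", label, "predicted:", pa_predict(weights, features))
```

The update is online: each example is seen once per pass and the weights change only when the example is misclassified or falls inside the margin, which is one reason this family of learners suits large streams of messages.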
In this paper I argue that English has a third type of if, that is, a declarative subordinator that introduces irrealis content clauses, as in (1): (1) I'd prefer if you stayed inside. I present evidence that strongly suggests that irrealis if-clauses are not ordinary conditional adjuncts, but function like VP-internal complements or subjects. In syntactic tests like extraction, preposing, clefting, and constituent order, irrealis clauses behave predominantly like complements, not adjuncts. Moreover, no other preposition with conditional or concessive meaning can be used to replace irrealis if. A close analysis of the semantics of irrealis clauses also points towards a non-conditional interpretation, as irrealis clauses refer to hypothetical states of affairs, but no idea of condition is implied in their meaning. Finally, I also discuss another candidate for the subordinator class, the wh-word when, which can often replace irrealis if (albeit without hypothetical meaning), as in (2): (2) I hate when this happens.
English vowel production is undergoing a shift in California (Eckert 2008), and one of the most robust aspects of the shift is the merger between the vowels in LOT and THOUGHT (DeCamp 1953). While this 'low-back merger' is quite advanced for much of the Western United States, the distinction is still maintained in San Francisco, California (Labov, Ash, & Boberg 2006). The present paper analyzes the LOT/THOUGHT merger in interview data from a stratified sample of 29 speakers from one San Francisco neighborhood. The results show that the population as a whole is moving in apparent time toward complete merger, while ethnographic analysis further suggests that the surprising, continued maintenance of the distinction among some individuals can be understood with respect to shifting language ideologies and recent social change. Specifically, I argue that the maintenance of the low back vowel distinction, and in particular a raised production of the THOUGHT vowel, has been reimagined as a resource for constructing a more traditional neighborhood identity, while production of the merger, or a lowered production of THOUGHT, associates a speaker with an emerging linguistic identity that is aligned with broader regional norms. Sociolinguistic research has demonstrated a renewed interest in the meanings of linguistic variables as used in day-to-day life (Eckert 2008b), and I argue here that these meanings can provide insight into the progression of sound change in progress.
This paper presents a syntax-based framework for gap resolution in analytic languages. CCG, well known for handling deletion under coordination, is extended with a memory mechanism similar to the slot-and-filler mechanism, resulting in wider coverage of syntactic gap patterns. Though our grammar formalism is more expressive than canonical CCG, its generative power is bounded by Partially Linear Indexed Grammar. Despite the spurious ambiguity originating from the memory mechanism, we also show that probabilistic parsing is feasible using the dual decomposition algorithm.
Bilinguals have shown cognitive control advantages in suppressing irrelevant information when they are given both visual and auditory stimuli (e.g., Bialystok, 2009; Green, 1998; Green & Bavelier, 2003; Rogers, Lister, Febo, Besting, & Abrams, 2006), and these advantages are seen in tasks calling for inhibition of task-irrelevant cues (Bialystok, 2001). Auditory attentional control in bilinguals has been investigated with phonologically relevant, but semantically meaningless, consonant-vowel syllables in the dichotic listening paradigm (e.g., Hugdahl, Westerhausen, Alho, Medvedev, Laine, & Hamalainen, 2009; Soveri, Laine, Hamalainen, & Hugdahl, 2010), which does not necessarily have implications for language inhibition among bilinguals when they process meaningful spoken messages. Cherry (1953, 1954) and Broadbent (1958) found in the dichotic listening task that the more physically different (e.g., speaker gender; voice intensity; speaker location) the unattended message is from the attended one, the easier it is to maintain attention to the attended channel. For bilinguals, it has not been investigated how they sustain attention to the relevant auditory information (e.g., L1) while inhibiting the irrelevant one (e.g., L2) in real-life situations, where it is rather common for them to interact in a bilingual language mode, that is, to communicate with (or listen to) bilinguals who share their two (or more) languages, so that language mixing may take place (Grosjean, 2008, p. 251). We gave the participants more direct and abrupt interference, i.e., the bilingual dichotic listening task (Miura, Pickering, Logie, & Sorace, 2010), to obtain more straightforward evidence of language inhibition and suppression among bilinguals, and found that bilingual listeners are able to inhibit the unattended language regardless of its semantic relatedness, particularly when the messages are spoken in different languages.
Furthermore, it appears that bilinguals can notice whether what they hear in the unattended channel is semantically related to what they hear in the attended channel when they hear the same language in both channels. Our future research will investigate how bilingual listeners filter out auditory information from the unattended channel.
The Naming Game looks at how agents in a population converge on a shared system for referring to continuous stimuli (Steels, 2005; Nowak & Krakauer, 1999). These models assume that a mutual exclusivity bias is necessary for establishing a shared lexicon. However, I show that communicative success is still achieved without this bias. Monolingual assumptions may obscure differences in the evolutionary dynamics of languages in monolingual and bilingual societies.

Committee

School of Informatics Joachim Fainberg
School of Philosophy, Psychology and Language Andres Karjus
School of Philosophy, Psychology and Language Madeleine Long
School of Philosophy, Psychology and Language Candice Mathers
School of Informatics Joana Ribeiro
School of Philosophy, Psychology and Language Eva-Maria Schnelten

Previous Committees

Supported by