Complex Predicates and Multidimensionality in Grammar

This paper contributes to the ongoing discussion of how best to analyze and handle complex predicate formations, commenting in particular on the properties of Hindi N-V complex predicates as set out by Vaidya et al. (2019). I highlight features of existing LFG analyses and focus in particular on the modular architecture of LFG, its attendant multidimensional lexicon and the analytic consequences which follow from this. I point out where the previously existing LFG proposals have been misunderstood when viewed through the lens of theories such as LTAG and HPSG, which assume a very different architectural set-up, and I provide a comparative discussion of the issues.


Introduction
Lexical-Functional Grammar (LFG) is a theory that posits several different types of linguistic representations, each with its own logical structure and vocabulary. This architectural choice reflects the realization that different parts of the grammar are of different mathematical complexity and encode different types of information. Within syntax, LFG posits two different types of structure: the c(onstituent)-structure encodes information about constituency, linear order and the hierarchical organization of constituents. The f(unctional)-structure, on the other hand, encodes dependency type of information, for example of a predicate and its arguments, specifiers and modifiers (Bresnan, 2001; Dalrymple, 2001; Butt and King, 2015). The syntax interacts with further components of grammar, such as the a(rgument)-structure, which encodes predicate-argument relations at the lexical semantic level, or the s(emantic)-structure, which encompasses clausal semantics and must, of course, also itself interact with a-structure. These interactions are accomplished via LFG's mathematically well-defined projection architecture.
The multidimensional architecture of LFG predisposes it to be prepared for mismatches across linguistic representations. Complex predicates have been demonstrated to involve a challenging mismatch between a(rgument)-structure and syntax and it is thus perhaps no accident that very detailed formal treatments of complex predicates across a range of languages have been formulated especially within LFG (Alsina et al., 1997; Wilson, 1999; Butt, 2010). These formal treatments also included a clear differentiation of complex predicates from auxiliation and control constructions (Seiss, 2009).
The LFG analysis of complex predicates crucially includes a notion of argument structure composition, by which a complex, bi- or multiclausal a-structure corresponds to a monoclausal f-structure. Alsina (1996), Mohanan (1994) and Butt (1995) demonstrated that this argument structure composition must be mediated by the syntax, as the parts of a complex predicate can be distributed across the clause and do not necessarily have to be adjacent to one another. This stands in contrast to the initial formulation of the relationship between a-structure and f-structure, which saw the mapping from thematic roles to grammatical functions as being accomplished only within the lexicon. Once it was established that the mapping between a-structure and f-structure needed to be mediated via the syntax, the theory was extended to allow for this. A crucial but simple component of the extension was the idea that a-structures can contain variables which are then instantiated by other a-structures in the clause. The composed a-structure is placed into correspondence with grammatical functions at f-structure as a whole.
This type of dynamic argument structure composition is difficult to accomplish for frameworks that depend heavily on lexical encoding, like LTAG (Lexicalized Feature-based Tree-Adjoining Grammar), HPSG (Head-Driven Phrase Structure Grammar) or CCG (Combinatory Categorial Grammar). The essential problem is that they do not have recourse (or have only limited recourse) to structures which can introduce information that is not lexically rooted. A recent contribution by Vaidya et al. (2019) proposes to meet this challenge for LTAG. They do so with respect to the concrete case of Hindi Noun-Verb complex predicates and draw heavily on previous work within LFG and by Mohanan (1994). In the process, they discuss features of the existing LFG analyses and compare and contrast the approaches with LTAG. This paper picks up on some of their discussions and observations from an LFG perspective. In what follows, I first comment on the properties of Hindi N-V complex predicates as set out by Vaidya et al. (2019) and provide a different perspective on the data (section 2). I then highlight features of existing LFG analyses, point out where the previously existing proposals have been misunderstood by Vaidya et al. (2019) and provide a comparative discussion of the issues (section 3). Section 4 concludes the paper.

Properties of Complex Predicates
There are two related properties of complex predicates I would like to draw attention to. One concerns the nature of the relationship between the predicational parts of a complex predicate, the other the productive nature of complex predicates. Both points are discussed in Vaidya et al. (2019); however, I believe their significance has not been fully appreciated. Vaidya et al. (2019) base their analyses on previous data adduced on Hindi and Urdu N-V complex predicates and on their own experiences with constructing the Hindi-Urdu Treebank (HUTB; Bhat et al., 2017) and Hindi PropBank entries (Vaidya et al., 2013). One difficulty with N-V strings is the determination of when a genuine instance of complex predication occurs. Examples as in (1) are clear N-V complex predicates because the direct argument kahani 'story' cannot be licensed by any of the verbs, but is licensed by the noun yad 'memory'. On the other hand, it is also clear that the shape of the clause's subject is determined by the light verb. With an agentive light verb like 'do', the subject is ergative; with the non-agentive light verbs 'come, be, become', the subject is dative.

Combined Predication
(1) a. nadya=ne kahani yad k-i
       Nadya.F.Sg=Erg story.F.Sg.Nom memory do-Perf.F.Sg
       'Nadya remembered a/the story.' (lit.: 'Nadya did memory of the story.')
    b. nadya=ko kahani yad a-yi
       Nadya.F.Sg=Dat story.F.Sg.Nom memory come-Perf.F.Sg
       'Nadya remembered a/the story.' (lit.: 'Memory of the story came to Nadya.')
    c. nadya=ko kahani yad hE
       Nadya.F.Sg=Dat story.F.Sg.Nom memory be.Pres.3.Sg
       'Nadya remembers/knows a/the story.' (lit.: 'Memory of the story is at Nadya.')
    d. nadya=ko kahani yad hu-i
       Nadya.F.Sg=Dat story.F.Sg.Nom memory be.Part-Perf.F.Sg
       'Nadya came to remember a/the story.' (lit.: 'Memory of the story became to be at Nadya.')
Tests for monoclausality vs. biclausality (Mohanan, 1994) show that these examples are all syntactically monoclausal. However, the overall argument structure is contributed by two different items, namely yad 'memory' and kar 'do'. The noun is the main predicational element of the clause, while the finite verb is commonly referred to as a light verb. Sample analyses within LFG are shown in (2) for (1a) and in (3) for (1b). The light verb is taken to have an "incomplete" a-structure with a variable as its second argument: kar< agent %Pred >. This variable %Pred is instantiated by the a-structure of the noun, as shown below.
As discussed by Vaidya et al. (2019), Hindi N-V complex predicates exhibit an additional wrinkle in that some nouns do not agree with the light verb and in this case they also do not function as an argument of the clause. yad 'memory' is an instance of this type of nominal.
(2) Agentive N-V Complex Predicate
    do < agent memory< experiencer theme >> (a-structure)
This example of a Hindi N-V complex predicate establishes the following: a) there is a mismatch between a-structure and f-structure in that we have a biclausal a-structure which corresponds to a monoclausal f-structure; b) both parts of the complex predicate contribute to the overall argument structure of the clause; c) the choice of light verb governs the case marking on the subject. The latter point is true not only for N-V complex predicates, but for all other types of complex predicates in Urdu/Hindi (see Butt (1995) on V-V complex predicates).
In the existing LFG analyses this is accounted for by investing the light verb with an argument structure of its own, which crucially includes the argument corresponding to the clausal subject. In contrast, the analysis provided by Vaidya et al. (2019) encodes all of the relevant argument structure information exclusively as part of the lexical information contributed by the noun. They do this by positing two different elementary trees for each noun of the language. These elementary trees serve as the main predicational element of the clause and are designed to be compatible with different scenarios. One of the elementary trees is designed for a situation in which the noun combines with an agentive light verb like kar 'do' above. The other elementary tree is for a combination with non-agentive light verbs. In essence, the noun thus "anticipates" the fact that it can be combined with various types of light verbs and offers up a list of lexicalized alternatives to effect the right overall argument structure. Vaidya et al. (2019) see no problem in localizing information coming from the light verb on the noun instead; they see this as a situation that parallels the treatment of passives. In passivization, the verb must anticipate that it might be passivized and thus there must be two alternative elementary trees for all of the passivizable verbs.
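The a-structure composition in (2), where the variable %Pred of the light verb is instantiated by the a-structure of the noun, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class and function names (`Var`, `AStructure`, `compose`) are hypothetical, and the sketch models nothing beyond slot instantiation (no linking to grammatical functions, no constraints on which predicates may combine).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    """A variable slot in an a-structure, e.g. %Pred (hypothetical encoding)."""
    name: str


@dataclass(frozen=True)
class AStructure:
    """A predicate with an ordered tuple of argument slots."""
    pred: str
    args: Tuple[Union[str, Var, "AStructure"], ...]

    def __str__(self) -> str:
        parts = [f"%{a.name}" if isinstance(a, Var) else str(a) for a in self.args]
        return f"{self.pred}< {' '.join(parts)} >"


def compose(light: AStructure, embedded: AStructure) -> AStructure:
    """Instantiate the single variable slot of `light` with `embedded`."""
    slots = [i for i, a in enumerate(light.args) if isinstance(a, Var)]
    if len(slots) != 1:
        raise ValueError("expected exactly one variable slot in the light verb")
    i = slots[0]
    return AStructure(light.pred, light.args[:i] + (embedded,) + light.args[i + 1:])


# The a-structures given in the text for (1a)/(2).
do = AStructure("do", ("agent", Var("Pred")))
memory = AStructure("memory", ("experiencer", "theme"))

composed = compose(do, memory)
print(do)        # incomplete a-structure of the light verb
print(composed)  # composed a-structure of the complex predicate
```

Each lexical item is listed once with its own a-structure; the composed structure only comes into existence when the two items are combined.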

Complex Predicate Stacking
Heavy use of the lexicon in order to encode combinatorial possibilities of the syntax can be seen as a matter of aesthetics. However, the fact that complex predicates can stack productively makes this design decision harder to defend. This point was made early on by Butt (1994), who showed that various types of complex predicates can be combined with one another productively. In (4), for example, the main verb banaa 'make' combines with an aspectual light verb le 'take'. This complex predicate is further combined with the permissive light verb de 'let'.
(4) ... di-ya]
    make take-Inf.Obl give-Perf.M.Sg
    'Anjum let Saddaf make (build) a house (complete building).'
The point made by Butt (1994) was that if one localizes all of the relevant information about the combinatory effects of the complex predication on the main verb alone, then one needs to lexically "anticipate" all manners of further combinations as well. That is, the verb bAna would have to be invested with information as to what happens when it combines first with an aspectual light verb and then with a permissive, in addition to the possibilities of just combining with an aspectual light verb or a permissive.
The same point holds for N-V complex predicates, as the example in (5) from Butt et al. (2008) illustrates. Here the noun pinch (borrowed from English) is combined with a causativized version of the light verb 'do'. This combination is then further combined with the aspectual light verb and the permissive we saw in (4).
[...] (5), several more elementary trees would have to be added to the lexicon so as to be able to anticipate this and other combinations. Given that causativization patterns in Urdu/Hindi are quite complex and that there are several classes of aspectual light verbs, the lexicon would appear to be set to explode.
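The growth of the lexicon under lexical anticipation can be made concrete with a small counting sketch. The inventory below is purely hypothetical (two light verb types, one causative layer, three aspectual classes, one permissive layer) and ignores co-occurrence restrictions; it is meant only to show how the number of patterns a single noun would have to list multiplies with each additional optional layer.

```python
from itertools import product


def stacking_patterns(obligatory, optional_layers):
    """Enumerate every light-verb sequence a predicate would have to anticipate.

    obligatory:      options for the layer that must be present
    optional_layers: list of option lists; each layer may also be absent
    """
    choices = [list(obligatory)] + [[None] + list(layer) for layer in optional_layers]
    return [tuple(x for x in combo if x is not None) for combo in product(*choices)]


# Hypothetical inventory, for illustration only.
light_verb_types = ["agentive", "non-agentive"]
optional_layers = [
    ["causative"],
    ["aspectual-1", "aspectual-2", "aspectual-3"],
    ["permissive"],
]

patterns = stacking_patterns(light_verb_types, optional_layers)
print(len(patterns))  # patterns one noun must list under lexical anticipation
```

Under composition in the syntax, by contrast, the noun is listed once and each light verb is listed once; the patterns are derived rather than stored.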

Accounting for Combinatory Possibilities
The obvious alternative from the LFG perspective is to determine exactly which parts of the predication are responsible for which parts of the overall structure and to encode that information in the appropriate parts of the lexicon and grammar. In LFG the argument structure information is distributed across the various lexical items, with each part of the complex predicate contributing to the overall argument structure as shown in (2) and (3). This stands in stark contrast to the LTAG analysis, where the light verb is invested with no argument structure (everything is encoded by the noun), or the HPSG analysis provided by Müller (2019), where the light verb is seen to "execute" the argument structure of the noun and thus contains the entire argument structure of the clause. In LFG, the combinatory possibilities, restrictions and effects of complex predication are controlled by the syntactic rules. With respect to this, some discussion is in order. Vaidya et al. (2019), citing Lowe (2015), note that the exact mechanism for argument combination is not formally explicit within LFG. However, this is not quite right. Lowe actually objects to the fact that the theory was extended to allow for the introduction of variables into a-structure. His approach within LFG is a different one: he instead imports the power of linear logic and compositional semantics into the analysis. He thus brings in clausal compositional semantics to regulate a matter that is actually located at the interface between lexical semantics and (morpho)syntax. Besides not following the overall LFG spirit of locating information within the appropriate modules of grammar, he also gives up on constraining the combinatorial possibilities of complex predicates and further fails to engineer the semantic representation in such a way as to be compatible with event semantics.
The need for a (sub)evental approach to lexical semantics has by now been very well established (Ramchand, 2008; Levin and Hovav, 2009; Croft, 2012), and Schätzle (2019) represents a first proposal for extending linking theory to include information about subevental structure.

(6)
                          Control                        Raising
    syntax (f-structure)  pro controlled                 Exceptional Case Marking
    a-structure           argument controlled (fusion)   arguments unified (raising)
In fact, the extension of linking theory is quite straightforward and as Butt (2014) points out, there is a very natural parallelism between the syntactic phenomenon of control vs. raising and the types of documented argument composition. Butt (2014) goes through a number of different types of complex predicates and shows that there are two possibilities for argument composition: complete argument identification and argument "control" (cf. (2)-(3)). As summarized in (6), this is exactly parallel to the methods of argument sharing at f-structure: control and raising. The linking possibilities of arguments at a-structure to grammatical functions at f-structure follow straightforwardly. The only extension to the theory is the necessary recognition that complex predication exists and should be modeled via argument composition.

Theory vs. Computation
One point which is not very clear in the otherwise very thorough and well-researched contribution by Vaidya et al. (2019) is the difference between the computational and the theoretical worlds in LFG. Like LTAG, HPSG and CCG, one of LFG's guiding principles is that the framework should be formally rigorous and computationally implementable. However, there is also a strong sense in the community that theoretical advances in LFG should not be constrained by limitations on currently available computational implementations.
The discussion in Vaidya et al. (2019) includes both the linking analysis of complex predicates illustrated above as well as a treatment in terms of the Restriction Operator. The Restriction Operator (Kaplan and Wedekind, 1993) was invented partly in response to the complex predicate conundrum and was implemented as part of the crosslinguistic ParGram effort (Butt et al., 2002). Butt et al. (2008) explicitly compare and contrast Linking Theory as assumed and developed within the theoretical LFG community and the computationally implemented Restriction Operator with respect to complex predication.
The Restriction Operator operates primarily on f-structure categories, but makes reference to an underlying list of arguments that are contributed by each predicate. Consider the example in (7), also discussed by Vaidya et al. (2019). Under a Restriction Approach, the arguments of each predicate already come linked to grammatical functions, as shown in (8). This linking reflects an initial subcategorization frame of the predicates. The syntactic rules that allow for the combination of these predicates also take care of the correct linking. This is done by renaming the grammatical functions as determined by the type of complex predicate that is being formed (e.g., for the complex patterns involving causatives in Urdu/Hindi, see Butt and King (2006)). In our example, the work to be done is fairly simple and results in the linking in (9). The highest embedded argument (Arg1) is identified with the Arg1 of the light verb and as a consequence the "embedded" subject is identified as the matrix subject. Everything else stays the same.
[...] Vaidya et al. (2019), nothing is ever deleted. Rather, the grammatical functions already linked to the arguments of the predicates are renamed. Even though it may not look like it, the Restriction Operator is wholly monotonic.

(8) Restriction Approach: Pre-Composition
Although the Restriction Operator is primarily concerned with the relabeling of grammatical functions, it does not do so without recourse to argument structure. Within the computational world of the grammar development platform XLE (Crouch et al., 2017), the grammatical functions are linked to a pared down version of a-structure: each predicate comes with a list of numbered arguments, whereby Arg1 is the highest (see Kibort (2007, 2014) for a similar proposal within LFG). The implementation also allows for a-structures to take a variable as an argument. This variable can be instantiated by the argument-structure of another predicate, allowing for a flexible and powerful yet constrained method of argument composition.
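The relabeling idea described in this section can be illustrated with a minimal Python sketch. It is not a rendering of XLE's Restriction Operator itself; the `Link` record, the `compose` function and the pre-composition linkings assumed for kar 'do' and yad 'memory' are all hypothetical, and the variable slot of the light verb is left out for brevity. The sketch shows only that the embedded Arg1 is identified with the grammatical function of the light verb's Arg1, that other functions may be renamed, and that no argument record is ever dropped.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Link:
    pred: str  # predicate contributing the argument
    arg: int   # position in that predicate's argument list (1 = highest)
    gf: str    # grammatical function the argument is linked to


def compose(light: List[Link], embedded: List[Link],
            renaming: Dict[str, str]) -> List[Link]:
    """Combine two pre-linked argument lists by renaming grammatical functions.

    The embedded Arg1 takes over the grammatical function of the light verb's
    Arg1; every other embedded function is renamed via `renaming` or kept.
    """
    matrix_gf = next(link.gf for link in light if link.arg == 1)
    result = list(light)
    for link in embedded:
        new_gf = matrix_gf if link.arg == 1 else renaming.get(link.gf, link.gf)
        result.append(replace(link, gf=new_gf))
    return result


# Assumed pre-composition linkings (hypothetical; the original example is not shown here).
kar = [Link("kar", 1, "SUBJ")]
yad = [Link("yad", 1, "SUBJ"), Link("yad", 2, "OBJ")]

composed = compose(kar, yad, renaming={})
for link in composed:
    print(link)
```

Because the output always contains every input record, the monotonicity of the operation is visible directly: composition changes labels, not the inventory of arguments.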

The Multidimensional Lexicon
In his commentary, Müller (2019) states that ". . . lexical items are associated with f-structures and these f-structures are responsible for which elements are realized in syntax." However, this statement is not in line with the facts, even with respect to the Restriction Operator. As should be clear from the discussion above, the predicational power of lexical items is encoded primarily at the level of a-structure. This is true of the more theoretically-grounded Linking Theory and of the computationally-oriented use of the Restriction Operator.
It is also absolutely not true that f-structures are responsible for which elements are realized in syntax. First of all, f-structures are already the syntax: this is where part of the syntactic information is realized. Secondly, they are not directly responsible for which parts of the information are realized overtly in the c-structure (if this is what Müller means). F-structure information can correspond to an overt c-structural realization, but it need not (e.g., in the case of dropped arguments). Thirdly, LFG's multidimensional architecture means that its lexical representations are inherently multidimensional. Each lexical item (and morpheme) is associated with a range of information that includes information about part-of-speech (c-structure), subcategorization frames, functional information such as case, person, number, gender, tense and aspect (f-structure), predicational power and lexical semantic information pertaining to, for example, agency or affectedness (a-structure), the phonological realization (p-structure, cf. Bögel (2015)) and information pertaining to the clausal semantics (s-structure, currently often realized in terms of glue semantics; Dalrymple, 1999). This information is deployed as is appropriate for each of these representations, but is also all linked up together via the shared lexical entry. Butt et al. (2018) and Dalrymple and Nikolaeva (2011) represent some recent work on interface phenomena within LFG which nicely illustrates the deployment of a multidimensional lexicon for an articulation of analyses across several different components of grammar.
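As a rough illustration of what a multidimensional lexical entry amounts to, the following Python sketch bundles the information for the form k-i 'do-Perf.F.Sg' from (1a) into one record with a separate field per level of representation. The field layout and the feature names (`ASPECT`, `GEND`, `NUM`) are assumptions made for the example and are not the notation of any LFG implementation; the s-structure field is left empty because the text gives no semantic representation for this item.

```python
from dataclasses import dataclass
from typing import Dict, Optional

DIMENSIONS = ("c_structure", "f_structure", "a_structure", "p_structure", "s_structure")


@dataclass(frozen=True)
class LexicalEntry:
    """One lexical item with information for several levels of representation."""
    form: str
    c_structure: str                   # part of speech
    f_structure: Dict[str, str]        # functional features
    a_structure: str                   # predicational power
    p_structure: Optional[str] = None  # phonological realization
    s_structure: Optional[str] = None  # clausal semantics (left unspecified here)

    def view(self, dimension: str):
        """Return only the information relevant to one level of representation."""
        if dimension not in DIMENSIONS:
            raise KeyError(dimension)
        return getattr(self, dimension)


# Hypothetical entry built from the gloss 'do-Perf.F.Sg' in (1a).
ki = LexicalEntry(
    form="k-i",
    c_structure="V",
    f_structure={"ASPECT": "perf", "GEND": "fem", "NUM": "sg"},
    a_structure="do< agent %Pred >",
    p_structure="k-i",
)

print(ki.view("a_structure"))
print(ki.view("f_structure"))
```

Each component of grammar consults only its own field, while the shared entry keeps the different kinds of information linked together.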

Conclusion
In conclusion, Vaidya et al. (2019) present an elegant and comprehensive analysis of N-V complex predicates for LTAG. The analysis involves positing an elementary tree for the nominal, nicely modeling the fact that it is the main predicational element of the clause. The light verb is adjoined as an auxiliary tree and this adjunction is required because the nominal elementary tree has no tense specification and the structure must acquire one. It does this via the light verb. The tree for the light verb also specifies relevant lexical semantic information contributed by the light verb. This contains established information that serves to constrain the combinatory possibilities for complex predication.
The analysis is comprehensive in that it takes into account the many different properties of Hindi N-V complex predicates that have been adduced over the years, e.g., in terms of scrambling, modification, agreement and lexical semantics. The LTAG and LFG analyses are similar in that they use sophisticated feature unification mechanisms for functional information associated with phrase structural nodes. However, they differ sharply in terms of the complexity of overall grammar architecture. LFG is inherently multidimensional, with a complex set of interacting yet distinct levels of representation. One of the components of grammar is a-structure, which is placed into correspondence with f-structure. A-structure is taken to be governed by its own set of constraints. Complex predication is accounted for via argument composition and identification and crucially allows for each member of a complex predicate to contribute arguments and further lexical semantic information to the overall joint predication.
This stands in contrast to the LTAG analysis developed by Vaidya et al. (2019), in which only the elementary tree of the nominal contributes arguments to the predication. The tree associated with the light verb is not devoid of semantic content: it encodes the relevant lexical information contributed by the light verb in terms of features that govern the combinatory possibilities within the complex predicate. Because the arguments themselves are contributed only by the nominal, several different elementary trees have to be posited for any given noun, in order to anticipate all the combinatory possibilities and attendant effects on the overall argument structure.
The design decision to concentrate all the information about number and type of arguments on the nominal leads to one of the points raised above, namely, that it is not clear how instances of complex predicate stacking can be dealt with without blowing up the lexicon considerably.
Overall it seems that analyses within LTAG are severely constrained in their analytical choices because of the fundamental design decisions taken in developing TAG. They lead to a pleasingly simple and elegant system, but result in a proliferation of stipulative lexical entries due to the combination of the heavy reliance on lexicalization and an absence of a separate representation of argument structure that can operate independently from phrase structural representations.
In contrast, LFG assumes a complex multidimensional and interacting architecture of grammar, which includes a separate level of a-structure. The advantages of such a complex system of interacting parts are clear: one can produce rich representations of linguistic structure and one has a number of competing analytical possibilities that one can choose among, depending on the precise nature of the data at hand. On the other hand, the disadvantages are also clear. Theoretical and computational linguists alike must struggle to understand and model a complex system. In this they receive comparatively less definitive guidance as to their analytical choices; these must be developed on the basis of an understanding and theory of the modular architecture of grammar. Phenomena like those of complex predicates provide just the right kind of object of study of interface phenomena, particularly across different theoretical frameworks.