Argument Mining

Linguistic Foundations

Mathilde Janier

Patrick Saint-Dizier

Preface

This book is an introduction to the theoretical and linguistic concepts of argumentation and their application to argument mining. It partly emerged from a course given at the ESSLLI summer school held in Toulouse in July 2017. Argument mining is now an important research and development activity. It can be viewed as a highly complex form of information retrieval. Argument mining addresses compelling application needs such as those posed by complex information seeking, for example reasoning for or against a controversial statement, e-debate or getting to know the different aspects of a complex issue.

Argument mining has its roots in information retrieval and question answering, with a higher theoretical and practical complexity. In particular, arguments are complex natural language constructs, with several relational aspects and implicit elements, such as the premise–conclusion–warrant triad. Therefore, it is not surprising that argument mining requires high-level natural language processing technology. Argument mining requires either complex machine learning algorithms or lexical and grammatical systems, which can deal with the complexity of arguments considered in isolation and in a context with other arguments. The goal of this book is to introduce the reader to the main linguistic and language processing concepts.

Understanding how argumentation is realized conceptually and in language usually implies manually annotating argumentation components in various types of corpora. Annotations may then be used to develop linguistic data or to train learning algorithms. Annotation is in itself a challenge, because it addresses complex phenomena, which require much training to be accurately analyzed. The complexity of annotation tasks frequently entails relatively high disagreement levels among annotators and therefore the need to develop precise guidelines and methods to reach consensus.

Argument mining requires accurately taking into account a number of complex models and techniques from the theory of argumentation, linguistics, corpus analysis, natural language processing technology, machine learning, knowledge representation and reasoning. The software engineering aspects may also be complex, for example, to organize the different steps of a real-world system and to update it, including the management of language resources and the production of an adequate synthesis of the arguments and other textual elements that have been collected.

This book is conceived as an introductory book that can be used as a textbook for undergraduate and graduate courses in linguistics, artificial intelligence or natural language processing. It may also be useful for practitioners aiming to undertake an argument mining project. It requires some basic background in linguistics, language and computer science. Most if not all the concepts of argumentation that are crucial for argument mining are carefully introduced and illustrated in a simple manner in this book. Programming samples are given in simple and readable logic programming forms. These samples can then be transposed to a large variety of other programming paradigms. Finally, a set of well-chosen references allows the reader to go further in different directions, either technical or conceptual. This book is therefore conceived to be accessible to a large audience of students and practitioners.

In this book, we show that linguistic analysis and natural language processing methods can efficiently and accurately be used to mine arguments. This book aims at presenting well-founded and concrete approaches, genre and domain-independent or delimited to a given domain, which can be deployed in applications. Evaluation methods provide means to measure the overall quality of a system or a resource used to train a system. This book also describes different approaches to annotate arguments in argumentative texts and debates either written or transcribed from oral exchanges. It discusses strengths and weaknesses of such approaches and provides criteria to choose an approach given application goals. Corpus annotation is frequently viewed as the basis that shapes a running system. This task is presented in detail in this book, after the presentation of the main conceptual and linguistic concepts of argumentation.

This book is organized into two parts. The first part, Chapters 1–4, deals with the main conceptual notions of argumentation and argument structure in linguistics, cognitive science, logic and artificial intelligence. These areas are crucial for argument mining. Chapter 5 establishes a transition with the more technical details in Chapters 6–8, which constitute the second part of this book. These latter chapters discuss crucial aspects of the practice of bringing argumentation concepts into argument mining systems. Chapter 6 offers a detailed analysis of annotation practice, guideline elaboration and the use of annotation platforms. Chapter 7 presents a number of ongoing systems. Since argument mining is still at an early development stage, these systems are more proofs of concept than real-world systems. These systems, nevertheless, show the main challenges and possible solutions. Chapter 8 is an illustration of the concepts presented in the previous chapters. It shows, by means of some simple lexical and grammar descriptions, how to start an argument mining system. To conclude, Chapter 9 shows that argumentation is a complex process where the textual aspects presented in the previous chapters must be paired with a number of non-verbal elements such as sounds or images to allow for a real understanding of an argumentation process.

Each chapter is conceived to have a certain independence and the reader may skip those that are of less interest to him. For example, the reader can concentrate on the annotation chapter or on implementation issues, leaving aside the more theoretical considerations of the first chapters.

The reader must be aware that argument mining is still at a very early development stage. Developing a full-fledged system is still a mid- or long-term research effort, which requires a lot of work from different disciplines. The elements presented here are those that have been stabilized and evaluated. This book contains a few elements that show the challenges still to be resolved to develop real argument mining systems and to make the results of such a mining process accessible to users.

We feel this book contributes to clarifying and possibly opening new investigation and analysis directions in information retrieval in general, at the intersection of language, cognition and artificial intelligence. Argument mining covers many useful applications for our everyday life as well as more intellectual aspects of natural argumentation.

To conclude this preface, we would like to thank the French CNRS (Centre National de la Recherche Scientifique) for providing us with the adequate means and environment for fulfilling this work. We also warmly thank Dr. Marie Garnier for an in-depth proofreading of the first part of this book and for her questions, which helped improve the quality of the text. We would also like to thank a number of close colleagues with whom we had joint projects or fruitful discussions. We thank, in particular, Drs. Katarzyna Budzynska, Chris Reed and Manfred Stede.

Mathilde JANIER
Patrick SAINT-DIZIER

July 2019

1
Introduction and Challenges

Argumentation is a language activity that occurs in an interactive context. It is based on an underlying set of schemes of thoughts, processes and strategies. This chapter introduces the notions of argument and argumentation, and the basic organization of an argumentative discourse. These notions will then be developed in more depth in the chapters that follow. This introduction to argumentation is oriented toward argument mining, which is the main topic of this book; it is therefore not a standard introduction to argumentation. References will allow readers to deepen their knowledge of the theoretical aspects of argumentation.

1.1. What is argumentation?

According to Aristotle, argumentation is the ability to consider, for a given question, the elements that are useful to persuade someone. Argumentation was, at that period, closely connected to rhetoric, which is defined as the art of persuading an audience. The ancient Greek argumentation and rhetoric were mainly designed for political decision making. This is why they are essentially oriented toward debates and judiciary purposes. After a long period during which rhetoric and argumentation were disregarded because they were considered as the art of trickery, in 1958 C. Perelman and L. Olbrechts-Tyteca [PER 58] contributed to a renewal of rhetoric and argumentation. These disciplines received a more scientific analysis. They were viewed as the development of discursive techniques that aimed at increasing an audience’s support for a given thesis. The approach was that the orator who is addressing an audience needs to take into account its values, opinions and beliefs.

In more technical terms, argumentation is a process that consists in producing articulated statements that justify a given claim. A claim C results in two standpoints: C and not(C). C can therefore be associated with justifications (supports) or contradictions (supports for not(C)). An argument is composed of at least two structures: a claim and a proposition that is a justification of the claim. Propositions (or statements) that justify the claim are called supports, while those which are against the claim or tend to disapprove it are called attacks. Supports as well as attacks can be more or less strong and direct with respect to the claim. A specific facet or part of the claim can also be attacked or supported, instead of the claim as a whole.

An important point to be highlighted when it comes to argumentation theory is the ambiguity of the word argument in English. D.J. O’Keefe [OKE 77] distinguishes between argument1, which refers to the reasons given for or against a point of view, and argument2, which has to be understood as equivalent to dispute (see footnote 1). There are also confusions between attacks or supports of a claim, sometimes called arguments. In what follows, an argument is a claim associated with a justification (or support) or an attack. Finally, the term argument also refers to the elements a verb or any predicate combines with such as a subject or an object. This sense of argument is not used in this book.

Claims can be almost any kind of proposition (or statement). They include forms such as theses, judgments, opinions, evaluations, rhetorical questions, etc., whose goal is to put forward a debatable topic. Claims can be introduced by epistemic expressions such as I think, it seems, or performative verbs such as I claim, I recommend and their nominalizations, which indicate that it is a personal position. A claim can be stated with a strong conviction or in a much weaker way as a possibility or a suggestion. Various operators express different levels of certainty, such as I am certain, I feel, it seems that. Such types of expressions are typical linguistic cues of a claim. Other forms of claims include evaluative expressions, as illustrated in example (1-1).
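The cues quoted above can be turned into a first, deliberately naive claim detector. The sketch below is written in Python for illustration only. The cue list is restricted to expressions quoted in this section, and the function name `find_claim_cues` as well as the certainty labels attached to each cue are assumptions made for this example, not an established resource.

```python
import re

# Cue expressions quoted in the text, each paired with an illustrative
# certainty label (the labels are assumptions made for this sketch).
CLAIM_CUES = {
    "i am certain": "strong",
    "i recommend": "strong",
    "i think": "medium",
    "i feel": "weak",
    "it seems": "weak",
}


def find_claim_cues(sentence):
    """Return the (cue, certainty) pairs found in a sentence, longest cue first."""
    text = sentence.lower()
    found = []
    for cue in sorted(CLAIM_CUES, key=len, reverse=True):
        # Word boundaries avoid matching a cue inside a longer word.
        if re.search(r"\b" + re.escape(cue) + r"\b", text):
            found.append((cue, CLAIM_CUES[cue]))
    return found
```

A claim that carries no such cue, for instance one that relies only on an evaluative expression, is not detected by this sketch: surface cues are an indication of a claim, not a definition of it.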

An argument is a complex structure composed of a claim and a set of propositions that (1) support or attack that claim or (2) support or attack other propositions in that structure, which are then considered as secondary claims. In this latter case, the aim is to reinforce the strength of the propositions related to the claim or to cancel out their effect, for example, via the attack of a proposition that supports or attacks the main claim. Supports and attacks define the polarity (i.e. for or against) of a proposition with respect to a claim. They suggest a bipolar analysis of arguments since only attacks or supports are considered.

For example, given the claim:

(1-1) Vaccination against Ebola is necessary,

statements such as:

(1-1a) Ebola is a dangerous disease,
there are high contamination risks,

are analyzed as supports, while:

(1-1b) the vaccine adjuvant is toxic,
there is a limited number of cases and deaths compared to other diseases,

are attacks.

These statements can be produced by either a single author or by several. Their strength is partly dependent on the context and on personal evaluations. Finally, the statement:

(1-1c) The initial vaccine adjuvant has been replaced by a much more neutral one that has no effect on humans,

is an attack on (1-1b); it cancels out the attack on the claim produced by the initial statement.
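The relations in example (1-1) can be represented as a small bipolar structure. The following Python sketch is one possible encoding, not a reference model: the class `Proposition` and the functions `is_cancelled` and `active` are hypothetical names, and the all-or-nothing cancellation rule is a simplifying assumption, since the text notes that supports and attacks may be more or less strong. The sketch also assumes that attack relations contain no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class Proposition:
    label: str
    text: str
    supports: list = field(default_factory=list)  # propositions supporting this one
    attacks: list = field(default_factory=list)   # propositions attacking this one


def is_cancelled(prop):
    """A proposition is cancelled if at least one of its attackers is not itself cancelled."""
    return any(not is_cancelled(attacker) for attacker in prop.attacks)


def active(props):
    """Labels of the propositions that are not cancelled."""
    return [p.label for p in props if not is_cancelled(p)]


# Example (1-1), restricted to the statements involved in the (1-1c) counter-attack.
claim = Proposition("1-1", "Vaccination against Ebola is necessary")
support = Proposition("1-1a", "Ebola is a dangerous disease")
attack = Proposition("1-1b", "the vaccine adjuvant is toxic")
counter = Proposition("1-1c", "The initial vaccine adjuvant has been replaced by a much more neutral one")

claim.supports.append(support)
claim.attacks.append(attack)
attack.attacks.append(counter)  # (1-1c) attacks (1-1b)

print(active(claim.supports))  # ['1-1a']
print(active(claim.attacks))   # []  since (1-1b) is cancelled by (1-1c)
```

Keeping supports and attacks as separate lists mirrors the bipolar analysis described above; a graded model would replace the boolean test with strength values.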

Beside the support and attack relations presented above, propositions may also attack the inference that connects two arguments:

(1-1d) Ebola is dangerous with high contamination risks, therefore vaccination is necessary,

is attacked by:

(1-1e) Recent epidemiological investigations show that vaccination does not stop disease dissemination.

Supports and attacks are crucial components of argumentation, which is based on the recognition of a difference of opinion between parties: these parties express doubts about the other party’s standpoint. A preliminary step is to identify differences of opinion and then the basis on which they can be resolved. Doubts may bear on a single point or on multiple ones, for example when the claim is complex or has multiple facets. In the above example (1-1b), the adjuvant is toxic attacks a facet of the vaccine, i.e. how it is diluted before injection. Other facets include its costs and the way it has been tested on populations. Analyzing the structure of arguments and then evaluating them is the ultimate goal of argumentation. Argumentation is a complex discourse activity that must not be confused with a demonstration. Argumentation aims at convincing someone of a certain point of view, or at resolving conflicts. Argumentation relies on shallower structures than formal demonstration does, such as argument schemes, associated with norms, rules and constraints for which models are being defined.

An argumentation can be realized in a number of manners. For example, it can be a monologue, where propositions for and against a claim are developed, for example in a news editorial. In that case, the author attempts to present an overall picture of the different positions for or against that claim. He may also wish to anticipate attacks by readers. An argumentation can also occur in a dialog between two or more persons that express different points of view concerning a claim or a standpoint. This is, for example, the case of TV or online debates, deliberation and litigation situations.

Argumentation can therefore be oral or based on written elements produced using various types of media. Argumentation is mainly aimed at (1) convincing someone or a group of people of a certain point of view or (2) coming to a reasonable agreement between two or more parties about a disagreement (e.g. in mediation and deliberations).

When one argues for a given standpoint, he is the proponent of that standpoint. The actors that disagree and argue against it are called the opponents. Arguing is not demonstrating: a proponent presents good reasons to support a claim, he does not logically demonstrate that the claim is true. He simply gives good reasons that justify the claim. A demonstration, on the contrary, is based on facts, inference rules and axioms described in a formal language, whereas argumentation expresses facts and causal schemes in natural language. It follows the well-known statement uttered by a judge: I need proofs, not arguments!

It must be noted at this stage that argumentation is often contrasted with explanation. When one explains something to a listener, the aim is to bring new knowledge to that listener or to help him to modify his beliefs. This knowledge is hypothesized to be true and non-controversial, unless otherwise stated. An argumentation does not a priori bring any new knowledge to the listener: it is aimed at persuading him of the validity of a certain claim. However, in an argumentative discussion, it is frequent to have a combination of arguments and explanation. The difference between these two notions in a discourse is not easy to make: it depends on the knowledge and beliefs of the speaker and the listener.

According to several authors, an opinion is also a slightly different notion: it is a statement that is not supported by a justification. Arguments are necessarily supported by one or more justifications even if some of them are implicit. Such implicit justifications are called enthymemes.

1.2. Argumentation and argument mining

Argument mining is an emerging research area that introduces new challenges both in natural language processing (NLP) and in artificial intelligence (AI). Argument mining is a very challenging area that involves complex language resources and parsing processes as well as reasoning and pragmatic aspects. It is an analysis process that consists in automatically identifying claims and relevant propositions that support or attack these claims in dialogues or in texts found on various types of media. Then, argument mining must identify the structure and orientation of these propositions and the relations between claims and propositions. Identifying all of these features is necessary to reach an accurate automatic argument and argumentation analysis and to produce argumentation diagrams and synthesis. This is obviously a huge task that needs to be realized step by step.

Arguments, claims and associated propositions that act as justifications may take various forms in texts and debates. Because of the pragmatic nature of arguments, linguistic cues associated with these claims and their associated justifications are very diverse, ambiguous and may even be implicit. These structures may not be adjacent in a text and therefore require complex identification processes.

So far, most experiments and projects focus on NLP techniques based on corpus annotation in order to characterize the linguistic structure of arguments. The analysis of the NLP techniques relevant for argument mining from annotated structures is presented in Chapters 6, 7 and 8. AI aspects and related domain and general purpose knowledge representation aspects have not yet been given a lot of consideration because of their complexity and diversity. They will certainly be the subject of more research in the future.

Argument mining has a large number of application areas, among which:

  • opinion analysis: beyond satisfaction levels, the objective is to identify why users or citizens are happy or unhappy;
  • debate analysis, in oral or written form, and the detection of argumentation strategies;
  • business intelligence via the detection of weak signals with arguments;
  • decision making, paired with a decision theory;
  • analysis of the long-term evolution of the value system of a population;
  • analysis of specific strategies of argumentation: juridical defenses, pleas, mediation, deliberations, scientific or mathematical argumentation;
  • detection of incoherence among sets of arguments and justifications, e.g. in juridical and technical documents.

Statements related to a given claim are difficult to identify, in particular when they are not adjacent to the claim or not even in the same text, because their linguistic, conceptual and referential links to that claim are rarely direct and explicit. As the reader may note, argument mining is in general much more complex than information retrieval since an argument is a proposition.

Let us illustrate the difficulty to establish an argumentative relation between two utterances by means of an example:

Fact 1: The situation of women has improved in India,

Fact 2: Early in the morning, we now see long lines of happy young girls with school bags walking along the roads.

These two statements could be considered as pure facts. However, Fact 1 has the form of an evaluative expression, with the term “improved”, which may potentially lead to discussions and controversies. With some knowledge of the considerations that underlie Fact 1, it turns out that Fact 2 is a proposition that supports Fact 1, and therefore Fact 1 is interpreted as a claim. The reader can then note that knowledge and inferences are required to make explicit, and possibly explain, the relationships between women’s conditions and young girls carrying school bags.

Let us now consider:

Fact 3: School buses must be provided so that schoolchildren can reach the school faster and more safely.

Fact 3 is a statement that attacks Fact 2: indeed, these young girls may not be happy having to walk to school in the early morning. However, it is not an attack on the claim Fact 1: the facet concerned in the relation between Facts 3 and 2 is not women’s conditions in particular, but schoolchildren in general.

Additional statements that are supports or attacks for the claim Fact 1 found in various texts are, for example:

Supports:

(1-2a) increased percentage of literacy among women,

women are allowed to enter into new professional fields,

at the upper primary level, the enrollment increased from 0.5 million girls to 22.7 million girls.

Attacks:

(1-2b) there are still practices of female infanticide,

poor health conditions and lack of education are still persisting,

home is women’s real domain,

they are suffering the violence afflicted on them by their own family members,

women’s malnutrition is still endemic.

Most of these statements illustrate how difficult it can be to mine arguments related to a claim and to interpret them. Indeed, some domain knowledge is necessary.

It is interesting to see the diversity of propositions for or against a claim, as well as their origin and how participants in a debate evaluate them and find counterpropositions or restrictions to strong propositions put forward by other participants. Here are some examples of propositions found on various forums in relation with the claim:

(1-3a) The development of nuclear plants is a positive decision.

Supports:

(1-3b) nuclear plants allow energy independence,

they create high technology jobs,

nuclear risks are overestimated,

wastes are well managed and controlled by the IAEA,

nuclear plants preserve the other natural resources.

Attacks:

(1-3c) there are alternative solutions to produce electricity with less pollution: coal, sea tides, wind, etc.,

alternatives create more jobs than nuclear,

there are risks of military uses that are more dangerous than claimed,

nuclear plants have high maintenance costs.

Concessions (weak supports):

(1-3d) nuclear plants use dangerous products, but we know how to manage them,

it is difficult to manage nuclear plants, but we have competent persons.

In this latter set of propositions, the registers of job creation, high technology development and natural resource preservation are advocated in addition to the more standard propositions on pollution, national independence and maintenance costs. Comparisons with other sources of energy are also made, but in a relatively shallow way, which limits the strength and the impact of such propositions.
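The concessions in (1-3d) share a simple surface pattern: a conceded drawback, followed by but and a statement that weakens it. A minimal Python sketch of this pattern is given below. The regular expression and the function name `split_concession` are assumptions made for illustration; real concessions use many other connectors and are not always this regular.

```python
import re

# "<conceded statement>, but <rebuttal>": the only concession form
# illustrated in (1-3d). Other connectors are deliberately ignored here.
CONCESSION_PATTERN = re.compile(
    r"^(?P<conceded>.+?),\s+but\s+(?P<rebuttal>.+)$", re.IGNORECASE
)


def split_concession(statement):
    """Return (conceded, rebuttal) if the statement matches the pattern, else None."""
    match = CONCESSION_PATTERN.match(statement.strip())
    if match is None:
        return None
    return match.group("conceded"), match.group("rebuttal")
```

Applied to the first statement of (1-3d), the function separates nuclear plants use dangerous products from we know how to manage them; a plain support such as nuclear plants allow energy independence yields `None`.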

1.3. The origins of argumentation

Let us now give a few historical milestones. Argumentation has its origins in the Greek tradition, and probably also, in a different manner, in the Indian tradition. In Greece, argumentation seems to have been developed in parallel and in close connection with other major disciplines of this period, in particular geometry. Its origins are attributed to Tisias and Corax, and also probably to Aristotle (384–322 BC) and to sophists (5th Century BC). These latter philosophers had a very well-developed system of argumentation that allowed them to elaborate various types of critiques of the society in which they lived.

The Greek argumentation tradition had a lot of trends and schools. Of interest to our purpose is the Antiphonia, which was a game in which participants had to produce a counterdiscourse to a given discourse. This was an excellent exercise for students: any argument had to be transformed into a counterargument. The Greek tradition developed the notion of possible and probable and associated forms of paradoxes. From these notions, which constituted an abstraction of standard human behavior, emerged more contemporary notions such as prototypes, types and various forms of logics based on uncertainty. Finally, the tradition around Plato and Aristotle developed schemes of dialectic interactions and a critique of natural language as a means to establish forms of scientific truth. According to their analysis, natural language does not allow the demonstration of a scientific truth because it is not precise enough. Natural language is more appropriate for argumentation than demonstration.

The Greek tradition focused on rhetoric, viewed as the art of arguing. Rhetoric is based on typical language forms called figures of speech as well as on gestures, facial expressions and other features. Forms of rhetoric are detailed in Chapter 3. Briefly, an argumentation had to be structured according to the following global scheme:

  • introduction;
  • narration of facts, from a certain standpoint;
  • argumentation (defense), with its codes and processes;
  • refutation by opponents;
  • conclusion, summary of the main points.

Rhetoric is composed of a form of logical reasoning (logos, the logical aspects of arguing) paired with two more subjective sets of attitudes: the ethos and the pathos, which aimed at touching the audience and producing positive affects in them in order to create a climate of confidence so that the claims could be easily accepted.

1.4. The argumentative discourse

A discourse is argumentative because of its internal organization. In argumentation, the term discourse covers different perspectives, among which (1) a local perspective, how an argument as a whole or one of its justifications, is embedded into a set of discourse structures, and (2) a global perspective, the organization of an argumentative discourse, for example to support a claim.

At a local level, an argument or its justifications are frequently associated with various sorts of restrictions that specify its scope. It may also be associated with elaborations and illustrations that make it easier to understand, and contribute to increasing its strength. Discourse structures involved at this level are those introduced by Rhetorical Structure Theory (RST) [MAN 88], which include discourse structures such as concessions, contrasts, elaborations, circumstances and conditions. The website http://www.sfu.ca/rst/01intro/definitions.html is particularly informative and gives a large diversity of relations with definitions and examples. For example, if we consider again the claim given in example (1-1), we may have a complex statement of the form:

(1-4a) Even if the vaccine seems 100% efficient and without any side effects on the tested population, it is necessary to wait for more conclusive data before making large vaccination campaigns. The national authority of Guinea has approved the continuation of the tests on targeted populations.

In this statement, the segment:

(1-4b) it is necessary to wait for more conclusive data before making large vaccination campaigns,

is the kernel, which attacks the claim (1-1) on vaccination, and the text portions before and after this text portion are discourse structures that insert the argument into a context and make it more explicit. This text portion can be tagged as follows:

<argument>

<concession> Even if the vaccine seems 100% efficient and without any side effects on the tested population, </concession>

<main arg> it is necessary to wait for more conclusive data before making large vaccination campaigns, </main arg>

<elaboration> the national authority of Guinea has approved the continuation of the tests on targeted populations</elaboration>

</argument>.
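A tagged text of this kind can be processed with a standard XML parser. The Python sketch below uses the standard library module `xml.etree.ElementTree`. It assumes that the tag `<main arg>` is written `<main_arg>`, because XML element names cannot contain spaces; the function names `segments` and `kernel` are hypothetical and introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Same annotation as in the text, with <main arg> renamed to <main_arg>
# (an assumption of this sketch: XML element names cannot contain spaces).
ANNOTATED = """<argument>
<concession>Even if the vaccine seems 100% efficient and without any side effects on the tested population,</concession>
<main_arg>it is necessary to wait for more conclusive data before making large vaccination campaigns,</main_arg>
<elaboration>the national authority of Guinea has approved the continuation of the tests on targeted populations</elaboration>
</argument>"""


def segments(annotated):
    """Return (tag, text) pairs for the discourse segments of one argument."""
    root = ET.fromstring(annotated)
    return [(child.tag, child.text.strip()) for child in root]


def kernel(annotated):
    """Return the text of the main argument, i.e. the kernel of the statement."""
    root = ET.fromstring(annotated)
    return root.findtext("main_arg").strip()
```

Here `segments(ANNOTATED)` lists the concession, the kernel and the elaboration in text order, while `kernel(ANNOTATED)` isolates the segment that attacks claim (1-1).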

Restrictions on the scope of a justification may not be adjacent to the justification kernel but may appear, for example, at the beginning of a section or even in a title. These must be taken into account in the analysis of an argumentation since the restrictions they convey may change the way statements attack or support each other.

At a more global level, a discourse is analyzed as a set of language acts that follow a precise organization called a plan. A discourse always has a goal that is reflected in the plan that is followed. Plans may be different for each of the main types of argumentative discourses since the aims are different. Argumentative discourses include different styles such as deliberative, judiciary, epideictic, exhortative, epistolary, advertising and propaganda.

1.5. Contemporary trends

Contemporary studies on argumentation are based on foundational works, among which those by J.L. Austin [AUS 62] and J.R. Searle [SEA 69] for their model of language acts, and works by H.P. Grice [GRI 75] for discourse models and cooperative principles.

Argumentation involves a number of areas in language, philosophy and cognition. First, argumentation is a mental process associated with a linguistic activity. Most argumentative statements have an effect on the listener or reader: they affect their thought and belief systems, possibly their psychological system. Argumentation includes norms that are followed more or less strictly. These norms allow an audience to decide whether an argumentation is sound and respects a certain balance between the proponent(s) and the opponent(s). Norms include features such as efficiency, accuracy and truth in order to avoid various forms of fallacies and trickeries. These norms are developed in the following chapter. Argumentation is also concerned with cooperativity principles: its aim is to construct a consensus around a claim. Argumentation includes identifying divergences between opponents and finding an acceptable compromise, for example, in the case of a mediation. Argumentation is a powerful means to develop critical thinking. From this picture, the reader can infer that argument mining and result evaluation is a very complex process that involves most of the resources of NLP and AI.

The recent trends in argumentation can be summarized in five main theoretical research directions:

  • Pragma-dialectics, F. van Eemeren et al. [EEM 92, EEM 01]: argumentation is viewed as a type of dialog following strong norms. Argumentation is then considered as a means, via dialog, to resolve conflicts and to reach an acceptable consensus;
  • Argumentation and Conversation, J. Moeschler [MOE 85] and E. Roulet [ROU 84]: this approach is the analysis of verbal interactions in an argumentation. Argumentation is associated with pragmatics and conversation. In their perspective, argumentation consists in discourse acts that must follow norms, rules and constraints;
  • Pragmatics and Linguistics of argumentation, J.C. Anscombre and O. Ducrot [ANS 83] where new forms of rhetoric are integrated into pragmatics. A revision of the notion of argument within the fields of language semantics and pragmatics is proposed. Argumentative connectors and operators are investigated in depth: these all connect language acts that allow an audience to interpret utterances as arguments for or against a claim. Argumentation is considered as a language activity rather than a discursive process;
  • Argumentation as a communicational act, J. Habermas [HAB 87]: this approach develops an ethics of argumentation and communicative actions;
  • Logical Pragmatics of argumentation, J.B. Grize [GRI 90] aims at modeling natural logics and cognition within the framework of argumentation and communication. In this approach, arguing consists in modifying the beliefs and the representations of an audience.

Besides these trends, a number of authors have developed more specific analyses, among which are C. Plantin [PLA 96], S. Toulmin [TOU 03], D. Walton et al. [WAL 08], etc. The theoretical aspects of argumentation presented in this volume largely reflect the first two directions presented above, namely pragma-dialectics and argumentation and conversation. They are, we feel, more central to the objectives of argument mining: identifying arguments, schemes and argument organization, for example in a debate. The notion of argument scheme complements these two directions, since an important challenge of argument mining is to identify the schemes used in supports and attacks.

  1. Throughout this work, the term “argument” is used as equivalent to “argument1”, while “argumentation” refers to the process of arguing.