Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability

Rocha, Thiago Alves

Bibliographic details
Year of defense:	2019
Main author:	Rocha, Thiago Alves
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	eng
Defense institution:	Not provided by the institution
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords in Portuguese:	Formula synthesis; Grammatical inference; Ehrenfeucht–Fraïssé game
Access link:	http://www.repositorio.ufc.br/handle/riufc/51208
Abstract:	In this work, we investigate the synthesis of first-order sentences from samples of classified relational structures. More precisely, we study the following problem: for a fixed class of relational structures, given a sample of classified structures, find a first-order sentence of minimum quantifier rank that is consistent with the sample. We consider the following classes of structures: monadic structures, equivalence structures, disjoint unions of linear orders, and strings represented by finite structures with a successor relation. We use results on Ehrenfeucht–Fraïssé games over these classes of structures to design an algorithm that finds such a sentence. For these classes, the problem of determining whether the Duplicator has a winning strategy in an Ehrenfeucht–Fraïssé game can be solved in polynomial time. We also introduce distinguishability sentences, which are sentences that distinguish between two given structures. We define them based on necessary and sufficient conditions for a winning strategy in Ehrenfeucht–Fraïssé games. Our algorithm returns a Boolean combination of such sentences. We also show that any first-order sentence is equivalent to a Boolean combination of distinguishability sentences, and that our algorithm's running time is polynomial in the size of the input. Since general first-order sentences are hard to read, we define a quantifier-free normal form (QNF) over the classes of structures we consider. QNF sentences are defined over a richer vocabulary in which atomic formulas abbreviate general first-order sentences over the standard vocabulary. QNF sentences are then Boolean combinations of such atomic sentences over this non-standard vocabulary. Moreover, we define a DNF version of QNF sentences.
Then, given a sample of strings and the number of disjunctive clauses, we investigate the problem of finding a DNF formula that is consistent with the sample. We show that this problem is NP-complete and we solve it by a translation into Boolean satisfiability (SAT). Solutions to the problem of finding a QNF sentence in DNF with a bounded number of clauses may have a large number of literals per clause. Therefore, we consider a variation of this problem in which the maximum number of literals per clause is also given as input. This is essential since sentences with few clauses and few literals per clause are more compact and easier to interpret. Again, we show that this problem is NP-complete, and our approach for solving it is based on a reduction to SAT. We also present extensions of these problems that are robust to noisy samples. In this case, a sentence may not be consistent with the input sample. We cover two approaches to deal with noisy samples. In the first approach, the goal is to find a sentence that classifies the maximum number of strings correctly. We solve this generalized version by an encoding into the maximum satisfiability problem (MaxSAT). In the second approach, the goal is to find a sentence that misclassifies at most a given number of strings. We show that this problem, with a bounded number of errors, is also NP-complete, and we give a SAT-based solution to it. Among the classes we consider, strings are particularly appealing since they may be used to model text data, stress patterns in human languages, biological sequences, and sequences of symbolic data in general.
In the field of grammatical inference, one of the main problems studied is finding a language model consistent with a given sample of strings. Since first-order logic over strings defines exactly the class of locally threshold testable (LTT) languages, our results can be useful in grammatical inference when the goal is to find a model of an LTT language from a sample of strings.
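
The minimum-quantifier-rank problem described in the abstract can be illustrated on the simplest class it mentions, monadic structures. The sketch below is not the thesis's algorithm. It assumes the standard Ehrenfeucht–Fraïssé characterization for unary vocabularies: the Duplicator wins the n-round game on two structures exactly when, for every combination of unary predicates, the two structures either have the same number of elements of that kind or both have at least n. The representation (each element as a set of predicate names) and the function names `type_counts`, `min_distinguishing_rank`, and `min_rank_for_sample` are hypothetical. The code only computes the minimum rank, not the sentence itself.

```python
from collections import Counter
from itertools import product


def type_counts(structure):
    """Count elements per type, where a type is the set of unary
    predicates an element satisfies (assumed representation)."""
    return Counter(frozenset(element) for element in structure)


def min_distinguishing_rank(a, b):
    """Smallest number of rounds in which the Spoiler wins on (a, b),
    or None if the structures are isomorphic.

    For a type whose counts differ, the Spoiler needs
    min(count_a, count_b) + 1 rounds: pick that many elements of the
    type in the larger structure. The best type gives the minimum.
    """
    ca, cb = type_counts(a), type_counts(b)
    ranks = [
        min(ca[t], cb[t]) + 1
        for t in set(ca) | set(cb)
        if ca[t] != cb[t]
    ]
    return min(ranks) if ranks else None


def min_rank_for_sample(positives, negatives):
    """Minimum quantifier rank of a sentence consistent with the sample,
    or None if some positive and negative structure are isomorphic.

    A rank-r sentence consistent with the sample exists exactly when
    every positive/negative pair is distinguishable at rank r, so the
    answer is the maximum of the pairwise minimum ranks.
    """
    worst = 0
    for a, b in product(positives, negatives):
        r = min_distinguishing_rank(a, b)
        if r is None:
            return None
        worst = max(worst, r)
    return worst


# Example over a vocabulary with a single unary predicate P.
a = [{"P"}, {"P"}, set()]          # two P-elements, one non-P element
b = [{"P"}, {"P"}, {"P"}, set()]   # three P-elements, one non-P element
c = [{"P"}, set()]                 # one P-element, one non-P element

print(min_distinguishing_rank(a, b))   # 3
print(min_rank_for_sample([a], [b, c]))  # 3
```

In the example, `a` and `b` differ only in having two versus three P-elements, so three quantifiers are needed to tell them apart, while `a` and `c` are already separated at rank 2. The sample as a whole therefore needs rank 3. The running time is polynomial in the size of the sample, which matches the flavor of the polynomial-time claim in the abstract, although the thesis's own procedure may differ.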
