Implementing postgresql full text with JPA entities

PostgreSQL Full Text Search capabilities are good enough, this guide will allow you to incorporate Full Text Search for your JPA entities using this.

Unaccent

Install unaccent extension in postgresl database.

postgres@pemehue:~$ psql mydatabase
psql (9.6.5, servidor 9.5.9)
Digite «help» para obtener ayuda.

postgres=# create extension unaccent;

Create class that implementes a Custom Dialect

You must pass this class to hibernate with properties or xml, in
play frameworks 1.4.x it’s jpa.dialect=util.jpa.PgFullTextDialect
in application.conf

package util.jpa;

import org.hibernate.dialect.PostgreSQL82Dialect;
import org.hibernate.dialect.function.StandardSQLFunction;
import org.hibernate.type.DoubleType;
import org.hibernate.type.ObjectType;

public class PgFullTextDialect extends PostgreSQL82Dialect {

	public PgFullTextDialect() {
		registerFunction("fts", new PgFullTextFunction());
		registerFunction("ts_rank", new StandardSQLFunction("ts_rank", DoubleType.INSTANCE));
		registerFunction("to_tsquery", new StandardSQLFunction("to_tsquery", ObjectType.INSTANCE));
	
}

Create a class that implements the function “fts” mentioned above.


package util.jpa;

import java.util.List;

import org.hibernate.QueryException;
import org.hibernate.dialect.function.SQLFunction;
import org.hibernate.engine.spi.Mapping;
import org.hibernate.engine.spi.SessionFactoryImplementor;
import org.hibernate.type.BooleanType;
import org.hibernate.type.Type;

public class PgFullTextFunction implements SQLFunction {

	@SuppressWarnings("unchecked")
	@Override
	public String render(Type type, List args, SessionFactoryImplementor sessionFactoryImplementor)
			throws QueryException {
		if (args.size() < 2) {
			throw new IllegalArgumentException("The function must be passed 2 arguments");
		}

		String field = (String) args.get(0);
		String value = (String) args.get(1);
		value = value.replaceAll("\\s+", " & ").replaceAll("'", "");
		value = org.apache.commons.lang3.StringUtils.stripAccents(value);
		boolean isLike = false;
		if (args.size() == 3)
			isLike = Boolean.valueOf(args.get(2).toString());
		String fragment = "";
		fragment += "to_tsvector(unaccent(" + field + ")) @@ ";
		if (isLike) {
			fragment += "to_tsquery('%" + value + "%')";
		} else {
			fragment += "to_tsquery('" + value + "')";
		}
		System.err.println("fragment:" + fragment);
		return fragment;

	}

	@Override
	public Type getReturnType(Type columnType, Mapping mapping) throws QueryException {
		return new BooleanType();
	}

	@Override
	public boolean hasArguments() {
		return true;
	}

	@Override
	public boolean hasParenthesesIfNoArguments() {
		return false;
	}
}

Using full text search samples

In this case using class User with property name


...

	@Test
	public void buscaConAcentos() {

		String jpql = "select c from User c where fts(name, 'Pérez López') = true ";
		Query q = JPA.em().createQuery(jpql);
		List list = q.getResultList();

		assertThat(list.size(), is(1));

	}

	@Test
	public void buscaSinAcentos() {

		String jpql = "select c from User c where fts(name, 'Perez Lopez') = true ";
		Query q = JPA.em().createQuery(jpql);
		List list = q.getResultList();

		assertThat(list.size(), is(1));

	}
...

Keep the text to index in a field

Sometimes is recommended to have a field with all the text to index, and keep it updated with JPA update an persist callbacks.

...
	@PrePersist
	@PreUpdate
	public void preUpdate() {

		String s = this.getNombre() + " " + nombreProfesionOVacio();
		this.search = Normalizer.normalize(s.toLowerCase().replaceAll("\\s+", " "), Normalizer.Form.NFD);
	}
...

Maintain tsvector precalculated and indexed

Instead of calculating tsvector on the fly, you can add a tsvector field on the
table. This field can be updated with a trigger, this is the sql code.


ALTER TABLE ident.user ADD COLUMN fulltext tsvector;

CREATE OR REPLACE FUNCTION public.user_vector_update()
  RETURNS trigger AS
$BODY$
BEGIN
    IF TG_OP = 'INSERT' OR  TG_OP = 'UPDATE' THEN
        new.fulltext = to_tsvector('pg_catalog.spanish', COALESCE(unaccent(NEW.search), ''));
    END IF;
    RETURN NEW;
END
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;

CREATE INDEX full_text_index_user ON ident.user USING GIN (fulltext);
UPDATE ident.user SET fulltext = to_tsvector('pg_catalog.spanish', unaccent(search));

CREATE TRIGGER tsvectorupdate_user BEFORE INSERT OR UPDATE ON ident.user
FOR EACH ROW EXECUTE PROCEDURE public.user_vector_update();


In this case the function PgFullTextFunction must be redefined to use this precalculated field called fulltext and
not to tsvector it every time.

Deja un comentario

Tu dirección de correo electrónico no será publicada. Los campos necesarios están marcados *

Puedes usar las siguientes etiquetas y atributos HTML: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>