Implementing postgresql full text with JPA entities

PostgreSQL Full Text Search capabilities are good enough, this guide will allow you to incorporate Full Text Search for your JPA entities using this.

Unaccent

Install unaccent extension in postgresl database.

postgres@pemehue:~$ psql mydatabase
psql (9.6.5, servidor 9.5.9)
Digite «help» para obtener ayuda.

postgres=# create extension unaccent;

Create class that implementes a Custom Dialect

You must pass this class to hibernate with properties or xml, in
play frameworks 1.4.x it’s jpa.dialect=util.jpa.PgFullTextDialect
in application.conf

package util.jpa;

import org.hibernate.dialect.PostgreSQL82Dialect;
import org.hibernate.dialect.function.StandardSQLFunction;
import org.hibernate.type.DoubleType;
import org.hibernate.type.ObjectType;

public class PgFullTextDialect extends PostgreSQL82Dialect {

	public PgFullTextDialect() {
		registerFunction("fts", new PgFullTextFunction());
		registerFunction("ts_rank", new StandardSQLFunction("ts_rank", DoubleType.INSTANCE));
		registerFunction("to_tsquery", new StandardSQLFunction("to_tsquery", ObjectType.INSTANCE));
	
}

Create a class that implements the function “fts” mentioned above.


package util.jpa;

import java.util.List;

import org.hibernate.QueryException;
import org.hibernate.dialect.function.SQLFunction;
import org.hibernate.engine.spi.Mapping;
import org.hibernate.engine.spi.SessionFactoryImplementor;
import org.hibernate.type.BooleanType;
import org.hibernate.type.Type;

public class PgFullTextFunction implements SQLFunction {

	@SuppressWarnings("unchecked")
	@Override
	public String render(Type type, List args, SessionFactoryImplementor sessionFactoryImplementor)
			throws QueryException {
		if (args.size() < 2) {
			throw new IllegalArgumentException("The function must be passed 2 arguments");
		}

		String field = (String) args.get(0);
		String value = (String) args.get(1);
		value = value.replaceAll("\\s+", " & ").replaceAll("'", "");
		value = org.apache.commons.lang3.StringUtils.stripAccents(value);
		boolean isLike = false;
		if (args.size() == 3)
			isLike = Boolean.valueOf(args.get(2).toString());
		String fragment = "";
		fragment += "to_tsvector(unaccent(" + field + ")) @@ ";
		if (isLike) {
			fragment += "to_tsquery('%" + value + "%')";
		} else {
			fragment += "to_tsquery('" + value + "')";
		}
		System.err.println("fragment:" + fragment);
		return fragment;

	}

	@Override
	public Type getReturnType(Type columnType, Mapping mapping) throws QueryException {
		return new BooleanType();
	}

	@Override
	public boolean hasArguments() {
		return true;
	}

	@Override
	public boolean hasParenthesesIfNoArguments() {
		return false;
	}
}

Using full text search samples

In this case using class User with property name


...

	@Test
	public void buscaConAcentos() {

		String jpql = "select c from User c where fts(name, 'Pérez López') = true ";
		Query q = JPA.em().createQuery(jpql);
		List list = q.getResultList();

		assertThat(list.size(), is(1));

	}

	@Test
	public void buscaSinAcentos() {

		String jpql = "select c from User c where fts(name, 'Perez Lopez') = true ";
		Query q = JPA.em().createQuery(jpql);
		List list = q.getResultList();

		assertThat(list.size(), is(1));

	}
...

Keep the text to index in a field

Sometimes is recommended to have a field with all the text to index, and keep it updated with JPA update an persist callbacks.

...
	@PrePersist
	@PreUpdate
	public void preUpdate() {

		String s = this.getNombre() + " " + nombreProfesionOVacio();
		this.search = Normalizer.normalize(s.toLowerCase().replaceAll("\\s+", " "), Normalizer.Form.NFD);
	}
...

Maintain tsvector precalculated and indexed

Instead of calculating tsvector on the fly, you can add a tsvector field on the
table. This field can be updated with a trigger, this is the sql code.


ALTER TABLE ident.user ADD COLUMN fulltext tsvector;

CREATE OR REPLACE FUNCTION public.user_vector_update()
  RETURNS trigger AS
$BODY$
BEGIN
    IF TG_OP = 'INSERT' OR  TG_OP = 'UPDATE' THEN
        new.fulltext = to_tsvector('pg_catalog.spanish', COALESCE(unaccent(NEW.search), ''));
    END IF;
    RETURN NEW;
END
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;

CREATE INDEX full_text_index_user ON ident.user USING GIN (fulltext);
UPDATE ident.user SET fulltext = to_tsvector('pg_catalog.spanish', unaccent(search));

CREATE TRIGGER tsvectorupdate_user BEFORE INSERT OR UPDATE ON ident.user
FOR EACH ROW EXECUTE PROCEDURE public.user_vector_update();


In this case the function PgFullTextFunction must be redefined to use this precalculated field called fulltext and
not to tsvector it every time.

Replicación Postgrsql 9.6 on Ubuntu 16.04

Run this script in the replica server, be careful this will drop the database:

postgres@europa:~$ cat replica.sh 
#!/bin/bash

echo Stopping PostgreSQL
sudo service postgresql stop

echo Cleaning up old cluster directory
sudo -u postgres rm -rf /var/lib/postgresql/9.6/main

echo Starting base backup as replication
sudo -u postgres pg_basebackup -h my.master.server -D /var/lib/postgresql/9.6/main -U replication -v -P

echo Writing recovery.conf file
sudo -u postgres bash -c "cat > /var/lib/postgresql/9.6/main/recovery.conf <<- _EOF1_
  standby_mode = 'on'
  primary_conninfo = 'host=my.master.server port=5432 user=replication password=passreplicauser sslmode=require'
  trigger_file = '/tmp/postgresql.trigger'
_EOF1_
"

echo Starting PostgreSQL
sudo service postgresql start

Add this lines to the postgresql.conf of the replica server


wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 8 
hot_standby = on

default_statistics_target = 50 
maintenance_work_mem = 960MB 
constraint_exclusion = on 
checkpoint_completion_target = 0.9 
effective_cache_size = 11GB 
work_mem = 40MB 
wal_buffers = 8MB 
shared_buffers = 3840MB 
max_connections = 600 

Deja un comentario

Tu dirección de correo electrónico no será publicada. Los campos necesarios están marcados *

Puedes usar las siguientes etiquetas y atributos HTML: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>