Clustering similar objects with PostgreSQL

Feb. 15, 2018

Posted in:

CREATE TABLE transaction_transaction (
	date date not null,
	description varchar(255) not null
);

---

WITH lags AS (
	SELECT date, description, LAG(description) OVER (ORDER BY description) AS lag
	FROM transaction_transaction WHERE description != ''
	ORDER BY description
)
SELECT CASE WHEN
	(SIMILARITY(description, lag) < 0.2)
	THEN description
	ELSE '- ' || description
	END, date
FROM lags ;

Tags

Return to blog