Removing duplicate Postgres Keys

Everything else that doesn't fall into one of the other PB categories.
User avatar
RichAlgeni
Addict
Addict
Posts: 914
Joined: Wed Sep 22, 2010 1:50 am
Location: Bradenton, FL

Removing duplicate Postgres Keys

Post by RichAlgeni »

Ok, so I did something stupid when I was upgrading Postgres from 9 to 11. The restore ended with an error, so I cleared the database, and redid the restore. Except that I cleared the WRONG database! So, i ended up with two identical rows for each record. Anyway, it got me thinking that there must be a way to remove duplicate 'keys,' in the primary index. I found the CTID 'hidden' row, which shows the location of the row. I used that to delete the duplicate row:

Code: Select all

; process to loop thru tables, and remove duplicate keys

EnableExplicit

UsePostgreSQLDatabase()

Define result.i
Define thisCtid.s
Define thisKeyID.i
Define lastKeyID.i
Define psqlUpdate.s
Define deleteCount.i
Define thisKeyName.s
Define thisVersion.s  = "1.0.01a"
Define programeName.s = "remove_dup_keys.pb"
Define thisTableName.s
Define databaseNumber.i
Define pgSqlStatement.s

; create a structure for table key data

Structure tableKey
    tableName.s
    keyName.s
    ctid.s
EndStructure

; define the lists we will use

Define NewList tableKeyList.tableKey()
Define NewList ctidList.s()

OpenConsole(programeName)

; open the database

databaseNumber = OpenDatabase(#PB_Any, "host=andromeda port=5432 dbname=CAD", "postgres", "Th1s1sucks")
If databaseNumber

; first, get a list of the tables with their primary key names

    pgSqlStatement = "select table_name, column_name from information_schema.columns"
    pgSqlStatement + " where column_name LIKE '%_key' AND table_schema = 'public' AND is_nullable = 'NO'"
    pgSqlStatement + " order by table_name, column_name;"

    result   = DatabaseQuery(databaseNumber, pgSqlStatement)
    If result
        While NextDatabaseRow(databaseNumber)
            thisTableName = GetDatabaseString(databaseNumber, 0)
            thisKeyName   = GetDatabaseString(databaseNumber, 1)

; if the table name and key name are not null, add them to our list

            If thisTableName > "" And thisKeyName > ""
                AddElement(tableKeyList())
                tableKeyList()\tableName = thisTableName
                tableKeyList()\keyName   = thisKeyName
            EndIf

            PrintN(tableKeyList()\tableName + " - " + tableKeyList()\keyName)
        Wend
        FinishDatabaseQuery(databaseNumber)

; now select each table, ordered by the key

        ForEach tableKeyList()
            thisTableName = tableKeyList()\tableName
            thisKeyName   = tableKeyList()\keyName

            lastKeyID     = 99999999999
            deleteCount   = 0
            ClearList(ctidList())

            pgSqlStatement = "select " + thisKeyName + ", ctid from " + thisTableName + " order by " + thisKeyName + ", ctid;"
            result   = DatabaseQuery(databaseNumber, pgSqlStatement)
            If result
                While NextDatabaseRow(databaseNumber)
                    thisKeyID =   GetDatabaseLong(databaseNumber, 0)
                    thisCtid  = GetDatabaseString(databaseNumber, 1)

; if the keys are the same, add the ctid to a list, so we can delete it later

                    If lastKeyID = thisKeyID
                        AddElement(ctidList())
                        ctidList() = thisCtid
                    EndIf
                    lastKeyID = thisKeyID
                Wend
                FinishDatabaseQuery(databaseNumber)
            Else
                CloseDatabase(databaseNumber)
                PrintN("DatabaseQuery did not process: " + DatabaseError())
                Input()
                CloseConsole()
                End
            EndIf

; now, delete the duplicate rows by ctid, use a transaction, so we defer restraints

            ForEach ctidList()
                psqlUpdate = "BEGIN;"
                result     = DatabaseUpdate(databaseNumber, psqlUpdate)
                If result <> 0
                    psqlUpdate = "SET CONSTRAINTS ALL DEFERRED;"
                    result     = DatabaseUpdate(databaseNumber, psqlUpdate)
                EndIf
                If result <> 0
                    psqlUpdate = "DELETE FROM " + thisTableName + " WHERE ctid = '" + ctidList() + "';"
                    result     = DatabaseUpdate(databaseNumber, psqlUpdate)
                EndIf
                If result <> 0
                    psqlUpdate = "COMMIT;"
                    result     = DatabaseUpdate(databaseNumber, psqlUpdate)
                EndIf

                If result <> 0
                    deleteCount + 1
                Else
                    CloseDatabase(databaseNumber)
                    PrintN("Error deleting row from " + thisTableName + " where ctid = " + ctidList() + ", result = " + Str(result))
                    Input()
                    CloseConsole()
                    End
                EndIf
            Next

            PrintN("Deleted " + Str(deleteCount) + " from table " + thisTableName)
        Next
    Else
        CloseDatabase(databaseNumber)
        PrintN("DatabaseQuery did not process: " + DatabaseError())
        Input()
        CloseConsole()
        End
    EndIf
Else
    PrintN("Unable to open database: " + DatabaseError())
    Input()
    End
EndIf

CloseDatabase(databaseNumber)
Input()
CloseConsole()
End
percy_b
User
User
Posts: 72
Joined: Mon Jan 12, 2015 10:25 am

Re: Removing duplicate Postgres Keys

Post by percy_b »

Hi RichAlgeni,

This can also be done directly from SQL code (that you can embed in PureBasic). This works in Oracle and should probably work in PostGreSQL as well:

Code: Select all

DELETE FROM MY_TABLE 
WHERE ROWID NOT IN (SELECT MIN(ROWID) 
FROM MY_TABLE 
GROUP BY COL1, COL2, COL3);
Replace both occurrences of "MY_TABLE" with your actual table name. Also, replace COL1, COL2, COL3 with the columns that constitute the primary key or unique constraint.

Please try it out your self on a copy of the table first.
User avatar
RichAlgeni
Addict
Addict
Posts: 914
Joined: Wed Sep 22, 2010 1:50 am
Location: Bradenton, FL

Re: Removing duplicate Postgres Keys

Post by RichAlgeni »

I tried that, but since the primary key was duplicated (which shouldn't happen under normal circumstances), both were deleted. So I needed something unique, which was the system generated CTID column.
Marc56us
Addict
Addict
Posts: 1479
Joined: Sat Feb 08, 2014 3:26 pm

Re: Removing duplicate Postgres Keys

Post by Marc56us »

I tried that, but since the primary key was duplicated (which shouldn't happen under normal circumstances), both were deleted. So I needed something unique, which was the system generated CTID column.
ctid in PostgreSQL = Rowid in some other databases
so, try:

Code: Select all

DELETE FROM MY_TABLE 
WHERE ROWID NOT IN (SELECT MIN(ctid) 
FROM MY_TABLE 
GROUP BY COL1, COL2, COL3);
:wink:
User avatar
RichAlgeni
Addict
Addict
Posts: 914
Joined: Wed Sep 22, 2010 1:50 am
Location: Bradenton, FL

Re: Removing duplicate Postgres Keys

Post by RichAlgeni »

Part of the problem is that every user table I had, had duplicate rows. That's why I wrote the program.
Post Reply