Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Transpile Any SQL to PostgreSQL Dialect (gitlab.com/dalibo)
189 points by fljdin on March 18, 2024 | hide | past | favorite | 45 comments


I see a lot of these types of tools, each with some essential support, but usually they don't work. I would appreciate what is support what is not.

Tried this tool: go run gitlab.com/dalibo/transqlate@v0.1-beta.2

select TRUNC(SYSDATE, 'month') from dual

-- TRANSLATION ERROR at +1:8: unsupported date format: "month" rule="replace trunc() by date_trunc()" select DATE_TRUNC('month', LOCALTIMESTAMP)exit status 1

Also, at least, it would set the latest git tag, currently command from Readme.md doesn't work:

go run gitlab.com/dalibo/transqlate@latest

go: gitlab.com/dalibo/transqlate@latest: module gitlab.com/dalibo/transqlate@latest found (v0.0.0), but does not contain package gitlab.com/dalibo/transqlate

latest should point to v0.1-beta.2


Thank you for your feedback, both issues have been added to the project tracking.


I asked Google Gemini to do this, and it gave me a seemingly convincing response. Anyone know if it's right?

Prompt:

  translate the SQL statement "select TRUNC(SYSDATE, 'month') from dual" to a PostgreSQL dialect

Response (formatting removed):

  Here is the equivalent SQL statement in PostgreSQL for "select TRUNC(SYSDATE, 'month') from dual" in Oracle:

    SELECT DATE_TRUNC('month', CURRENT_DATE) AT TIME ZONE 'UTC'

  Use code with caution.

  In PostgreSQL:

    DATE_TRUNC is the function used to truncate a date value to a specified unit.

    CURRENT_DATE is a function that returns the current date.

    AT TIME ZONE 'UTC' is used to specify that the truncation should be done in Coordinated Universal Time (UTC). This is optional, but it's a good practice to be explicit about the time zone to avoid any ambiguity.

  The dual table is a dummy table specific to Oracle that is used in simple SQL statements. 

  PostgreSQL doesn't require a table reference in this case.


I view a comment like yours as essentially spam and of negative value. I'm not sure if we (vaguely, hn forum community) have agreed on the etiquette here, but I have to ask why you would bother sharing an AI-generated response if you're not sure if it's correct. What are you getting at? How could this be helpful?


I am often picky about my/our time being wasted by low-effort comments but in this case I actually was myself coming into this thread assuming the actual tool being discussed was itself going to be an LLM-based solution; and, when it wasn't, all I was thinking was "aren't we in the future? isn't this just something people are going to ask an AI to handle now?".

Hell: a friend of mine claims he asked Claude (using some plug-in in his IDE) to port his app from Firebase all the way to PostgreSQL and it just did it... and it mostly worked well enough to save him a ton of time typing and looking up syntax for stuff, and I think it is now in production?

I do agree, though--and maybe this was your core point--that maybe this "let's ask Gemini" comment should be left by someone who would also know if the answer was correct rather than just saying "no clue but". Although, honestly, even having the output seems useful here (and again: I am someone constantly unhappy about people asking questions they can Google and other such filler content).


I agree: the comment was on-topic, included technical information (dual table?), and AI accuracy/usefulness is of interest to most readers. Maybe in a year I'll think such a comment is bad, but not today.


> a friend of mine claims he asked Claude (using some plug-in in his IDE) to port his app from Firebase all the way to PostgreSQL and it just did it

In now interested in your friends setup. Do you have more details?


"I’ve been using Phind + their VS Code plugin" ... "I got them to add Claude 3 Opus support"


I think it's really reasonable. When I was going through the readme I was (half heartedly) interested in seeing an OPENAI_KEY config variable in the readme.


Apologies, I was trying to imply that LLMs may be more practical to solve these kinds of problems than hand-built translators (assuming the primary purpose is for engineers using this during development.)


A transpiler will be consistent and can be correct.

An LLM based solution can be correct but won’t be consistent.

I would much rather a tool be consistent and always incorrect in the same way than inconsistent and sometimes correct. I can create test cases and correct the underlying code. I can’t do that with an LLM


> A transpiler will be consistent and can be correct.

No disagreement. My point is that it's impractically hard to build especially given the scope and complexity of the SQL dialects of major databases.


Looks like `AT TIME ZONE 'UTC'` should not be used, but I don't think it's illegal syntax.

https://www.postgresql.org/docs/current/datatype-datetime.ht...

> PostgreSQL endeavors to be compatible with the SQL standard definitions for typical usage. However, the SQL standard has an odd mix of date and time types and capabilities...

> Although the date type cannot have an associated time zone, the time type can.

(emphasis mine)

> To address these difficulties, we recommend using date/time types that contain both date and time when using time zones.


Interesting, a while back at $EMPLOYER while working in a PoC, I found myself in a situation where I need to take user-provided Postgres SQL queries and run them against one of our APIs. Roughly, I converted the API response to a `pandas dataframe`, I then parsed and transformed the query from the postgres dialect to the duckdb dialect using `sqlglot` and used `duckdb` to query the `pandas dataframe`, coverted it into `json` and returned it to the user.


Evidence does something similar - dumping data to parquet:

https://docs.evidence.dev/core-concepts/data-sources

https://news.ycombinator.com/item?id=35645464

I whish the implied ETL step was even clearer from the homepage - it's not really feasible for us to dump entire tables to the dev machines for working with production data - but it is an interesting concept.


That's a really creative way to solve that problem! I would have spun up a temporary pg instance etc, but this is much nicer.


That's interesting. I'm looking to do something similar, but need wire compatibility with postgresql. This allows any postgres client to talk to our service. Didn't have a lot of luck finding a good "middleware"


I think this is probably the missing piece for you? https://github.com/jwills/buenavista


Thanks. This looks promising!


you could consider hosting an empty postgresql database, compile your code as a postgresql foreign data wrapper and expose it as a view. Nothing is more compatible with the postgres wire protocol than postgresql itself ;)

turbot compiles their steampipe plugins in this way. Example: https://github.com/turbot/steampipe-plugin-net


That's a good idea and we considered FDW (for this and more stuff), but having a middleware makes it more flexible - FDW has limitations around pushdown with subselects, we are still constrained to a single postgresql instance for execution, when in theory we could parallelize (certain) queries across nodes.


https://www.jooq.org/translate/

JOOQ also do that, but it is in JAVA.


The `transqlate` project aims to extend this kind of tool without limitation and under open-source license.

We're studying JOOQ from afar, with the ambition of getting closer to its functional coverage.


Recommend checking out https://github.com/tobymao/sqlglot if you are interested in this capability for other SQL dialects

Tools like this are helpful for:

- Rendering SQL in a consistent way, eg for snapshot testing

- Testing SQL business logic in CI against a dialect with less heavyweight dependencies

- Applying AST transformations to take advantage of dialect-specific optimizations


And column level lineage!


I've been looking for a library that can convert query plans, not just ASTs, between Postgres and other query engines. The reason this is useful is because RLS (row level security) settings are injected into the query plan after the AST is constructed, so the same query string can produce different query plans depending on the security settings of the database.

Substrait comes close, but I haven't found a Postgres Substrait producer. Are there any projects working on this?


`transqlate` is such a good name - I am very jealous, well done!


I see that the support for Oracle is mentioned and clearly present in the source code, is that also at present the only supported database? Any hope for transactSql (Microsoft Sql. Server)


Transact-SQL support (both queries and stored procedures) for PostgreSQL is implemented in Babelfish extensions [1]. It is implemented from the ground up as a "pl/tsql" language for Postgres [2], without using SQL-to-SQL transpilation like JOOQ.

[1] https://babelfishpg.org/ [2] https://github.com/babelfish-for-postgresql/babelfish_extens...


From the "Roadmap" section of the README:

> more source dialects : mysql, mssql, sybase, etc.

https://gitlab.com/dalibo/transqlate#roadmap


Many tools try to achieve similar eg. https://aws.amazon.com/rds/aurora/babelfish/

One company, CompilerWorks actually had tools that transpiled many SQL dialects. They were bought by Google https://www.crunchbase.com/acquisition/google-acquires-compi...


How is it different from sqlglot?


Related:

PRQL (higher-level abstraction language for SQL)

https://prql-lang.org/

HN Article, PRQL:

https://news.ycombinator.com/item?id=36866861

Google search, site: GitHub, "SQL Transpiler":

https://www.google.com/search?q=site%3Agithub.com+%22SQL+Tra...

GitHub general list of open-source Transpilers:

https://github.com/topics/transpiler


I wondered why this sort of thing didn’t already exist. I had to develop for IBM Db2 (for i Series) on a Linux environment (using MariaDB) due to the licensing that stopped us from having a test env.

I wound up writing a very ad-hoc C++ program that would parse a base SQL file and generate the appropriate DB2 and MariaDB dialect versions. Not flexible enough for reuse but it got the job done.


I think Apache Calcite can do this, or something similar.

https://calcite.apache.org/


This in combination with [pg_query](https://github.com/pganalyze/libpg_query) could allow for writing generic static analyzers.


I hope for the day such tool will convert Oracle PL/SQL to Postgres


This is one of the major challenges the tool aims to meet! We hope to achieve reliable results this year.

https://gitlab.com/dalibo/transqlate/-/issues/20


Given how many SQL features are RDMS specific, I doubt the claim of being able to convert any SQL into PostgreSQL dialect.


Nice. Anyone know if similar exists going the opposite direction? Specifically thinking of postgresql -> oracle sql.


This is interesting.

Is there any standardized AST for SQL?


The SQL standard nominally defines one (I don't know if it hands it to you on a silver platter as some sort of grammar but it certainly defines one one way or another), but if you just implement that and then send your code out into the world, you're going to be disappointed with the results. All the database have deviations and extensions of their own and people use them frequently.


A CLI tool to transpile SQL snippet from a dialect to another using an AST


Interesting


[flagged]


No one trusts these LLMs enough to plug sql that comes out of it directly into a compiler.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: