Regex.ai is an AI-powered tool that generates regular expressions. It can accurately generate regular expressions that match specific patterns in text with precision. Whether you're a novice or an expert, Regex.ai's intuitive interface makes it easy to input sample text and generate complex regular expressions quickly and efficiently. Overall, Regex.ai is a game-changer that will save you time and streamline your workflow.
I often find it faster to write something from scratch rather than to work with someone else’s code to fix it. In the latter case I need to understand the intent, the whys behind the choices.
Well guess what, LLM-generated code is someone else’s code: an amalgamation derived from many people’s code. Except those people are ‘helpfully’ “abstracted away” from you by the middleman, so you can’t know their original intents and choices. What’s worse, it’s someone else’s code that will be treated as your code—unlike working with a legacy system that everyone knows was written by some guy, in this case any bugs will be squarely on you.
This offering, and the other half-dozen like it this past week or so, is like giving a kid a flamethrower.
It's all fun and games until they burn down your house.
> ... I need to understand the intent, the whys behind the choices.
As do I.
And that is something ChatGPT-X (for any given X) cannot provide, regardless of whether or not what is produced is correct. Perhaps with some form of backward chaining[0], a ChatGPT-X could someday explain how it arrived at what it produced.
It's weird to see a forum for hackers, with hacker in the name, and with a line about encouraging curiosity in the charter, be so hostile to someone who hacked something together.
Sign of the times perhaps.
Though I guess it's not much different from the thread trashing Dropbox however many years back.
>so hostile to someone who hacked something together.
It's not hostility, but I'm a bit tired of all those projects sprouting up around AI.
If it was an open-source project full of bugs, I would understand, and encourage and give solutions to the creator of the project, maybe even create tickets or fix bugs.
But with AI, we are flooded with tons of closed-source frontends to a closed-source backend, and those projects are more than buggy since they confidently give bad solutions. It's not like a "DIY electric car project," it's someone putting pieces of cardboard on a Tesla and pretending it makes it safer or faster.
I'm dumbfounded and I don't know how I'm supposed to react to this. I would certainly not release that to anyone, since it's antithetical to what I do and to what I believe software should be.
Good point. I wish OpenAI released more of their work as open source. I wish people building on top of them did too. That said, I usually won't begrudge a small-time developer or entrepreneur from choosing whatever licensing model they think is going to make them the most money. An army of small-time entrepreneurs who build closed source can still have democratizing effects on a market that's been captured by a few large companies. I'm more frustrated when I see big, entrenched companies finding ways to capture value from the open source ecosystem and privatize it.
My view on v1s, prototypes and PoCs, regardless of their licensing, is that by design they're going to be a mess and have errors; if they don't, you waited too long to ship. Maybe these folks should have been a little more honest in their marketing, but man, if we're going to get into a list of the offenders on that front, I think they are way, way down on that list.
Overall, in my view, LLMs are the most disruptive thing to come along since the Web itself. Business models like Google's are facing a direct challenge from this technology. Why do I want to look at Google's first page full of shitty search ads when I can use an LLM to get an answer immediately? As far as I'm concerned, at this stage I would love to see a billion projects from every corner of the world built on top of this technology. Whether they're great or they're crap, the avalanche is the first real opportunity in many years to disrupt some giants.
> It's weird to see a forum for hackers, with hacker in the name, and with a line about encouraging curiosity in the charter, be so hostile to someone who hacked something together.
My comment was in direct response to an overarching concern raised by the implications of incorporating "LLM-generated code." This is relevant here due to the "Show HN" description above, which reads thusly:
Regex.ai is an AI-powered tool that generates regular
expressions. It can accurately generate regular expressions
that match specific patterns in text with precision.
If you interpreted my characterization of "... like giving a kid a flamethrower" as being hostile, then I extend my apologies to the OP as I was using this phrase as a literary tool detailed subsequently. I thought the subject expansion of "the other half-dozen like it this past week or so" was sufficient.
As to "encouraging curiosity", I point you to feedback I provided to the OP in a reply peer to this one.
Are you trying to say that every sort of criticism equals hostility? If I don't like your half-thought-out idea, I am hostile. If I praise it, I feel like an idiot. Not much choice remaining, after all...
I’m not critical of the hack itself (unless it uses OAI’s closed commercial LLMs). Just not a fan of some implications of using it in real circumstances: it might work for a personal thing but if you use it for anything important you still need to know how regular expressions work.
> It's weird to see a forum for hackers, with hacker in the name, and with a line about encouraging curiosity in the charter, be so hostile to someone who hacked something together.
I guess people are getting tired of too many topics in one narrow space. I come to HN for variety. It does get tiring when every single day I see yet another LLM-based solution attempting to solve a problem I don't think I even have.
Overdose of a certain topic is not good for a general tech forum like this. Everything should be in moderation and all that.
This forum is also against decentralization and Web3, and often shills for large centralized corporations. The ethos of hackers was always ANTI that stuff.
You can ask it to explain why. It might not be a true representation of why those decisions were made but at least it’s a plausible explanation of why something could work like that which is better than nothing. I’m not sure why you think it can’t do that already?
So if I look at most codebases, someone would be able to explain what all the code does and why it does that way? I'm extremely sceptical of that, even if I myself wrote the code 3 weeks ago.
A person should be able to explain the code they're adding to a repo at the time they are adding it. Whether or not they can explain it at some arbitrary point in the future is a different question/issue.
It's even worse. When working with someone else's code, e.g. from Stack Overflow, there's a reputation system gating access to the platform and incentivizing people to provide correct answers. You can reasonably expect that someone else's code has at least been thought through to some extent to solve the problem at hand, and very likely tested.
With LLM-generated code, especially ChatGPT-style decoder models, none of that is true. All of the posts and comments I see about it here seem to be anecdotes "it can do all of my job for me" yet asking it to write the simplest code creates several issues on my end.
Personally I think a model geared towards code generation isn't an unsolvable task; the Spider dataset was released some time ago (text to SQL task) and the winning approach there was no fanciness on the model side, but rather to just test all the output queries to ensure it's at least valid SQL. That got a 20%+ boost in accuracy.
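The Spider-style validity test described above can be sketched in a few lines: ask a SQL engine to plan, not execute, each candidate query, and discard any it rejects. The toy schema and candidate strings below are illustrative assumptions, not from the Spider dataset:

```python
import sqlite3

def is_valid_sql(query: str) -> bool:
    """Ask SQLite to plan (not run) the query; syntax errors surface here."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")  # toy schema, an assumption
    try:
        conn.execute("EXPLAIN QUERY PLAN " + query)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

candidates = ["SELECT a FROM t WHERE b = 'x'", "SELEC a FROM t"]
print([q for q in candidates if is_valid_sql(q)])  # the misspelled query is dropped
```

The same pruning idea applies to regex generation: a candidate that doesn't even compile can be filtered out before the user ever sees it.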
Your experience is no less anecdotal than the millions of people who successfully use Copilot and ChatGPT to write code on a daily basis. I am one of those and can't imagine coding without Copilot or an equivalent ever again.
> the millions of people who successfully use Copilot and ChatGPT to write code on a daily basis
Where did you get that number from? Are you saying that roughly one in every thousand people on Earth, alive today, is using Copilot and ChatGPT to write code on a daily basis?
Not the parent, but it's not completely impossible. According to [0], there are about 25-30 million software developers in the world. If about 7-8% of them use ChatGPT and Copilot every day, that's already two million.
I guess it's early for this to matter too much for the count, but people who are not "developers" have also used ChatGPT to write code. I've read anecdotes.
Software exists over time. There is no “successful” unless you account for future bugs.
I do believe LLM code generators can be used with good results. I just know that for me that way is slower and more painful, because I need to switch between creative mode (when I make stuff) and debugging mode (when I need to figure out how someone else’s stuff works). I find keyboard typing speed is usually not what slows me down the most…
I'm (genuinely) curious what kind of code you write. I haven't tried Copilot and I haven't used ChatGPT very much, but I feel I would be pretty surprised if either of them made significant improvements to my workflow.
Copilot I could see, since I already use Intellisense, autocomplete, and snippets to great effect. I'd be annoyed if I had to work without them. But in general, knowing what I want the code to do is >90% of the work of writing new code.
I feel there are a few possibilities for why I'm confused:
1. I'm not a very good software engineer, at least in certain respects. Maybe I should have a better understanding of architecture patterns or something I might have learned in a CS degree. Maybe I am hacking everything together and maybe I am already a slow coder.
2. I'm not [being] creative enough as a prompt engineer. I typically can't think of any way that ChatGPT could help me without ingesting my entire repo and figuring out the correct patterns. It could be, however, that there are ways to get the answers I need with better questions.
3. We do completely different kinds of work, and some kinds of coding are better suited for AI assistance than others.
The opposite of 1 is also possible. You're a really good programmer and know the material better, and just don't need to ask the kinds of questions that other people are asking ChatGPT (or stack overflow, or man pages) for/are happy with your current reference materials.
Define successfully. You might verify what the LLM gives you, but lots of people who blindly copy and paste from Stack Exchange will do the same with ChatGPT.
Like autopilot in planes that fall back to experienced pilots, we're embarking on the most dangerous "uncanny valley" maneuver where these systems will be adopted by experienced pilots who know the limits but who will inevitably be followed by either no one or students whose conception is entirely synthetic.
At that point the plane AI better be 100% TRUSTWORTHY cause there's no safe fallback.
If you have a choice between descriptions and performance, I humbly suggest detailed descriptions, perhaps with links to tutorials and/or further reading. Who cares if the wrong thing is returned quickly, when it lacks any context?
Also, consider how to express anchoring and/or grouping preferences in the UI, or weighting based on highlight positioning. These are oft-used features of regex languages.
If you don’t understand regexes well enough to write them yourself, you should not get some AI to generate them for you. You won’t be able to verify whether they do what you want, and the bugs can be subtle and destructive.
I read a few weeks ago here on HN about one large SaaS grinding to a halt because of a greedy selector in one line of regex. Not sure how people find old stories; it's lost to me now. But it was an excellent example of why regex is dangerous and requires a lot of care to write. I wouldn't trust an AI to write my regex unless I saw that people were finding it to be consistently better than they are at writing what they need.
You gave it an example where inferring the semantics you were after was basically a crapshoot. It’s not going to do well under those conditions. Nor will a random human who lacks insight into what specifically you are after. Did you want all the bazzes that are at the end of lines? The bazzes that follow bars? Who knows?
Try giving it examples where the data provides context cues.
Well, I tried extracting fields from some logs I had lying around and I can tell you why I think it's not useful:
1. I select DEBUG and INFO, but it doesn't figure out that there are WARNs etc. in there and extract those too.
2. Some of the regexes are just... wrong? I selected individual fields, but there's one mangled regex that gives me two fields and the text in between; I didn't ask for that and it's no use.
3. None of the regexes could extract the date I selected (of the form 2023-03-28 05:23:28.844); some of the 'agents' used the literal date, and the only one that broke it down into \d's didn't match anything because the DEBUG and INFO were mangled into it.
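For what it's worth, a timestamp in that format is matched by a short hand-written pattern that spells out the digit groups rather than hard-coding a literal date (the log line below is made up for illustration):

```python
import re

# Timestamps like 2023-03-28 05:23:28.844: four-digit year, two-digit
# month/day/time fields, and a three-digit millisecond suffix.
ts = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}")

line = "2023-03-28 05:23:28.844 INFO starting up"  # made-up log line
print(ts.search(line).group())  # 2023-03-28 05:23:28.844
```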
I'm not really sure how this would be at all useful in its current form?
Well, I know nothing about AI and tried with simple variations of "foo bar baz."
The only solutions that worked were either "\w+ \w+ \w+..." which does not filter anything and may produce errors with other content, or "(first line|second line|third line)" which could be replaced by a bunch of if statements.
The other solutions were plainly wrong but at least they are honest about it and it's shown in the user interface.
How do I tell it to generate a regex for emails?
Try selecting the emails: all four generated regexes are wrong.
Even if one of them was right, how do I choose between 4 choices if I don't know the meaning? I have to verify the generated regexes, and verifying complicated regexes is much harder than writing them in the first place.
How about instead of an AI generating a regex we can't understand, we put energy into actually well-developed methods for parsing & validating text? Why put code you can't understand in your database?
Reality check: there are people, like my colleague, who aren't software engineers and still have to occasionally maintain/create a regex in some corporate software config.
That's even worse. They might not have the knowledge to realize the regex an AI gives them is bunk, or to debug it when it fails.
I'd like to see some numbers on a tool like this. If a huge majority of people are seeing genuine improvements in their workflow with it, I won't be a luddite yelling at them. Rare, low-severity failures shouldn't hold us back.
But the potential cost of failure with (any) regex is very high, so I personally wouldn't want to trust anything remotely mission-critical to a person who doesn't understand regex well enough to write it themselves, and if they can write it on their own, that's often faster than debugging AI-generated regex.
If you would like to generate a regular expression by giving an example input text and an example output match, you could use this closed-form solution tool:
https://regex-generator.olafneumann.org/
Usually when you have an AI like this that is supposed to generate verifiable results, you do an adversarial test where you ask it to solve problems that you already know the answer to, to make sure it works.
It looks like no one did that here. Even using the sample data provided, if you highlight a few of the addresses, it can't find the rest of them, mainly because it generates a regex with ST/AVE/LN in it, missing all the ones with RD. And if you add an RD sample, it just adds that to the list.
There's lots of great innovation coming with LLMs, but people are forgetting their "AI basics" when it comes to verifying them.
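A minimal sketch of that kind of known-answer check, using a made-up address sample in the spirit of the one described (the patterns here are illustrative, not the site's actual output):

```python
import re

def passes_known_answers(pattern, text, expected):
    """Adversarial check: run a candidate regex over text whose correct
    matches are already known, and compare against them."""
    try:
        return re.findall(pattern, text) == expected
    except re.error:
        return False

# Made-up sample in the spirit of the site's address example.
text = "12 MAIN ST, 9 OAK AVE, 4 HILL RD"
expected = ["12 MAIN ST", "9 OAK AVE", "4 HILL RD"]

# A candidate like the one described (no RD alternative) fails the check,
# while a more general pattern passes.
print(passes_known_answers(r"\d+ [A-Z]+ (?:ST|AVE|LN)", text, expected))  # False
print(passes_known_answers(r"\d+ [A-Z]+ [A-Z]+", text, expected))         # True
```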
I just used ChatGPT to create a ton of permutations for product pricing that I'm putting on Stripe as products.
Except... it made ONE ERROR that I just spent two hours tracking down and fixing in my JSON file and now in the Stripe dash. (I coincidentally found the error using ChatGPT lol).
It's probably still faster and less error-prone than I could have done it manually. But it's still error-prone...
The Reflexion paper (https://arxiv.org/abs/2303.11366) that came out recently shows how this kind of mistake might be overcome. Asking the model to think about the answer after it's generated a first draft greatly improves accuracy. Also, prompt engineering such as copying the generated code, pasting it in a new chat and saying "There's a bug in this code, please find it" can go a long way. There is so much low hanging fruit in harnessing the power of these models that is just being ignored because some even lower hanging fruit (RLHF, system messages, context window size, plugins, etc) is being released seemingly every few days.
If you ask the model to "think" about something, and then it simulates that action and outputs what the result of that might be, does it matter if it's really thinking or not? Especially if the output is what we wanted originally?
I would suggest that a person saying "ask the model to think about" in this context in no way implies that that person is confused about the nature of the model, it is simply a convenient piece of language that helps us to achieve the desired result.
He did not say "make the model think about..." or imply that the model is thinking. He simply, and _correctly_, pointed out that if you _ask_ the model to think, it improves the answer.
It looks like you just pattern-matched on the word _think_ and replied with a pre-made opinion about how AIs can't think. Ironic...
I was curious if this would be smart enough to generate a regex for any four letter word so I copied the tagline of the site and highlighted all four letter words in it. (I have deleted the previous highlights of course.) It generated three regexes that just had a union of those words and one which started off good-ish by looking for any word of length of three or four, but then tacked on some random suffix and in the end this most promising regex turned out to not even match anything in the source text. As a suggestion to the authors of this tool I'd propose to add a step where any generated regexes that don't match anything in the input text are removed from the results.
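That suggested filtering step is only a few lines; the sample text and candidate patterns below are made up for illustration:

```python
import re

def prune_candidates(candidates, sample):
    """Drop generated regexes that don't compile or don't match anything
    in the sample text the user highlighted from."""
    kept = []
    for pat in candidates:
        try:
            if re.search(pat, sample):
                kept.append(pat)
        except re.error:
            pass  # not even a valid regex
    return kept

sample = "Regex.ai is an AI-powered tool that generates regular expressions."
candidates = [r"\b\w{4}\b", r"\b[a-z]{3,4}xyz\b", r"(unbalanced"]
print(prune_candidates(candidates, sample))  # keeps only r"\b\w{4}\b"
```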
The results I got from this were unfortunately not useful. For example, in trying to extract the property names from a connection string, I highlighted all of the property names along with the equals sign. For comparison, asking ChatGPT:
> Write a regular expression to extract the property names from this PostgreSQL connection string: "PostgresSql": "Host=localhost;User ID=postgres;Password=xxxx;Database=test;Application Name=Test1234,Port=35432;Pooling=false;"
Yields the response with an explanation:
(?<=[^\\w])([A-Za-z ]+)(?==)
"This regular expression will match any sequence of alphabetic characters (upper or lower case) that are followed by an equal sign (=). The negative lookbehind (?<=[^\\w]) ensures that the property name is not preceded by another word character."
A quick test on regex101.com shows this works perfectly.
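The same check can be reproduced in plain Python; assuming the doubled backslash in the quoted response is just escaping, a single `\w` is used here:

```python
import re

# The connection string from the comment above, quoted as in the prompt.
conn_str = ('"PostgresSql": "Host=localhost;User ID=postgres;Password=xxxx;'
            'Database=test;Application Name=Test1234,Port=35432;Pooling=false;"')

# Lookbehind requires a non-word character before the name; lookahead
# requires a literal '=' after it.
names = re.findall(r"(?<=[^\w])([A-Za-z ]+)(?==)", conn_str)
print(names)
# ['Host', 'User ID', 'Password', 'Database', 'Application Name', 'Port', 'Pooling']
```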
Sorry, don't like to be overly critical. Someone has attempted to solve a common problem for developers, but LLMs are going to blow applications like this away. And I think that ChatGPT, at version 4, has become a truly useful tool.
Cool hack! I'm having some trouble thinking of a case where I wouldn't just explain to Copilot / ChatGPT what I need. Maybe specifically in cases where I had the raw data but not the column titles?
Going a little more "end user needs vs. new tech offers": the intent is in the right place, but the output isn't helping as much as normal programmatic tools.
At least for me, what would make this a killer app would be the ability to read a document or PDF or big text dump and:
1: identify "possible fields" (first name, date of birth), "probable fields" (middle name or other fields that are part of the data set but don't appear in every line) and "probable junk data" (page numbers, page headers, useless PDF padding)
2: allow selection or tuning of these fields to generate regex to catch or remove only the data related to the parsed fields.
I THINK there's something done with pandas (pandoc?) that can help tear a document apart and get fields or basic doc structure, but AI would need to take it from there and present it in a clear, concise and optionally explained way, so a busy office worker could just copy the regex filter into a spreadsheet formula or program function.
This doesn't seem to generate great regex, but it does seem to generally work(ish?), so I guess nobody would care. That said, how does this work? Are you just sending this off to one of the AI APIs? What's going on with the data pasted in the box after we hit run?
Struck me as funny that we have another thread going about people pasting company data into ChatGPT, and here we have a regex AI with an example that looks like it's encouraging you to trust it with helping you regex through your PII: just paste it in the box and highlight what you need, lol (not saying that's the intent, just that's what less savvy users may do).
Light on details, heavy on philosophers, trend setters, idea banks, and radicals that make me worried I'm dealing with opportunists taking swings at monetizing a bunch of .ai domains. Especially the weird cinematic banner.
It's nice and there are use cases for it, but if I ever need something like it, I'll prolly just explain what I want to ChatGPT and tell it what regexp engine I'm using, and it'll give me results I'll paste to regexr.com for tests. The only added value here is that I wouldn't need to think of a prompt, but I've become good at finding nice prompts for programming problems, so directly querying ChatGPT is what I'd go for, personally.
Also, I'm not sure what underlying tech is used, and the only explanation of the tool seems to be a YouTube video, so I didn't look further. I'd like to know more about how it's made, if that's possible and something the author would be OK to share.
This is a really nice implementation, so full credit to the creator. Regex is always confusing to compose, but it's also one of those situations where I can't help but wonder if the solution is just to improve upon / provide a nice abstraction for regex rather than handing over full control to a non-deterministic AI.
I've seen at least 2 projects in the last 6 months using LLMs to generate bash code, which seems like a similar solve. LLMs are super cool, but there's a massive advantage to actually understanding what your code does, and LLM-generated regex, bash, assembly etc. loses that.
Dang, there goes my investment in Jeffrey Friedl's "Mastering Regular Expressions" which launched my programming career after I discovered a reference to it in "Dreamweaver Bible" back in 2000.
To be honest, I find ChatGPT sufficient for regex. I usually ask it for test cases that I can then validate in a regex playground to make sure the regex is working as expected.
Even with the examples on the landing page, the regexes generated for the emails are not really usable. It needs way more examples to produce the right thing.
Even though I doubt most production code uses the actual, correct, RFC-compliant regex to match emails (it's a monster), this does nothing to improve the situation...
It really needs some zero-shot or few-shot magic from LLMs, or even heuristics to detect common patterns like emails, and just generate a sane regex, rather than stuff like [A-Za-z]{2,}@libertylabs\.ai, which will obviously fail with a few more examples.
It doesn’t make any sense to use a regex to check emails beyond catching some very basic typos. Why do you need an email? To send messages to the user. Then do that: send a validation message and see if someone gets it.
The input isn’t rich enough to generate generalizable regexes (regecies?). Would probably be better off with a text input explaining what the user wants to match. Text-to-regex includes intent.
That would make it look broken when it's part of what it is. Why should it pretend to be something else? It's not an algorithm that produces a specific result.
Edit: I got a message saying there were too many requests. So much for not appearing broken. And I'm not using a VPN or anything so I'd appear as ordinary traffic.
Honestly, writing a regex is way easier than reading a regex, no? So it feels like now I have the harder task of proving that the generated regex is correct.
Or you can use "verbose regex" which some languages implement like in Python (https://docs.python.org/3/library/re.html#re.X). The spaces are ignored and you can add comments on each line. I used this in the past and my coworkers were happy about it because they could understand the regex and even modify it.
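As a sketch of what that looks like in Python, with an illustrative (not RFC-complete) email pattern:

```python
import re

# Verbose mode (re.X): whitespace inside the pattern is ignored, so each
# piece can sit on its own line with a comment.
pattern = re.compile(r"""
    [\w.+-]+       # local part
    @              # separator
    [\w-]+         # domain name
    (?:\.[\w-]+)+  # one or more dot-separated labels
""", re.X)

print(bool(pattern.fullmatch("user@libertylabs.ai")))  # True
print(bool(pattern.fullmatch("not an email")))         # False
```

The compiled pattern behaves exactly like the one-line equivalent; only the source layout changes.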
I consider myself a product designer so this is absolutely not true for me. Every time I try to write a Regex I have no idea how to even start. Copilot has been really good at starting me off and then I’ll take it to a regex site and understand it
That's why LLMs aren't much help to me -- they just increase my workload by giving me more code to read and review. If I write it myself, I already know what it means, so that saves time and effort.
I find this mostly pays off in debugging: Having written code usually means I know it better than code I've reviewed, which I know better than code I've never seen. Finding a weird bug in code I know well is a _lot_ easier.
For me writing a regex is easy only if I remember the syntax, which I never do because they differ between languages and I only need them once a month or so.
For me, the fastest way is to ask a generator to create a valid, if not necessarily correct, regex so that I can tweak it. I successfully used GPT for just that recently. It even got the capture groups right.
I agree. I see similar arguments ("just write examples") a lot, and I really don't get that. Finding a comprehensive set of examples for code, regexps, shell, whatever is very, very hard.
This feels like a really cool idea for a tool. I would 100% use something that generates strings matching a regex, for checking my own regexes or understanding other people's.
Or, just write regular expressions?
> ... Regex.ai's intuitive interface makes it easy to input sample text and generate complex regular expressions quickly and efficiently.
See: https://www.ibm.com/topics/overfitting
Inputting the sample text:
And highlighting the first "baz" produced patterns which all had "[A-Z][a-z]*@libertylabs\\.ai" included, presumably due to the default inclusions. Removing those and highlighting the second "baz" resulted in "<Agent B>" as the result in one case.
There is no explanation of any patterns generated. If a person is to use one of the generated patterns and Regex.ai is supposed to "save you time and streamline your workflow", no matter "[w]hether you're a novice or an expert", then some form of verification and/or explanation must exist.
Otherwise, a person must know how to formulate regular expressions in order to determine which, if any, of the presented options are applicable. And if a person knows how to formulate regular expressions, then why would they use Regex.ai?