Ask HN: What are some actual use cases of AI Agents right now?

2024-02-14 18:58 · 169 points · 149 comments

There are quite a few start ups/OSS working on making LLMs do things on your behalf and not just complete your words. These projects range from small atomic actions to web scrapers to more general ambitious assistants.

That all makes sense to me and I think is the right direction to be headed. However, it's been a bit since the inception of some of these projects/cool demos but I haven't seen anyone who uses agents as a core/regular part of their workflow.

I'm curious if you use these agents regularly or know someone that does. Or if you're working on one of these, I'd love to know what are some of the hidden challenges to making a useful product with agents? What's the main bottleneck?

Any thoughts are welcome!


Comments

  • By PheonixPharts 2024-02-14 20:33 · 13 replies

    > I'd love to know what are some of the hidden challenges to making a useful product with agents?

    One thing that is still confusing to me is that we've been building products with machine learning pretty heavily for a decade now, and somehow we abandoned everything we learned about the process once we started building "AI".

    The biggest thing any ML practitioner realizes when they step out of a research setting is that for most tasks accuracy has to be very high for it to be productizable.

    You can do handwritten digit recognition with 90% accuracy? Sounds pretty good, but if you need to turn that into recognizing a 12-digit account number, you now have a roughly 70% chance of getting at least one digit incorrect (1 − 0.9^12 ≈ 0.72). This means a product-worthy digit classifier needs to be much higher accuracy.

    Go look at some of the LLM benchmarks out there; even in these happy cases it's rare to see any LLM getting above 90%. Then consider that you want to chain these calls together to create proper agent-based workflows. Even with 90% accuracy in each task, chain 3 of these together and you're down to 0.9 × 0.9 × 0.9 ≈ 0.73, i.e. 73% accuracy.

    This is by far the biggest obstacle to seeing more useful products built with agents. There are cases where lower-accuracy results are acceptable, but most people don't even consider this before embarking on their journey to build an AI product/agent.
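
    A minimal sketch of the compounding math above, assuming each step's errors are independent:

```python
# Per-step accuracy compounds multiplicatively when independent steps
# are chained: one bad step anywhere fails the whole chain.

def chained_accuracy(per_step: float, steps: int) -> float:
    """Probability that every step in an independent chain succeeds."""
    return per_step ** steps

# A 12-digit account number read by a 90%-per-digit classifier:
print(f"{chained_accuracy(0.90, 12):.3f}")  # 0.282 -> ~72% chance of an error

# Three chained 90%-accurate agent steps:
print(f"{chained_accuracy(0.90, 3):.3f}")   # 0.729
```

    The same formula also shows why small per-step gains matter so much: at 99% per digit, the 12-digit read succeeds about 89% of the time.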

    • By spenczar5 2024-02-14 21:07 · 5 replies

      > The biggest thing any ML practitioner realizes when they step out of a research setting is that for most tasks accuracy has to be very high for it to be productizable.

      I think that ChatGPT's success might be partly attributable to its chat interface. For whatever reason, a lot of people - including me! - are much more forgiving of inconsistencies, slip-ups, and inaccuracies when in a conversational format. Kind of like how you might forgive a real human for making a mistake in conversation.

      I don't think that's necessarily good, and might not have much connection to attempts to build new non-conversational products on top of LLMs, but maybe it has some explanatory power for the current situation.

      • By dougb5 2024-02-14 23:31 · 2 replies

        I don't know if I'm more forgiving of inaccuracies in a conversational interface, but I'm way less likely to notice them in the first place. Especially since the current crop of RLHF'd models are so eager to please that they say nearly everything with high confidence.

        • By godelski 2024-02-15 21:56

          I think this is a more realistic notion of what AI danger is rather than the X-risk. It's our tendency to trust things given a certain means of presentation. There are certain things where error is okay, but many things where even a small error is huge (like the OP is mentioning). The danger is not so much in the tool itself but us using the tools in a lazy manner. It isn't unique to ML/AI, but ML/AI uniquely are better at masking the errors. It's why I dislike the AI hype.

        • By __loam 2024-02-15 0:08 · 1 reply

          Yeah I think a lot of people think RLHF is a tool for increasing accuracy but it really is training it to be convincing.

          • By godelski 2024-02-15 21:59

            Which should be rather obvious if you understand that it's basically a GAN. You have a discriminative model whose objective function is based on Justice Potter Stewart's description of porn: I know it when I see it. If you ask what types of errors might emerge from such a formulation, I think it becomes rather obvious.

            Which isn't to say that the tools aren't useful. But I have to add this fact because many people conflate any criticism with being dismissive of the technology. But technology is about progress, not completion. Gotta balance criticism and optimism. Optimism drives you and criticism directs you.

      • By red-iron-pine 2024-02-15 18:24

        > I think that ChatGPT's success might be partly attributable to its chat interface. For whatever reason, a lot of people - including me! - are much more forgiving of inconsistencies, slip-ups, and inaccuracies when in a conversational format. Kind of like how you might forgive a real human for making a mistake in conversation.

        The key term here is "conversation". If I query something from the machine and it disappears and rumbles and then prints off something like a 1980s mainframe, with paper that has those holes on the side that you tear off... and then it's wrong, it's wasted time.

        Meanwhile with the conversation I'm watching it in real time, and can stop it, refine it, or ask for clarification immediately and effectively. There is an expectation of give and take and "talking through" things to get to an answer, which I find is effective. I don't need it to be 100% right all the time, just 80%, and then I start parsing answers out of it to refine it to 90% accuracy with high confidence.

      • By muzani 2024-02-15 6:09

        Personally having been a big fan of GPT-3, I was quite against ChatGPT because of this.

        Completion models are obviously wrong very often. Instruct model was kinda ok, but you know it's a dumb machine.

        Chat was a bit of an uncanny valley. I treated the instruct model like a child, but chat felt like having a conversation with someone of 80 IQ. It felt frustrating, and you ended up going "no no no, what I meant WAS ..." It felt like dealing with an incompetent colleague.

        But I guess there's lots of views on it. Some expected it to be an oracle, even a god. Some treated it like Stack Overflow, then got frustrated that it was giving poor quality answers to poor quality questions. Some were just abusive to it. I suppose it's a mirror in a sense.

      • By emodendroket 2024-02-15 4:49

        Though I wonder how much of that is just that the format doesn’t encourage you to look closely enough at what you’re getting to see if it is right.

      • By startupsfail 2024-02-14 23:20

        There are several reasons to forget:

          - copilots are useful
          - chat is entertaining and useful
          - future tech is coming
          - investment money

    • By rozap 2024-02-15 0:16

      This has been a perfect description of my experience doing this. I had written some code to go through reasonably complex web onboarding flows and it basically played out exactly like you predicted in your comment. In addition, I've been working with some vendors that have been trying to do the same thing and they're finding that it works out just like you describe.

      The handwritten automations have performed better and the issues are reproducible, so even when there are issues, there's some sense of forward progress as you fix them. When handing it all over to an agent, it really feels like running around in circles.

      I think there's probably something here, but it's less trivial than just tossing a webpage at ChatGPT and hoping for the best.

    • By ianbicking 2024-02-14 21:36

      One interesting thing about LLMs is that they can actually recover (and without error loops). You can have a step that doesn't work right, and a later step can use its common-sense knowledge to ignore some of the missing results, conflicting information, etc. One of the problems with developing with LLMs is that the machine will often cover up bugs! You think it's giving sub-par results but actually you've given it conflicting or incomplete instructions.

      Another opportunity is that you can have fewer steps or more shared context. One interesting thing about Whisper is that it's not just straight speech recognition but can also be prompted and given context to understand what sort of thing the speech may be about, increasing its accuracy considerably. LLM Vision models also do this with things like OCR. This might not help it with the individual digits in an account number, but it does help with distinguishing an account number from a street address on a check.

      Or to take another old-style ML technique, you probably shouldn't be doing sentiment analysis in some pipeline, because you don't need to: instead you should step back and look at the purpose of the sentiment analysis and see if you can connect that purpose directly with the original text.

      All that said, you definitely can write pipelines with compounding errors. We haven't collectively learned how to factor problems and engineer these systems with LLMs yet. Among the things I think we have to do is connect the tools more directly with user intention (effectively flattening another error-inducing part of the pipeline), and make the pipelines collaborative with users. This is more complex and distinctly not autonomous, but then hopefully you are addressing a broader problem or doing so in a more complete way.
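
      One way to picture the "recovery" property described above is a pipeline whose later stages tolerate partial output instead of crashing. A rough sketch (all names are hypothetical, and a real extraction step would be an LLM call rather than string splitting):

```python
# Sketch: a later stage degrades gracefully when an earlier stage
# returns partial results, instead of propagating a hard failure.

def extract_fields(text: str) -> dict:
    # Stand-in for an LLM extraction step that may miss fields.
    fields = {}
    if "acct:" in text:
        fields["account"] = text.split("acct:")[1].split()[0]
    return fields

def draft_reply(fields: dict) -> str:
    # The later step uses whatever it got; missing data becomes a
    # question back to the user rather than a pipeline crash.
    if "account" in fields:
        return f"Looking into account {fields['account']} now."
    return "Could you share your account number so I can look this up?"

print(draft_reply(extract_fields("hi, acct:12345 is overcharged")))
print(draft_reply(extract_fields("hi, I'm overcharged")))
```

      The second call shows the collaborative framing: the pipeline hands the gap back to the user instead of guessing.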

    • By mFixman 2024-02-14 20:46 · 2 replies

      > You can do handwritten digit recognition with 90% accuracy? Sounds pretty good, but if you need to turn that into recognizing a 12 digit account number you now have a 70% chance of getting at least one digit incorrect.

      You are assuming that the probability of failure is independent, which couldn't be further from the truth. If a digit recogniser can recognise one of your "hard" handwritten digits, such as a 4 or a 9, it will likely be able to recognise all of them.

      The same happens with AI agents. They are not good at some tasks, but really really food at others.
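
      The difference correlation makes can be put in numbers. A sketch with hypothetical figures: the same 90% average per-digit accuracy yields very different 12-digit results depending on whether errors are independent or concentrated in a few hard-to-read writers:

```python
# Same 90% average per-digit accuracy, two error models.

def seq_acc_independent(p: float, n: int) -> float:
    # Every digit fails independently with probability 1 - p.
    return p ** n

def seq_acc_mixture(frac_easy: float, p_easy: float, p_hard: float, n: int) -> float:
    # Errors correlated by writer: easy writers are almost always read
    # correctly, hard writers almost never are.
    return frac_easy * p_easy ** n + (1 - frac_easy) * p_hard ** n

# 80% easy writers at 99% per digit, 20% hard writers at 54% per digit
# -> the same 90% average per-digit accuracy as the independent case.
print(round(0.8 * 0.99 + 0.2 * 0.54, 3))              # 0.9
print(round(seq_acc_independent(0.9, 12), 3))          # 0.282
print(round(seq_acc_mixture(0.8, 0.99, 0.54, 12), 3))  # 0.709
```

      Under the correlated model, ~71% of account numbers come out fully correct rather than ~28%, which is the parent's point about non-independent failures.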

      • By allanwind 2024-02-15 0:28

        The "food" typo is just too good to ignore in this context.

      • By jhbadger 2024-02-14 23:01 · 1 reply

        And the US Post Office and other postal services have been using this tech to sort letters for several decades now (although postal codes with both letters and numbers like Canada's are harder). It was viewed as the "killer app" for ML in the 1990s.

        • By PheonixPharts 2024-02-15 0:07

          This thread is an object lesson in the point I'm making: people have forgotten everything we've learned about making ML based products.

          Parent comment doesn't understand the concept of expectation, and this comment is apparently unfamiliar with the fact that SotA for digit recognition [0] has been much higher than 90% even in the 90s. 90% accuracy for digit recognition is what you get if you use logistic regression as your model.

          My point was that numbers that look good in research often aren't close to good enough for the real world. It's not that 90% works for zip codes, it's that in the 90s accuracy was closer to 99%. You have validated my point rather than rejected it.

          0. https://en.wikipedia.org/wiki/MNIST_database

    • By krallistic 2024-02-15 8:41

      > The biggest thing any ML practitioner realizes when they step out of a research setting is that for most tasks accuracy has to be very high for it to be productizable. You can do handwritten digit recognition with 90% accuracy?

      It's way more nuanced than this. Of course you need decent "accuracy" (not necessarily that exact metric), but in many business cases you don't need high accuracy. What you need is a solid process: you can catch errors later, you can cross-reference, you need fail-safes, you need post-mortem error handling, etc...

      I shipped stuff (classical ML) that was nothing more than "a biased coin flip," but that still generates value ($) due to the process around it.
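
      One concrete version of "a solid process around an imperfect model" is a cheap deterministic check downstream of the model. A sketch, assuming the extracted numbers carry a Luhn check digit as card numbers do (real account numbers may not):

```python
# A deterministic validity check catches many model misreads before
# they reach production, routing failures to human review instead.

def luhn_valid(number: str) -> bool:
    """Standard Luhn check-digit test, as used on card numbers."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:       # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def accept_or_review(model_output: str) -> str:
    # The "process" layer: imperfect model output either passes a cheap
    # gate or falls back to a human.
    return "accept" if luhn_valid(model_output) else "human review"

print(accept_or_review("79927398713"))  # canonical valid Luhn number
print(accept_or_review("79927398710"))  # one misread digit -> review
```

      A single-digit misread flips the check about 100% of the time here, which is exactly the kind of error a chained model pipeline produces most often.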

    • By chenxi9649 2024-02-14 20:58 · 5 replies

      Yea that's a good point.

      Now I am curious, what are some tasks that can accept a model that is at 80% as good as a human, but is 100x cheaper?(or, 100x faster?)

      • By tetha 2024-02-14 22:38 · 5 replies

        Similar to the sibling comment, helpdesk ticket routing.

        The volume of helpdesk tickets large enterprises deal with is very easily and vastly underestimated. If you can route even 30% away from the central triage with 90+% accuracy and drop everything else back to the central triage... you suddenly save 2 FTEs in that spot in some places. And you increase customer satisfaction for most of those tickets because they get resolved faster.

        Or, as much as people hate it, chatbots as a customer front. Yes, everyone here, as an expert in a lot of tech, has had terrible experiences with chatbots. Please mark your hate with the word "Lemon" in the comments. But decently implemented chatbots with a few systems behind them can resolve staggering amounts of simple problems from non-techies without human interaction from the company deploying them. It remains important to eventually escalate to humans - including the history from all of these interactions to avoid frustration, sure.

        Or, ticket/request preprocessing. Remember how spelling out that 10-digit account number to a hard-of-hearing call center agent sucks? Those 4 retries because you weren't using a better way to communicate that number also cost the company. Now you can push a few of these retries into an AI system. If you mail them, an AI system can try to extract information like account numbers, intent, start of the problem, and problem descriptions into dedicated fields to make the support agents faster.

        Companies are certainly overdoing it at the moment, I'm not denying that. But a lot of the support/helpdesk pre-screening can be automated with current AI/ML capabilities very decently. Especially if you learn to recognize and navigate it.
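
        The preprocessing idea above can be sketched without any ML at all; in practice an LLM would replace or back up these heuristics. The regex and the intent keywords here are purely illustrative assumptions:

```python
import re

# Toy ticket preprocessor: pull likely account numbers and a coarse
# intent out of a support email so agents see structured fields.

ACCOUNT_RE = re.compile(r"\b\d{10}\b")   # assume 10-digit account numbers
INTENTS = {"refund": "billing", "cancel": "retention", "password": "access"}

def preprocess(email_body: str) -> dict:
    text = email_body.lower()
    intent = next((dept for kw, dept in INTENTS.items() if kw in text),
                  "general")
    return {"intent": intent, "accounts": ACCOUNT_RE.findall(email_body)}

print(preprocess("Please refund me, my account is 0123456789."))
# {'intent': 'billing', 'accounts': ['0123456789']}
```

        Anything the extractor misses just lands in the "general" queue, so the failure mode is the status quo rather than a wrong action.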

        • By greenie_beans 2024-02-15 13:21

          and this is why consumers lose a bunch of money, so corporations can save $3-6 per help desk ticket. then consumers get stuck in a bot interface and can never break out of it (speaking from personal experience)

        • By Karrot_Kream 2024-02-14 22:53

          The last company I worked at eventually became a Big Tech. In the beginning though, we used to ask all engineers to pair with customer service folks to deal with ticket triage. When we got a bit bigger, we used to have rotations where eng would pair with customer service folks. Being on the other side of that was very eye-opening for all eng. Many used to come in with the same bias that you see on this site, that how dare you be routed to some automated service and how inhumane the service is. On the other side you see competent CS agents absolutely swamped with low-level questions that were often literally answered in docs and FAQ pages. I think getting transformer-based triage models correct can unlock tons of value.

        • By AJRF 2024-02-15 0:26 · 1 reply

          Hard agree - I've heard frankly staggering "per support ticket" costs from every company I've worked for or that has publicly talked about customer support costs. Think $3-6 per customer support ticket.

          A UK company, Octopus has been doing some interesting work on GenAI <> customer support, which is helped by their "Energy provider in a box" software called Kraken (https://octopusenergy.group/kraken-technologies), which gives a single unified view over their operations.

          They even have support agent level of personalisation - i.e; the agent will talk in the tone of voice of a given agent via fine-tuning of their chat history.

          • By allanwind 2024-02-15 0:41

            You group tickets by root cause, and the projected future cost will then fully fund fixing the issue. Most companies, however, look at customer support as a cost center instead of the valuable insight it is.

        • By allanwind 2024-02-15 0:38

          I taught customer service / software engineers to process tickets from the singular queue and eliminated routing. Worked surprisingly well.

          I have yet to see a chatbot in a customer service function that isn't strictly worse than a button. Usually, the button is "request refund / return" for whatever reason. It's like captcha stuff, web site owner is too dumb (no offense) to figure out how to handle spam so they offload that task onto the customer.

        • By Sabinus 2024-02-14 23:09

          Lemon

      • By vintermann 2024-02-15 7:03 · 1 reply

        Well, an old one is OCR, especially handwritten OCR. I'm doing genealogy. There is SO MUCH old handwritten material that is never transcribed, and which requires special expertise to read (old and exotic handwriting styles) and interpret (place names, writing conventions, abbreviations).

        It doesn't have to be perfect. It's not as if the actual data in there is perfect. It just has to be in a form where I can search it, ideally with named entities mapped.

        Quality - like deciphering the writing on scrolls buried in volcanic ash in Herculaneum - gets all the attention. But what I really want is quantity - I want to be able to search through those 5000 pages of 200 year old mildly damaged cadastral records in dense handwriting. I want to relieve the army of kind retirees who currently transcribe these sorts of documents one by one based on their own needs.

        • By jazzyjackson 2024-02-17 11:22

          bruh what are the retirees supposed to do once you've automated their hobby

      • By Aerbil313 2024-02-14 21:17 · 2 replies

        A ton of tasks. Call centers to start with (they already do[1]), with human fallback.

        1: In my country, after ChatGPT launched last year, when you call customer support you are now prompted to “just say in a few words” what you want instead of going through tap-this-number menus (they exist as a fallback) and I believe the backend is an LLM. The user flow and voice recordings are still programmatically determined though, but I can easily see one streamlined model calling APIs and whatnot, handling it all.

        • By salad-tycoon 2024-02-14 21:35 · 1 reply

          I speak English clearly. These things always tell me to repeat what I said. Never once has it ever worked for me. I want to throw my phone at the wall.

          Also I think this has been around for longer than chatgpt. It is often accompanied by a fake keyboard clicking noise.

          • By sofixa 2024-02-14 23:20 · 1 reply

            > I speak English clearly. These things always tell me to repeat what I said. Never once has it ever worked for me. I want to throw my phone at the wall

            Now imagine how well it works for people with non-"native" accents (even for native speakers, I'd guess that a good Scouse/Glaswegian/Kiwi accent might confuse the hell out of those systems as well). It's a disaster and I hate those.

            • By int_19h 2024-02-15 19:41

              There's a general problem across the tech industry with replacing existing simple and reliable (in a sense of conveying the user's intention) interfaces like physical buttons with stuff that is supposedly "more natural" like speech recognition or swipe gestures that in practice has a much higher error rate.

              See also: replacing physical buttons with convoluted swipe gestures on mobile devices in the never-ending quest to make screen as large as possible. When was this ever a user ask?

              I feel sometimes like the present UX design is one large LLM-like hallucination.

        • By chenxi9649 2024-02-14 21:42 · 1 reply

          Yea that's pretty cool too, I heard some restaurants are also doing a 100% voice LLM to take orders.

          Transcription, specifically Whisper, is one of those ML models where the accuracy is basically on-par with humans. So, I really expect a lot more to come out of real time voice/LLM integrations.(the ChatGPT voice thing is a good glimpse, but very janky)

      • By geoduck14 2024-02-14 22:37 · 1 reply

        Scan a menu, look for the different entrees, identify the most probable ingredients, determine health content. Then: allow people to search for food based allergies, food aversions, calories. Generate pictures of what the food might look like, display the pics next to the food to make it more likely a user will buy that food.

        • By charleslmunger 2024-02-15 17:52 · 1 reply

          Until one of your customers' children eats a peanut that the AI didn't infer would be an ingredient, and dies.

          Generating fake pictures also seems like it would be more ordinary false advertising.

          • By red-iron-pine 2024-02-15 19:45

            also requires the resto or manufacturer to list all ingredients, and most already list potential allergens.

            but yeah liability would scare me, esp. because without it you can put the liability squarely on the restaurant or in some cases the person ordering / asking / buying.

    • By usgroup 2024-02-15 7:41

      I'm not sure this argument is in any way specific to LLMs, and the space for their application is still enormous. Search results, ad targeting, recommendation systems, anomaly detection, content flagging, and so on, are all systems using machine learning with a high false positive rate.

      Up until fairly recently many systems used non-LLM models for making decisions based on natural language. Their performance would have been far worse but they still did useful work. Examples would include content policy enforcement, semantic search and so on.

      There are very many cases where a system will make an automated decision on a heuristic or random basis for lack of better options. ML improved those decision points and spawned new ones. LLMs improve a subset of those decision points and spawn new ones.

    • By renonce 2024-02-16 9:09 · 1 reply

      The last widely used AI tool was facial recognition, a technology widely used in fields such as company clock-ins, access control, surveillance, and more, and it is so trusted that facial recognition is often the sole method for clocking in. These facial recognition systems can maintain an extremely high accuracy rate for every entry and exit of thousands of people in a database every day. Now when will LLMs achieve such accuracy?

      • By bitnasty 2024-02-17 12:30 · 1 reply

        They have… they write language and they are good at it. The problem is language is not reality, proper language does not mean truth or fact. The models predict what is most likely to come next, not what is most likely to reflect reality.

        • By nomel 2024-02-19 23:00

          > what is most likely to come next

          > not what is most likely to reflect reality.

          Shouldn't there be a strong statistical correlation between the two? And, isn't that, fundamentally, more about intent of the training? If I train a model to predict what comes next in reality, it's through next word prediction, but it is predicting what reflects reality the best.

    • By jptoor 2024-02-15 1:15

      You're 100% right - but I do think there are more lower accuracy cases than I initially expected, *especially* if you assume a human-in-the-loop. Still 10x better than status quo.

      Ex. Content generation + zero-shot classification/mapping are powerful, and with a human in the loop (somewhat) responsible for accuracy, they can move much faster.

    • By greenie_beans 2024-02-15 13:20

      > There are cases where lower accuracy results are acceptable, but most people don't even consider this before embarking on their journey to build an AI product/agent.

      what do you think would help people consider this before going down that path?

    • By Qwero 2024-02-14 20:39 · 2 replies

      I already use a few AI tools even without perfect accuracy.

      And an LLM that only needs to make a few API calls isn't hard.

      Very little needs perfect accuracy, and for what does, we still have classical software.

      • By skywhopper 2024-02-14 21:00 · 1 reply

        You use them successfully because your human mind can filter out the junk. It would only take one inaccurate API call that charges your credit card $10k or sells your car for 10 cents to cause a lot of damage to your life.

        • By qeternity 2024-02-14 21:34 · 1 reply

          Which is why even with classical software, most of us don't have APIs where that's all it takes.

          • By ltengelis 2024-02-16 15:47

            Frankly I think you've said it all here - a properly designed API + a well designed LLM interface on top of that enables non-technical people to do things they otherwise couldn't.

      • By chenxi9649 2024-02-14 20:58 · 1 reply

        curious to know, which tools do you use and how do you use em?

        • By Qwero 2024-02-15 12:09

          I use Copilot for coding.

          ChatGPT for writing emails to bigger audiences, and grammar correction.

          Image generators for fun.

          An LLM for feature extraction from random text like a website or a PDF (Llama).

    • By LarsDu88 2024-02-14 23:06 · 1 reply

      The first thing any ML practitioner realizes is that accuracy is about the single worst performance metric you can use for most real-world tasks, lol

      • By altdataseller 2024-02-14 23:48 · 1 reply

        Can you explain this? Why is that?

        • By LarsDu88 2024-02-15 6:41

          Let me sell you an amazing cancer test my friend. It's 99.5 percent accurate.

          It works through an incredibly novel mathematical technique. You simply go to the patient and tell them they don't have cancer.

          Since 99.5 percent of people don't have cancer, this classifier is 99.5 percent accurate.

          Completely bullshit classifier for a completely bullshit metric.

          Use sensitivity/specificity or precision/recall instead.
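
          The cancer-test example above, scored with both kinds of metric (a sketch using the 0.5% prevalence from the comment, on a hypothetical 1000 patients):

```python
# The "always say no cancer" classifier, scored two ways.

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    # Recall on the positive (cancer) class.
    return tp / (tp + fn) if tp + fn else 0.0

# 1000 patients, 0.5% prevalence, classifier predicts "negative" for all:
tp, fn = 0, 5      # every sick patient is missed
tn, fp = 995, 0    # every healthy patient is "correctly" cleared

print(accuracy(tp, tn, fp, fn))   # 0.995
print(sensitivity(tp, fn))        # 0.0 -- the metric that exposes the scam
```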

    • By gremlinsinc 2024-02-15 4:25

      > You can do handwritten digit recognition with 90% accuracy? Sounds pretty good, but if you need to turn that into recognizing a 12 digit account number you now have a 70% chance of getting at least one digit incorrect. This means a product worthy digit classifier needs to be much higher accuracy.

      Language is essential for human civilization, so are tools. We wouldn't get far without either.

      maybe a language model can understand what it needs to do but not how to do it, so you give it a tool.

      Humans can get pretty far without 100 percent accuracy, and we can get a lot from AI models before they reach 100 percent. And given that at some point AI will be able to improve itself, even remake itself daily with 2x the abilities, 100 percent (or at least 99.7 percent) is attainable.

      Right now I can take any YouTube video summarize it and turn it into a podcast, short form videos, and a blog post.

      There's definitely a lot of marketing uses right now for AI agents. If you think about embodied AI, it's only as good as its body: if it doesn't have good grippers, it will struggle to pick things up.

      Also with a lot of things, accuracy is subjective one person might think ad copy is great and maybe their manager thinks it's shit. One person could give it a 100 percent score and another a 70 percent.

      My point is we're so close here; it's already amazing technology, and we can compensate for failures by creating larger toolboxes.

  • By alexawarrior3 2024-02-14 19:12 · 6 replies

    None of these I've seen actually works in practice. Having used LLMs for software development the past year or so, even the latest GPT-4/Gemini doesn't produce anything I can drop in and have it work. I've got to go back and forth with the LLM to get anything useful and even then have to substantially modify it. I really hope there are some big advancements soon and this doesn't just collapse into another AI winter, but I can easily see this happening.

    Some recent actual use cases for me where an agent would NOT be able to help me, although I really wish it would:

    1. An agent to automate generating web pages from design images - Given an image, produce the HTML and CSS. LLMs couldn't do this for my simple page from a web designer. Not even close, even mixing up vertical/horizontal flex arrangement. When I cropped the image to just a small section, it still couldn't do it. Tried a couple LLMs, none even came close. And these are pretty simple basic designs! I had to do it all manually.

    2. Story Generator Agent - Write a story from a given outline (for educational purposes). Even at a very detailed outline level, and with a large context window, kept forgetting key points, repetitive language, no plot development. I just have to write the story myself.

    3. Illustrator Agent - Image generation for above story. Images end up very "LLM" looking, often miss key elements in the story, but one thing is worst of all: no persistent characters. This is already a big problem with text, but an even bigger problems with images. Every image for the same story has a character who looks different, but I want them to be the same.

    4. Publisher Agent - Package things together above so I can get a complete package of illustrated stories on topics available on web/mobile for viewing, tracking progress, at varying levels.

    Just some examples of where LLMs are currently not moving the needle much if at all.

    • By chenxi9649 2024-02-14 21:02

      >even the latest GPT-4/Gemini doesn't produce anything I can drop in and have it work

      This is certainly true for more complex code generation. But there is a lot of "rote" work that I do use GPT to generate, and I feel like that has really improved my productivity.

      The other use case for AI-assisted coding is that it _really_ helps me learn certain stuff. Whether it's a new language, or code that someone else wrote. Often times I know what I want done, but I don't know the corresponding utility functions in that language, and AI will not only be able to generate it for me but also through the process teach me about the existence of those things.(some of which are wrong lol, but it's correct enough for me to keep that behavior)

    • By okwhateverdude 2024-02-14 23:25

      > 2. Story Generator Agent - Write a story from a given outline (for educational purposes). Even at a very detailed outline level, and with a large context window, kept forgetting key points, repetitive language, no plot development. I just have to write the story myself.

      You have to break it down into smaller steps and provide way more detail than you think you do in the context. I did an experiment in story generation where I had "authors" that would write only from the perspective of one of the characters that was also completely generated starting first from genre, name, character traits, etc. Then for a given scene, within a given plot and where in the story you are, randomly rotate between authors for each generation, appending it in memory, but not all of the story fits in context. And each generation is only a couple hundred tokens where you ask it to start/continue/end the story. The context contains all of this information in a simple key:value format. And essentially treat the LLM like a loom and spin the story out.

      Usually what it produces isn't quite the best, but that's okay, because you can further refine the generation by using different system/user prompts explicitly for editing the content. I found that asking it to suggest one refinement and phrase it as a direct command, then feeding that command with the original generation, works. This meta-prompting tends to produce changes that subjectively improve the text according to whatever dimensions specified in the system prompt.

      If you treat the composition as way more mechanical with tightly constrained generation, you get a much better, much more controlled result.
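      The rotate-and-refine loop described above can be sketched roughly like this. This is a hypothetical reconstruction, not the commenter's actual code: `complete()` is a stand-in for whatever LLM API you use, and the author personas, prompt wording, and context window size are illustrative assumptions.

```python
import random

# Hypothetical personas; in the original experiment these were themselves
# generated, starting from genre, name, character traits, etc.
AUTHORS = [
    {"name": "Mira", "perspective": "the detective"},
    {"name": "Tomas", "perspective": "the suspect"},
]

def complete(system_prompt, user_prompt):
    """Stand-in for a real LLM chat-completion call. Returns a canned
    continuation so the sketch is runnable without an API key."""
    return f"[{system_prompt.split(':')[0]}] continues the scene."

def generate_scene(plot_point, n_beats=4, context_beats=3):
    story = []
    for beat in range(n_beats):
        # Randomly rotate between authors for each generation.
        author = random.choice(AUTHORS)
        system = (f"{author['name']}: write only from the perspective of "
                  f"{author['perspective']}. plot:{plot_point} beat:{beat}")
        # Only the most recent beats fit in context, not the whole story.
        recent = "\n".join(story[-context_beats:])
        draft = complete(system, f"Continue the story:\n{recent}")
        # Meta-prompting pass: ask for one refinement phrased as a direct
        # command, then feed that command back with the original generation.
        command = complete(
            "You are an editor. Suggest one refinement as a direct command.",
            draft)
        refined = complete(f"Apply this edit: {command}", draft)
        story.append(refined)
    return story

scene = generate_scene("the letter is found")
```

      With a real model behind `complete()`, each beat stays a few hundred tokens, and the tightly constrained system prompt is what keeps the generation controlled.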

    • By Kerbonut 2024-02-153:261 reply

      > 1. An agent to automate generating web pages from design images - Given an image, produce the HTML and CSS. LLMs couldn't do this for my simple page from a web designer. Not even close, even mixing up vertical/horizontal flex arrangement. When I cropped the image to just a small section, it still couldn't do it. Tried a couple LLMs, none even came close. And these are pretty simple basic designs! I had to do it all manually.

      That’s because none of the models have been trained on this. Create a dataset for this and train a model to do it and it will be able to do it.

      • By carlossouza 2024-02-154:291 reply

        https://www.youtube.com/watch?v=bRFLE9qi3t8

        Here's the CEO of Builder.io supporting your comment: he says they tried LLMs/agents and it didn't work. Then they collected a dataset and developed an in-house model to assist only where the problem couldn't be solved with imperative programming.

        • By foolswisdom 2024-02-1522:18

          Not really. He's saying the solution is to not have the entire process in a single model; it's better to have the model work on specific pieces that you break down, rather than feeding it the whole thing and expecting it to break the task down and generate correctly by itself.

    • By EVa5I7bHFq9mnYK 2024-02-169:18

      One area that has been useful for me is writing simple code in languages I am not familiar with, and not willing to learn. For example, I needed to write a small bash script to automate things in Ubuntu, and it really saved me time on googling all those commands. Same with the Task Scheduler XML language. It knows the popular use cases of all these languages very well.

    • By rpmisms 2024-02-150:12

      Besides writing boilerplate, I used AI to generate a color scheme and imagery for a charity website I built.

    • By da4id 2024-02-153:462 reply

      Why do you want it to generate web pages from images? I'm having trouble understanding the workflow here. You see a component you like on another website and want to obtain the code from it? Or if you have a design already, why not just use a Figma to Code tool?

      • By PeterisP 2024-02-158:29

        It's not that uncommon to have a workflow where the webpage design gets built and negotiated with stakeholders/customers as a series of photoshop images, and when they're approved, it's forwarded to developers to make a pixel-perfect implementation of that design in HTML/CSS.

      • By gremlinsinc 2024-02-154:37

        Say you sketch out your rough vision of things on paper, a very simple mock-up. That could be a nice use case.

  • By deathmonger5000 2024-02-1420:102 reply

    I taught https://github.com/KillianLucas/open-interpreter how to use https://github.com/ferrislucas/promptr

    Then I asked it to add a test suite to a rails side project. It created missing factories, corrected a broken test database configuration, and wrote tests for the classes and controllers that I asked it to.

    I didn't have to get involved with mundane details. I did have to intervene here and there, but not much. The tests aren't the best in the world, but IMO they're adding value by at least covering the happy path. They're not as good as what an experienced person would write.

    I did spend a non-trivial amount of time fiddling with the prompts I used to teach OI about Promptr as well as the prompts I used to get it to successfully create the test suite.

    The total cost was around $11 using GPT4 turbo.

    I think in this case it was a fun experiment. I think in the future, this type of tooling will be ubiquitous.

    • By chenxi9649 2024-02-1421:221 reply

      This is pretty cool!

      Another use case where the cost of being slightly worse than a human is totally fine. (Coming from someone that doesn't write tests, lol.)

      I'd love to learn in more detail how it created those factories and corrected the broken test database. It _feels_ like some of these tasks require knowing different parts of the codebase decently well, which in my experience hasn't always been the strong suit of AI-assisted coding.

      • By deathmonger5000 2024-02-1421:58

        OI fixed the factories and config by attempting to run the tests. The test run would fail because there was no test suite configured, so OI inspected the Gemfile using `cat`. Then it used Promptr with a prompt like "add the rspec gem to Gemfile". Then OI tried again and again, addressing each error as encountered, until the test suite was up and running.
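        That run-and-repair loop can be sketched roughly like this. This is a hypothetical sketch, not the actual Open Interpreter/Promptr code: `ask_llm` and `apply_fix` are stand-ins for the LLM call that turns failure output into an instruction and for the Promptr invocation that applies it.

```python
import subprocess

def ask_llm(failure_output):
    """Stand-in for the LLM call that turns a test failure into a
    Promptr-style instruction (e.g. "add the rspec gem to Gemfile")."""
    lines = failure_output.splitlines()
    last = lines[-1] if lines else "unknown error"
    return f"fix: {last}"

def apply_fix(instruction):
    """Stand-in for invoking Promptr with the instruction."""
    print(f"applying: {instruction}")

def run_until_green(test_cmd, max_attempts=5):
    """Run the test command, feed each failure back, and retry."""
    for _ in range(max_attempts):
        result = subprocess.run(test_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True  # test suite passed
        # Address each error as encountered, then try again.
        instruction = ask_llm(result.stdout + result.stderr)
        apply_fix(instruction)
    return False
```

        The key property is that the agent is grounded by the test runner's exit code: it keeps iterating against real errors rather than guessing in one shot.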

        In the case of generating unit tests using Promptr, I have an "include" file that I include from every prompt. The "include" file is specific to the project that I'm using Promptr in. It says something like "This is a rails 7 app that serves as an API for an SPA front end. Use rspec for tests. etc. etc."

        Somewhere in that "include" file there is a summary of the main entities of the codebase, so that every request has a general understanding of the main concepts that the codebase is dealing with. In the case of the rspec tests that it generated, I included the relevant files in the prompt by including the path to the files in the prompt I give to Promptr.

        For example, if a test is for the Book model then I mention book.rb in the prompt. Perhaps Book uses some services in app/services - if that's relevant for the task then I'll include a glob of files using a command line argument - something like `promptr -p prompt.liquid app/services/book*.rb` where prompt.liquid has my prompt mentioning book.rb

        You have to know what to include in the prompts, and don't be shy about stuffing them full of files. It works until it doesn't, but I've been surprised at how well it works in a lot of cases.

    • By rosspackard 2024-02-1420:292 reply

      What do you mean when you use the word taught for open-interpreter?

      Looking at the OI docs wasn't too helpful.

      "I did spend a non-trivial amount of time fiddling with the prompts" was it writing prompts?

      I am really interested and this seems like a cool use case that I want to explore. Could you share the prompts on a github gist?

      • By deathmonger5000 2024-02-163:55

        Here's the fork of Open Interpreter that I was experimenting with: https://github.com/ferrislucas/open-interpreter/pull/1/files

        The system prompt that adds the Promptr CLI tool is here: https://github.com/ferrislucas/open-interpreter/pull/1/files...

      • By deathmonger5000 2024-02-1421:03

        I think I have the prompts still, but not on my work machine. I'll look tonight and edit this comment with whatever I can find.

        I actually forked OI and baked in a prompt that was something like "Promptr is a CLI etc. etc., give Promptr conceptual instructions to make codebase and configuration changes". I think I put this in the system message that OI uses on every request to the OpenAI API.

        Once I had OI using Promptr then I worked on a prompt for OI that was something like "create a test suite for the rails in ~/rails-app - use rspec, use this or that dependency, etc.".

        Thanks for your interest! I'll try to add more details later.

HackerNews