I am in banking and it's fine; we have some fine-tuned models to work with our code base. I think COBOL is a good language for LLM use. Its verbose, English-like syntax aligns naturally with the way language models process text. Can't complain.
Can you elaborate? See questions about what kind of use in sibling thread.
And in addition to the type of development you are doing in COBOL, I'm wondering if you also have used LLMs to port existing code to (say) Java, C# or whatever is current in (presumably) banking?
This is implied but I guess needs to be made explicit: people are looking for answers from devs with direct knowledge of the question at hand, not what random devs suspect.
More like the American timeline i.e. "that's the other party I don't like." There is this continual suspicion that if you criticise one lot you must support the other, as if there are only two shows in town.
I think it shows problems with your tests tbh. The bigger models are way more capable than you make them out to be. They are also better trained on, and better at understanding, CGI render outputs used as reference, like normal maps or ID masks. Your testing suite is the perfect example that structured data implies false confidence. Pure t2i is not a good benchmark anymore.
> The bigger models are way more capable than you make them out to be.
No test suite is ever going to be perfect. GenAI Showdown was started with the goal of focusing on a very narrow spectrum of testing (prompt adherence) because, as a creator, that's the one of most interest to me.
> Pure t2i is not a good benchmark anymore
Just FYI Image Editing is already a separate benchmark (see the navbar at the top).
> Your testing suite is the perfect example that structured data implies false confidence
Again - the headline is "Specific prompts and challenges with a strong emphasis placed on adherence". If I tried to capture every possible aspect of GenAI models (multimodal, texture maps, periodic motion, tiling, etc.), I'd be at it until the heat death of the universe.
Incidentally - which model (specifically) do you think is ranked unfairly? While Flux.2 [dev] did only score a single point above ZiT, its weighted score is much higher (1442 points vs 911 points).
I listened to it for like 12 years through proxies and always hoped they would expand worldwide, but they got more aggressive with their blocking, so I ditched them. I really miss their radio stations.
This was the "Release name" for every NT version.
RCs were alpha quality, SP1 was the first beta, and with SP2 things started to be ok. This was true until 7, then the Gates of hell opened and Windows is now an eternal "release" (the thing between RC and SP1).