The idea is to create a tool (a “programming” language) for people who often have to do repetitive text processing tasks. A program in this language automates text processing that the user would otherwise do manually. If a task is not repetitive and can be done by hand in a reasonable amount of time, there is no need to automate it.
Inputs and outputs are mostly xlsx and docx files.
Programs will be written in a language that reads almost like natural language. I also want to create a GUI where the user can select parts of a text and process the text using those selections (e.g. select a name in one table, extend the selection to the whole column, find these names in other files and increment a corresponding column value there). The GUI won’t be described in detail here, only the language. Every action in the GUI should have a counterpart in the language.
We have a table “clients” with columns name (string) and visit (integer). Each week we get a table “week N” with columns name (string) and organization (string). For each client whose organization in “week N” isn’t equal to our_organization, we need to increase visit in “clients” by 1.
Here is an SQL solution:
UPDATE clients AS c
SET visit = c.visit + 1 -- PostgreSQL-style UPDATE ... FROM; untested, other engines may need a different form
FROM (
    SELECT name
    FROM weekN
    WHERE organization <> 'our_organization'
) AS weekly
WHERE c.name = weekly.name
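Since I am not sure the statement above runs everywhere, here is a minimal Python sketch that checks the same update against an in-memory SQLite database. It swaps UPDATE ... FROM for an IN subquery, which more engines accept. The helper name bump_visits and the sample rows are made up for illustration.

```python
import sqlite3


def bump_visits(conn, week_table, our_org):
    """Add 1 to visit for every client listed under another organization."""
    # week_table is interpolated as an identifier, so pass trusted names only.
    conn.execute(
        f"""
        UPDATE clients
        SET visit = visit + 1
        WHERE name IN (
            SELECT name FROM "{week_table}" WHERE organization <> ?
        )
        """,
        (our_org,),
    )


# Hypothetical sample data, not taken from any real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (name TEXT, visit INTEGER)")
conn.execute("CREATE TABLE weekN (name TEXT, organization TEXT)")
conn.executemany(
    "INSERT INTO clients VALUES (?, ?)",
    [("Ann", 2), ("Bob", 0), ("Eve", 5)],
)
conn.executemany(
    "INSERT INTO weekN VALUES (?, ?)",
    [("Ann", "other_org"), ("Bob", "our_organization"), ("Zed", "other_org")],
)

bump_visits(conn, "weekN", "our_organization")
rows = conn.execute("SELECT name, visit FROM clients ORDER BY name").fetchall()
print(rows)  # [('Ann', 3), ('Bob', 0), ('Eve', 5)]
```

With the IN form, a client is incremented once per week even if the name appears in several weekly rows.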
The focus is specifically on tasks that involve two or more Excel tables and that cannot be solved with Excel.
Given a text with many entries like
"<movieName1>, <movieName2>": <theaterName1>, <theaterName>...
for each date, create a table with three columns, theater, movie and day count, where day count equals the maximum number of consecutive days that the movie was shown in a particular theater.
My girlfriend had this task: she had to process 2.5 months of such data. So I wrote a program that does most of the work (you input the text for each day and get a table with the results).
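This is not the program I wrote back then, only a rough Python sketch of how the task could be approached. It assumes that names contain no commas or double quotes and that the entries are already grouped by date. The names parse_entry and longest_runs and the sample data are hypothetical.

```python
import re
from datetime import date, timedelta

# One entry: "<movie>, <movie>": <theater>, <theater>
ENTRY = re.compile(r'^"(?P<movies>[^"]*)":\s*(?P<theaters>.*)$')


def parse_entry(line):
    """Split one entry into (movies, theaters)."""
    m = ENTRY.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    movies = [s.strip() for s in m["movies"].split(",") if s.strip()]
    theaters = [s.strip() for s in m["theaters"].split(",") if s.strip()]
    return movies, theaters


def longest_runs(entries_by_day):
    """Return {(theater, movie): maximum number of consecutive days shown}."""
    shown = {}  # (theater, movie) -> set of dates on which it was shown
    for day, lines in entries_by_day.items():
        for line in lines:
            movies, theaters = parse_entry(line)
            for theater in theaters:
                for movie in movies:
                    shown.setdefault((theater, movie), set()).add(day)

    one_day = timedelta(days=1)
    result = {}
    for key, days in shown.items():
        best = 0
        for d in days:
            if d - one_day in days:
                continue  # d is inside a run, not at its start
            length = 1
            while d + one_day * length in days:
                length += 1
            best = max(best, length)
        result[key] = best
    return result


# Hypothetical sample: three days of entries with a gap on January 3.
data = {
    date(2024, 1, 1): ['"Alpha, Beta": Rex, Lux'],
    date(2024, 1, 2): ['"Alpha": Rex'],
    date(2024, 1, 4): ['"Alpha": Rex', '"Beta": Lux'],
}
runs = longest_runs(data)
for (theater, movie), day_count in sorted(runs.items()):
    print(theater, movie, day_count)
```

Collecting the dates into a set per (theater, movie) pair means the order in which days are entered does not matter.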
expand and shorten to refactor such statements into their long and short counterparts.
I want to have as few variables as possible, because I think they are a low-level construct that isn’t necessary. Most of the time, a solution to a text-processing problem can be described without explicit variables.
The task is described in “weekly documents processing”.
For each row in "weekN" filter row's "organization" <> "our_organization":
If row's "name" exists in "output": visit add 1.
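For comparison, here is roughly what I would expect these two statements to mean, sketched in Python. It assumes that "output" refers to the clients table, held here as a name-to-visit dictionary. The function name add_weekly_visits and the sample rows are invented for the example.

```python
def add_weekly_visits(clients, week_rows, our_org="our_organization"):
    """clients: {name: visit}; week_rows: dicts with "name" and "organization"."""
    for row in week_rows:
        # For each row in "weekN" filter row's "organization" <> "our_organization"
        if row["organization"] == our_org:
            continue
        # If row's "name" exists in "output": visit add 1.
        if row["name"] in clients:
            clients[row["name"]] += 1
    return clients


# Hypothetical sample data.
clients = {"Ann": 2, "Bob": 0}
week = [
    {"name": "Ann", "organization": "other_org"},
    {"name": "Bob", "organization": "our_organization"},
    {"name": "Zed", "organization": "other_org"},
]
print(add_weekly_visits(clients, week))  # {'Ann': 3, 'Bob': 0}
```

Read this way, the statements increment once per matching weekly row, and the Python version needs two loop variables and a dictionary lookup that the language leaves implicit.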
Compress strings: avvvdkwwwqm -> av3dkw3qm
replace sequences (seq) of equal symbols with: seq[0] + seq size; here seq is a variable for a sequence
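A short Python sketch of the same idea, for comparison. It follows the example string, so a symbol that occurs only once is kept without a count; that reading is my assumption. The function name compress is arbitrary.

```python
from itertools import groupby


def compress(text):
    """Replace each run of equal symbols with the symbol plus the run length."""
    parts = []
    for symbol, group in groupby(text):
        size = len(list(group))
        # Runs of length 1 stay as they are, as in the example above.
        parts.append(symbol if size == 1 else f"{symbol}{size}")
    return "".join(parts)


print(compress("avvvdkwwwqm"))  # av3dkw3qm
```

Here groupby plays the role of "sequences of equal symbols", and the loop variables symbol and group correspond to seq[0] and seq.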
Check if a number is a palindrome (121 -> true, 123 -> false)
it reverse equal it
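The Python counterpart is about as short, but it needs a named value and an explicit conversion to text. This is a sketch; the function name is_palindrome is arbitrary and the check compares decimal digits.

```python
def is_palindrome(number):
    """True if the decimal digits read the same in both directions."""
    digits = str(number)
    return digits == digits[::-1]


print(is_palindrome(121))  # True
print(is_palindrome(123))  # False
```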
AI can solve the text processing problem in one of two ways:
Currently I mark AI-solutions as unreliable, but that can change quickly.
If you can contribute in any way (kononal@gmail.com):
I’ll be very grateful.
Recently I skimmed through a list of programming languages and was frustrated by how most of them look identical and do identical things (I only checked entries that might be close to what I’m trying to create, not all of them). I was hoping to draw inspiration from these languages, but the only interesting idea I found was in-code comments from Cognate. I also found the Icon language through Paul Graham’s post. In Icon I like the generators and the conditional expressions that return non-boolean values.