[BlindMath] Turning math_scanner into a text layout checker
Rastislav Kish
rastislav.kish at protonmail.com
Fri Sep 1 03:16:26 UTC 2023
Hello everyone,
I've been working on a document where typesetting was quite
important. I had a special layout in mind that I wanted to achieve,
including spatial text blocks, special alignments, images, etc.
Typst makes it very easy to design all sorts of things, but even with
syntax as great as it is, I could still make a mistake, misunderstand
something, or simply not be sure how given content would arrange
itself, so I needed a way to check how my idea had turned out.
So I implemented a few useful layout checking functions in
math_scanner, since it has already shown interesting results when it
comes to working with documents' graphics.
Right now, after Tesseract processes the input image, you can:
* For any character, check the distance to the left, right, top and
bottom edge of the image, expressed as a percentage
* If you mark the borders of a region of the image, check its size as a
percentage of the page (or of the column, if you've made use of the
program's column splitting functionality)
* Check the size of any focused character in pixels. Note this may not
always be accurate, since the size is calculated from the bounding boxes
determined by Tesseract.
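
To illustrate the arithmetic behind the first two checks, here is a
minimal sketch. It is not math_scanner's actual code: the Box class and
the edge_distances and relative_size functions are hypothetical names,
and I'm assuming boxes are given in pixels as left, top, width and
height, with the origin in the upper-left corner of the image.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box in pixels; (left, top) is the upper-left corner."""
    left: int
    top: int
    width: int
    height: int


def edge_distances(char: Box, page: Box) -> dict:
    """Distance from a character box to each page edge, in percent of the page size."""
    right_gap = (page.left + page.width) - (char.left + char.width)
    bottom_gap = (page.top + page.height) - (char.top + char.height)
    return {
        "left": 100 * (char.left - page.left) / page.width,
        "right": 100 * right_gap / page.width,
        "top": 100 * (char.top - page.top) / page.height,
        "bottom": 100 * bottom_gap / page.height,
    }


def relative_size(region: Box, page: Box) -> tuple:
    """Width and height of a marked region, in percent of the page (or column)."""
    return (100 * region.width / page.width, 100 * region.height / page.height)


# Hypothetical 1000x2000 px page with one 20x40 px character box.
page = Box(0, 0, 1000, 2000)
char = Box(100, 500, 20, 40)
print(edge_distances(char, page))
print(relative_size(Box(0, 0, 500, 500), page))
```

Passing a column's box instead of the page's box gives the same values
relative to the column.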
Using these functions, you can easily check, say, whether a heading is
centered, how much vertical space is left in a column while writing the
text, whether and how paragraphs are aligned, how big individual text
blocks are on the paper, or whether your figures are aligned correctly,
as long as there is text around them you can use to mark a region (note
that only horizontal borders are needed to determine the height, and
only vertical borders to determine the width).
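
For example, the heading check comes down to comparing two percentages.
The following is only a sketch under my own assumptions, not a function
from math_scanner: is_centered and its tolerance parameter are
hypothetical, and the inputs are the left edge distance of the heading's
first character and the right edge distance of its last character.

```python
def is_centered(left_pct: float, right_pct: float, tolerance: float = 1.0) -> bool:
    """Treat a line as centered when its left and right margins differ
    by at most `tolerance` percentage points."""
    return abs(left_pct - right_pct) <= tolerance


# Hypothetical readings: margins of 30.0 % and 30.5 % count as centered,
# margins of 10.0 % and 40.0 % do not.
print(is_centered(30.0, 30.5))
print(is_centered(10.0, 40.0))
```

Some tolerance is needed because the bounding boxes come from OCR and
are rarely exact to the pixel.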
math_scanner can split the input image into columns, which are
afterwards treated like standalone images (including re-recognition by
Tesseract, which can clear out a lot of clutter).
The new layout checking functions respect this mechanism too, so if you
have a multi-column document, you can review the layout in each of them
separately.
Of course, it's still a good idea to have your work checked by a sighted
reviewer, but there is a difference between asking someone for a check 5
times and 50 times because something didn't turn out as you expected,
you changed your mind, reworked something, etc.
This particular implementation also has its limitations. Being driven
by OCR has a few advantages, but also some significant downsides, like
recognition errors and the program's general unawareness of things like
figures in the document.
It would be very interesting to implement something similar working
directly with the information in the PDF, since tools like Typst or
LaTeX tend to include it in a somewhat semantic form, so it may be
possible to get very interesting results.
Right now, however, I don't think I have the time to study the
structure of PDF documents or build a layout explorer from scratch.
Since math_scanner already had most of the prerequisites, and they were
good enough for my use case, this is the optimal route for me for the
time being.
I'm just letting people know in case someone was interested in my little
experiment.
You can find the new commits in the development branch of math_scanner:
https://github.com/RastislavKish/math_scanner
Still Linux only. The Windows branch is actually complete and
functional, but it has yet to be merged, since I haven't found anybody
on Windows willing/able to install Python and the necessary dependencies
for my little program. :)
Perhaps I should merge it anyway; the changes are really subtle, there
is very little to go wrong, and even if something does, it can be fixed
when someone notices.
I will take a look into it at some point.
Have fun
Best regards
Rastislav