[NFBCS] Diffing and Office files (was Re: Managing Pull Requests on Github with Screenreader)

Doug Lee dgl at dlee.org
Wed Apr 1 21:55:49 UTC 2020


Pardon the tangent, but does anyone know if Git allows an external diff utility as bzr does, or a filter for each part of a diff? I ask because I wrote a couple of Python utilities a while back for comparing Office files, but never shared them because I considered them too tied to my setup. I could certainly let them out of my corner if there is a need. They currently both require Python 3.7.

The utilities, in source form and not compiled into anything, are:

docStream: Take an Office file and make a text stream of it. A -t flag allows for a rudimentary text content extraction, while normal behavior covers more than just text content.
Warning: If the file type of the given file is not recognized, docStream will try to send it through a utility called strings, which is available under Linux, Cygwin, etc. and is for extracting text from binary files.

docDiff: Given two Office files, generate diff output based on the docStream streams for them.
Requires a diff utility to be available to do the actual comparison and output generation.
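
For anyone wondering what this kind of pipeline looks like, here is a minimal sketch. It is not the actual docStream or docDiff code. It assumes the input is a .docx file, which is a zip archive holding the body text in word/document.xml, and it uses Python's standard difflib module in place of an external diff utility. The function names docx_to_lines and diff_docx are made up for illustration.

```python
import difflib
import zipfile
from xml.etree import ElementTree

# WordprocessingML namespace used inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_lines(path):
    """Return one string per paragraph of a .docx file (hypothetical helper)."""
    with zipfile.ZipFile(path) as archive:
        root = ElementTree.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(W_NS + "p"):
        # Join the text runs (w:t elements) that make up this paragraph.
        lines.append("".join(t.text or "" for t in paragraph.iter(W_NS + "t")))
    return lines


def diff_docx(old_path, new_path):
    """Return unified diff lines comparing the paragraph text of two files."""
    return list(difflib.unified_diff(
        docx_to_lines(old_path),
        docx_to_lines(new_path),
        fromfile=str(old_path),
        tofile=str(new_path),
        lineterm="",
    ))
```

Something like print("\n".join(diff_docx("old.docx", "new.docx"))) would then print the differences, one paragraph per line. This only covers plain paragraph text; headers, footnotes, comments and formatting would need more work.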

On Wed, Apr 01, 2020 at 01:55:43PM -0700, NFBCS mailing list wrote:
Hello Tim.  The command is diff, that's d i f f.
What Joe is talking about is called a unified diff and is created when you
use the -u flag to diff.  For unified diffs, new lines begin with a "+"
sign and old lines begin with a "-" sign.  Unmarked lines are context
around the change so you can more easily figure out where the change is
happening.  There is also the diff -c flag, which produces a context
diff.  It's similar to the unified diff, but shows the old and new versions
of each changed region separately, marking changed lines with "!", added
lines with "+" and deleted lines with "-".  If you haven't spent much time
with this command, I highly recommend that you do.  Using diff makes it
very easy to compare different files, especially large ones, and figure out
what's changed even when you don't think anything has.  In combination with
the sort and uniq commands, diff is an excellent way to detect differences
in directory listings, csv files, and a host of other kinds of data.  It's
also worth noting that diff, sort and uniq are just ordinary Unix/Linux
commands and in no way depend on git, though you can use them with git
tools.
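
To make the "+" and "-" markers concrete, here is a small Python sketch that uses the standard difflib module rather than the diff command itself. The two listings and the helper name compare_listings are invented for illustration, and sorted(set(...)) stands in for piping each listing through sort and uniq.

```python
import difflib


def compare_listings(old, new):
    """Unified diff of two listings after sorting and dropping duplicates."""
    # sorted(set(...)) plays the role of `sort | uniq` on each input.
    old_lines = sorted(set(old))
    new_lines = sorted(set(new))
    return list(difflib.unified_diff(
        old_lines, new_lines, fromfile="old", tofile="new", lineterm=""))


# Made-up directory listings; note the duplicate entry in the old one.
old_listing = ["notes.txt", "report.doc", "notes.txt", "budget.csv"]
new_listing = ["report.doc", "budget.csv", "summary.doc"]

for line in compare_listings(old_listing, new_listing):
    print(line)
```

In the printed diff, notes.txt appears with a leading "-" because it is only in the old listing, summary.doc appears with a leading "+" because it is only in the new one, and the entries common to both appear with a leading space as context.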

Hope that helps.
-thanks
-Brian




-- 
Doug Lee                 dgl at dlee.org                http://www.dlee.org
Level Access             doug.lee at LevelAccess.com    http://www.LevelAccess.com
"When there is no enemy within, the enemies outside cannot hurt you."
--African Proverb



