Morten Kloster <morklo@gmail.com> (moklo)
r1128857 | stefan2 | 2011-05-29 10:35:40 +0000 (Sun, 29 May 2011)
Simpler/faster LCS algorithm in libsvn_diff by elimination of idx.
* subversion/libsvn_diff/lcs.c
(svn_diff__snake): Removed idx parameter (always 0)
(svn_diff__lcs): Removed idx variable (always 0) , let d have either
sign, and moved the origo of fp by d. New version no longer chooses
the shorter file as the "original", and can thus give different LCS's
depending on the order of the input files even when the files have
different lengths. Depending on circumstances, speed can be ~15% faster
for core lcs algorithm.
Patch by: Morten Kloster <morklo{_AT_}gmail.com>
r1128862 | stefan2 | 2011-05-29 10:49:58 +0000 (Sun, 29 May 2011)
Describe the general idea behind the LCS algorithm in place.
* subversion/libsvn_diff/lcs.c
(svn_diff__lcs): add description
Patch by: Morten Kloster <morklo{_AT_}gmail.com>
r1128921 | stefan2 | 2011-05-29 18:04:50 +0000 (Sun, 29 May 2011)
Faster LCS algorithm in libsvn_diff by reworking fp argument.
This change benefits mainly the 32 bit VS compiler.
* subversion/libsvn_diff/lcs.c
(svn_diff__snake): expect sum of fp and k as a single argument;
replace int1+1 > int2 by int1 >= int2
(svn_diff__lcs): adapt caller
Patch by: Morten Kloster <morklo{_AT_}gmail.com>
(plus a small tweak by me)
r1132811 | jcorvel | 2011-06-06 22:34:41 +0000 (Mon, 06 Jun 2011)
Speed-up of libsvn_diff using token counts.
By using indices, not node pointers, to refer to tokens, and counting
the number of each token, the longest common subsequence (lcs)
algorithm at the heart of libsvn_diff becomes much faster in many
situations by skipping tokens that are unique to one source or the other.
* subversion/libsvn_diff/token.c
(svn_diff__node_t): Added 'index' member.
(svn_diff__tree_t): Added 'node_count' member.
(svn_diff__get_node_count): New function.
(tree_insert_token): Assigns indices to nodes.
(svn_diff__get_tokens): Uses token_index (svn_diff__token_index_t), not
node pointer, to identify node in position struct.
* subversion/libsvn_diff/lcs.c
(svn_diff__snake): Takes token counts as additional argument.
Skips tokens that are unique to one or the other file.
(svn_diff__lcs): Takes token counts and number of different tokens
as additional arguments. Calculates number of tokens unique to
each source and uses this to compute effective length difference.
* subversion/libsvn_diff/diff.h
(svn_diff__token_index_t): New type for token indices/counts.
(svn_diff__position_t): Replaced node pointer member by a
token_index (svn_diff__token_index_t).
Updated and added new function declarations.
* subversion/libsvn_diff/diff.c
* subversion/libsvn_diff/diff3.c
* subversion/libsvn_diff/diff4.c
(svn_diff_diff_2, svn_diff_diff_3, svn_diff_diff_4): Count the
number of times each token appears in each source.
(svn_diff__resolve_conflict): Takes number of different tokens
as additional argument. Counts the number of times
each token appears in each (partial) source.
(svn_diff__get_token_counts): New function.
Patch by: Morten Kloster <morklo{_AT_}gmail.com>
(some small whitespace fixes and docstring movements by me)
r1139725 | moklo | 2011-06-26 07:49:36 +0000 (Sun, 26 Jun 2011)
Whitespace adjustments
* subversion/libsvn_diff/diff3.c
Made first half of file match second half and norm
r1140195 | moklo | 2011-06-27 15:04:37 +0000 (Mon, 27 Jun 2011)
Creating new experimental branch for my work on lcs/diff/merge
r1140203 | moklo | 2011-06-27 15:16:16 +0000 (Mon, 27 Jun 2011)
* COMMITTERS: Add moklo as partial committer. (diff-improvements
branch).
r1140247 | moklo | 2011-06-27 17:41:10 +0000 (Mon, 27 Jun 2011)
Speeds up LCS algorithm for well-separated sections of change
By exploiting unique matches between the two files, it is possible to
"restart" the LCS algorithm at times and reset the buildup of square
computational complexity.
* subversion/libsvn_diff/lcs.c
(svn_diff__snake_t): Added uniquematchcount member.
(svn_diff__snake): Increments uniquematchcount each time
a matching token occurs only once in each file.
(clear_fp): New function that removes nonviable states.
(svn_diff__lcs): Restarts LCS algorithm from p=0 if the number
of unique matches exceeds the number of mismatches, using
the first state where that occurs as the new start state.