~olly/yoyo

1

Re: Sorting migrations from multiple sources

Kruis, Anselm <anselm.kruis@atos.net>
Details
Message ID
<09c43045fdf14024a0f2e905408ea41f@atos.net>
DKIM signature
missing
Download raw message
Hi Olly,

Many thanks for your reply and your patch. Please apologize my late reply, I was on vacation for during the last two weeks. Your patch works for me. It is indeed much simpler than mine. 

Another observation: the function topological_sort() might be to aggressive. Given I have three migrations in this order by ID:
"m1.sql", "m2.sql", "m3.sql" and "m3" depends on "m1"
Currently topological_sort() yields "m1", "m3", "m2". But the initial order "m1", "m2", "m3" already satisfies the dependency condition.

Of the various possible topological sorts we usually want the one, which is most similar to the original sequence of migrations. To define "most similar" one can use the Damerau-Levenshtein-distance or the Levenshtein-distance. See Oleksiy Gapotchenkos blog http://blog.gapotchenko.com/stable-topological-sort for an algorithm. If you are interested, I just coded a patch.

Kind regards
  Anselm

--
Anselm Kruis
Senior Solution Architect
MSE OPS HPCS S PS
T +49-89-35 63 86-874
M +49-16 05 82 82 00
anselm.kruis@atos.net
science + computing ag
Ingolstädter Str. 22
80807 München, Germany
atos.net/de/deutschland/sc

science + computing AG; Vorstand: Dr. Martin Matzke (Vorsitzender), Sabine Hohenstein, matthias Schempp; Vorsitzender des Aufsichtsrats: Philippe Robert Jonas Miltin; Sitz der Gesellschaft: Tübingen; Registergericht: Amtsgericht Stuttgart, HRB 382196.

Re: Sorting migrations from multiple sources

Details
Message ID
<20210523144057.xzbtg6wgfn6ipli3@hooper>
In-Reply-To
<09c43045fdf14024a0f2e905408ea41f@atos.net> (view parent)
DKIM signature
missing
Download raw message
Hi Anslem,

>Your patch works for me. It is indeed much simpler than mine.
>
Since writing that patch I rediscovered this: 
https://todo.sr.ht/~olly/yoyo/4

Because my patch breaks that earlier fix, I'd now prefer not to apply 
it.

>Another observation: the function topological_sort() might be to 
>aggressive. Given I have three migrations in this order by ID:
>"m1.sql", "m2.sql", "m3.sql" and "m3" depends on "m1"
>Currently topological_sort() yields "m1", "m3", "m2". But the initial 
>order "m1", "m2", "m3" already satisfies the dependency condition.
>
>Of the various possible topological sorts we usually want the one, 
>which is most similar to the original sequence of migrations. To define 
>"most similar" one can use the Damerau-Levenshtein-distance or the 
>Levenshtein-distance. See Oleksiy Gapotchenkos blog 
>http://blog.gapotchenko.com/stable-topological-sort for an algorithm. 
>If you are interested, I just coded a patch.
>
Thanks for the patch! I have applied it and taken some time to play 
around with it (see the 'gapotchenko' bookmark).

I am hesitant to merge it though. I am worried about the algorithm's 
performance, which looks to be roughly quadratic. For small numbers of 
migrations (<100) the difference is not noticeable, but beyond a couple 
of hundred it starts to slow down a lot. I have never used yoyo with 
such numbers of migrations, but it doesn't seem unreasonable that it 
should be able to cope with this.

I have tried designing a new algorithm which you can see under the 
'new-toposort-algo' bookmark. The general idea is to number the nodes 
according to their original input order, then at each iteration output 
the lowest numbered node that has all its dependencies satisfied. This 
seems to give better orderings than the current algorithm and has good 
performance. If you have the time to test it I'd be interested to know 
how well it works for your requirements.

Olly.
Reply to thread Export thread (mbox)