Decisions, Decisions

SymPy
One of the major usability issues I'm finding with the current interface of the geometry module is its static methods. In one way they make sense, since related functionality is grouped together appropriately, but they require extra typing, and concepts such as concurrency and collinearity will (most likely) be obvious to those using the module without a class prefix. This leaves me with a few options, and I'd love to hear anyone's opinion:
  1. Leave everything as is so one would call, for example, Point.are_collinear() or LinearEntity.are_concurrent()
  2. Refactor all of these static methods out of the classes into a more "global" location such as util.py, so that class prefixes would not be required. This separates control from structure (a very Model-View-Controller-oriented design). Currently the control lives in the classes and accesses each one's private data. This option would not access private data, since the functions would be external to the classes. It is doable because all the pertinent data is readily available through methods or properties.
  3. Since all the functionality already exists in the classes, simply leave it there and wrap it in module-level functions, so that little code has to be moved and very little has to be written.
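The practical difference between options #2 and #3 is where the logic lives and whose data it reads. A minimal sketch, assuming a simplified Point with public x/y properties (a stand-in, not the module's real class); _collinear_coords and are_collinear_v3 are hypothetical names used only for illustration:

```python
from fractions import Fraction


class Point:
    """Simplified, hypothetical stand-in for the geometry module's Point."""

    def __init__(self, x, y):
        self._x, self._y = Fraction(x), Fraction(y)

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    @staticmethod
    def are_collinear(*points):
        # Option #1 (current style): control lives in the class and reads _x/_y.
        return _collinear_coords([(p._x, p._y) for p in points])


def _collinear_coords(coords):
    """True if all (x, y) pairs lie on a single line."""
    origin = coords[0] if coords else None
    # Points equal to the origin give no direction, so skip them.
    others = [c for c in coords if c != origin]
    if len(others) < 2:
        return True
    (ax, ay), (bx, by) = origin, others[0]
    # Every remaining point must have a zero cross product with origin->b.
    return all((bx - ax) * (cy - ay) == (by - ay) * (cx - ax) for cx, cy in others[1:])


# Option #2: a util.py-style function that relies only on the public x/y
# properties, so it never touches Point's private attributes.
def are_collinear(*points):
    return _collinear_coords([(p.x, p.y) for p in points])


# Option #3: leave the logic in the class and expose a thin global wrapper.
def are_collinear_v3(*points):
    return Point.are_collinear(*points)
```

Either way, callers write are_collinear(p1, p2, p3) with no class prefix; with option #2 the class no longer has to expose its internals to the control code.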

I personally like #2. The classes will become simpler, and I can hide more things. On that second point, I'd like to hide the LinearEntity class as much as possible. It's more or less there to avoid rewriting a lot of code, and I don't really want it exposed. Right now this is the only part of the interface I'm really looking at, because I think the rest of it is reasonable. So, any thoughts or suggestions on these options, or maybe a better option that I haven't thought of?

Research
As for my research, things are coming along there too. I really wish I had put in a bit more effort there, since it is what I plan on doing in the future. Progress has been quite limited lately because I'm entering the more difficult part of my problem's analysis. The problem is simply Largest Common Subgraph, and currently I'm only analyzing it under classical complexity theory (i.e., NP-completeness). I'm going to take a close look at a few of the remaining "unknown" areas and then hopefully start on a parameterized analysis of the problem. Parameterized analysis is a big thing for my supervisor, especially since his PhD supervisor was half of the team that developed the theory. I really like it; it's a very practical complexity theory.

What I'm hoping to do is a more in-depth literature search, start BibTeX-ing my references, and really focus on writing a paper. With both a classical and a parameterized approach, along with polynomial-time algorithms for some of the sub-problems, I'm hoping I can get a paper out of this. Maybe I'll also illustrate the practicality of the problem with examples from chemistry (such as molecules and the similarity between them). If all goes well, maybe it can even become my honours thesis, or at least a strong foundation for it.
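To make the problem concrete, here is a brute-force sketch of one common variant, the maximum common induced subgraph of two undirected graphs (the post doesn't say which variant is being studied, so treat that choice as an assumption). It simply tries every vertex subset of the first graph against every injective mapping into the second, which is why it only works for tiny inputs:

```python
from itertools import combinations, permutations


def max_common_induced_subgraph(g1, g2):
    """Brute force over all vertex subsets of g1 and injective maps into g2.

    Graphs are undirected, given as {vertex: set_of_neighbours}.
    Returns (size, mapping) for one largest common induced subgraph.
    """
    v1, v2 = list(g1), list(g2)
    # Try the largest possible size first, so the first match is maximum.
    for k in range(min(len(v1), len(v2)), 0, -1):
        for subset in combinations(v1, k):
            for image in permutations(v2, k):
                mapping = dict(zip(subset, image))
                # Induced: edges AND non-edges must be preserved for every pair.
                if all(
                    (b in g1[a]) == (mapping[b] in g2[mapping[a]])
                    for a, b in combinations(subset, 2)
                ):
                    return k, mapping
    return 0, {}
```

The nested subset and permutation loops make the running time grow exponentially with the graph sizes.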

9 comments:

Jason G said...

It appears that option #2 will be a lot more difficult than I first thought. The dependencies between functions are a little too tight for me to break them up, unless Python could deal with cyclic dependencies between modules...

I'm going to look into it a little more deeply, and if it turns out to be too much work or to require too much extra code, I think I will go with option #3.
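For what it's worth, Python does tolerate import cycles as long as neither module needs the other's names while it is still being loaded; a common trick is to defer one import into the function body. The sketch below writes two hypothetical modules, geo_point and geo_util, to a temporary directory to show the pattern; the module names and layout are assumptions, not the geometry module's actual structure.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# geo_point needs geo_util at call time; geo_util needs geo_point at import time.
GEO_POINT = """
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def is_collinear(self, *others):
        # Deferred import: resolved when the method runs, after both
        # modules have finished loading, so the cycle is harmless.
        from geo_util import are_collinear
        return are_collinear(self, *others)
"""

GEO_UTIL = """
from itertools import combinations

from geo_point import Point  # top-level import in this direction only


def are_collinear(*points):
    if not all(isinstance(p, Point) for p in points):
        raise TypeError("expected Point instances")
    # Collinear iff every triple has a zero cross product.
    return all(
        (b.x - a.x) * (c.y - a.y) == (b.y - a.y) * (c.x - a.x)
        for a, b, c in combinations(points, 3)
    )
"""


def load_modules():
    """Write both modules to a temp dir, put it on sys.path, and import them."""
    tmp = Path(tempfile.mkdtemp())
    (tmp / "geo_point.py").write_text(GEO_POINT)
    (tmp / "geo_util.py").write_text(GEO_UTIL)
    sys.path.insert(0, str(tmp))
    importlib.invalidate_caches()
    return importlib.import_module("geo_point"), importlib.import_module("geo_util")
```

After load_modules(), Point(0, 0).is_collinear(Point(1, 1), Point(2, 2)) works even though each module refers to the other, because geo_point only asks for geo_util once a method is actually called.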

Ondrej Certik said...

What exactly is the problem?

That you want to call

are_similar(t1,t2)

instead of

t1.are_similar(t1,t2)

(see the example below)?



Why are you making the methods static? Wouldn't it help just by making them normal methods? Then they could be used like this:

t1.are_similar(t2)

or

Triangle.are_similar(t1,t2)


Ondrej
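Ondrej's point works because a normal method binds self automatically when called on an instance, while the class attribute is still an ordinary function. A minimal sketch with a hypothetical Triangle that only accepts rational coordinates (so squared side lengths stay exact) and tests similarity by side-side-side proportionality:

```python
from fractions import Fraction


class Triangle:
    """Hypothetical stand-in: vertices are (x, y) pairs with rational coordinates."""

    def __init__(self, a, b, c):
        self.vertices = (a, b, c)

    def _squared_sides(self):
        a, b, c = self.vertices

        def sq(p, q):
            # Squared distance keeps everything rational (no square roots).
            return (Fraction(p[0]) - q[0]) ** 2 + (Fraction(p[1]) - q[1]) ** 2

        return sorted([sq(a, b), sq(b, c), sq(c, a)])

    def are_similar(self, other):
        # SSS similarity: sorted side lengths are proportional exactly when
        # the sorted squared side lengths are proportional.
        s, o = self._squared_sides(), other._squared_sides()
        return all(x * o[0] == y * s[0] for x, y in zip(s, o))
```

Both t1.are_similar(t3) and Triangle.are_similar(t1, t3) call the same function; only the way the first argument is supplied differs.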

Ondrej Certik said...

It doesn't want me to paste an example here:

Your HTML cannot be accepted: Tag is not allowed: ipython console

Ondrej Certik said...

In [1]: import sympy.modules.geometry as g

In [2]: p1 = g.Point(0, 0)

In [3]: p2 = g.Point(5, 0)

In [4]: p3 = g.Point(0, 5)

In [5]: t1 = g.Triangle(p1, p2, p3)

In [6]: t2 = g.Triangle(p1, p2, g.Point(Rational(5,2), sqrt(Rational(75,4))))
...:

In [7]: t2
Out[7]: Triangle(Point(0, 0), Point(5, 0), Point(5/2, 75/4**(1/2)))

In [8]: t1
Out[8]: Triangle(Point(0, 0), Point(5, 0), Point(0, 5))

In [9]: t1.ar
t1.are_similar t1.area

In [9]: t1.are_similar(t2)
---------------------------------------------------------------------------
exceptions.TypeError Traceback (most recent call last)

/home/ondra/sympy/ipython console

TypeError: are_similar() takes exactly 2 arguments (1 given)

In [10]: t1.are_similar(t1,t2)
Out[10]: False
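The TypeError in the session comes from how static methods are looked up: a @staticmethod receives no implicit first argument, so t1.are_similar(t2) passes only one triangle. A minimal reproduction with a hypothetical Shape class (the comparison inside is a placeholder, not real similarity):

```python
class Shape:
    @staticmethod
    def are_similar(a, b):
        # No implicit self: both shapes must be passed explicitly.
        # Placeholder test; real code would compare the geometry.
        return type(a) is type(b)


s1, s2 = Shape(), Shape()
Shape.are_similar(s1, s2)  # works: both arguments given
s1.are_similar(s1, s2)     # also works, but s1 has to be repeated
try:
    s1.are_similar(s2)     # only one argument reaches are_similar
except TypeError as exc:
    print("TypeError:", exc)
```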

Ondrej Certik said...

ok, I succeeded.

Jason G said...

I made them static because I felt it actually reads a bit better. If I didn't force them to be used statically, they should arguably be named is_similar() instead. The same argument applies to the other are_*() methods I have.

Do you think it would be better to not have them static?

Jim said...

I think you want them on the classes in case you hit something that cares about the class. (Example: I could imagine an is_similar method for rectangles or even polygons that might be different from your implementation for triangles.)

That said, I agree that is_similar(A, B) reads better than A.is_similar(B).

I think the way to handle that is with (what Python may eventually call) "generic functions". A good example is the builtin len.

len(x)

will end up calling x.__len__(), though there may be fallback logic later.

is_similar(A, B)

could, by default, do something like:

def is_similar(A, B):
    res = NotImplemented
    try:
        res = A.is_similar(B)
    except (AttributeError, TypeError):
        pass
    if res is NotImplemented:
        try:
            res = B.is_similar(A)
        except (AttributeError, TypeError):
            pass
    return res


-jJ
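Python did later add a standard tool along these lines: functools.singledispatch (Python 3.4+, with annotation-based register from 3.7) turns a function into a generic function that picks an implementation from the type of its first argument. A sketch with hypothetical Rectangle, Square and Circle classes:

```python
from functools import singledispatch


class Rectangle:
    def __init__(self, w, h):
        self.w, self.h = w, h


class Square(Rectangle):
    def __init__(self, side):
        super().__init__(side, side)


class Circle:
    def __init__(self, r):
        self.r = r


@singledispatch
def is_similar(a, b):
    # Fallback for types with no registered implementation.
    return NotImplemented


@is_similar.register
def _(a: Rectangle, b):
    if not isinstance(b, Rectangle):
        return False
    # Similar when the aspect ratios match, in either orientation.
    return a.w * b.h == a.h * b.w or a.w * b.w == a.h * b.h


@is_similar.register
def _(a: Circle, b):
    # Any two circles are similar.
    return isinstance(b, Circle)
```

Square inherits the Rectangle implementation through normal subclass lookup. Since singledispatch only looks at the first argument, the NotImplemented fallback could still hand off to is_similar(b, a), as in the try-both-orders version above.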

Jim said...

Note that I'm not saying to change the name between "are_similar" and "is_similar".

is_similar seems more intuitive and are_similar seems more correct (since the arguments are plural), but neither seems quite right.

For the class-based implementation methods, you may want left- and right-hand versions (like __add__ and __radd__), though I hope you don't need them for similarity. You may also want to give them underscore names.

__similar__ would be right, except that double-underscore names are supposedly reserved for the Python implementation itself, or for things aspiring to be part of the implementation.

_similar_ might be the least of evils, but I don't feel strongly.

similar would be correct for the generic function, but might be too broad. Is there a SymPy standard for predicates, similar to the Common Lisp -p suffix? (That is, in Lisp, I would name it similar-p.)

-jJ
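Jim's left/right suggestion can be sketched by analogy with how Python evaluates a + b (try a.__add__, then b.__radd__). Here _similar_ is the name Jim floats and _rsimilar_ is a hypothetical right-hand counterpart; neither is an existing SymPy hook, and the classes are illustrative only:

```python
def similar(a, b):
    """Generic predicate modelled on the binary-operator protocol."""
    for obj, hook, arg in ((a, "_similar_", b), (b, "_rsimilar_", a)):
        # Look the hook up on the type, as Python does for operator methods.
        method = getattr(type(obj), hook, None)
        if method is not None:
            result = method(obj, arg)
            if result is not NotImplemented:
                return result
    raise TypeError(
        f"similarity not defined for {type(a).__name__} and {type(b).__name__}"
    )


class Square:
    def __init__(self, side):
        self.side = side

    def _similar_(self, other):
        if isinstance(other, Square):
            return True  # all squares are similar
        return NotImplemented  # let the other operand decide


class RegularPolygon:
    """A newer class that knows about Square, though Square doesn't know it."""

    def __init__(self, n, side):
        self.n, self.side = n, side

    def _similar_(self, other):
        if isinstance(other, RegularPolygon):
            return self.n == other.n
        if isinstance(other, Square):
            return self.n == 4
        return NotImplemented

    def _rsimilar_(self, other):
        # Similarity is symmetric, so the right-hand hook reuses the left one.
        return self._similar_(other)
```

Square never has to know about RegularPolygon: when its _similar_ returns NotImplemented, the right-hand hook of the newer class gets a chance to answer.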

Jason G said...

Yeah, that's probably what I'll do. I completely neglected the fact that similarity could/will be used elsewhere. What you describe is roughly how I'm already handling intersection.

My next step is to clean things up and fill in a few of the missing pieces.