scikit-learn - Does sklearn.linear_model.LogisticRegression always converge to the best solution?
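When using this code, I've noticed that it converges unbelievably quickly (a small fraction of one second), even when the model and/or the data is very large. I suspect that in some cases I am not getting anything close to the best solution, but this is hard to prove. It would be nice to have the option of some type of global optimizer, such as the basin hopping algorithm, even if it consumed 100 to 1,000 times as much CPU. Does anyone have thoughts on this subject?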
This is a complex question and this answer might be incomplete, but it should give you some hints (as your question also indicates some knowledge gaps):
(1) First, I disagree with the desire for
some type of global optimizer such as the basin hopping algorithm, even if it consumed 100 to 1,000 times as much CPU
- This does not help in most cases (in the ML world), as the differences are subtle and the optimization error is often negligible compared to the other errors (model power; empirical risk)
- Read "Stochastic Gradient Descent Tricks" (Bottou) for an overview (and the error components!)
- He gives an important reason to use fast approximate algorithms (which may not apply in your case if 1000x training time is not a problem): approximate optimization can achieve better expected risk because more training examples can be processed during the allowed time
(2) Basin-hopping is one of these highly heuristic tools of global optimization (looking for global minima instead of local minima) without any guarantees at all (touching NP-hardness and co.). It's the last algorithm you want to use here (see point (3))!
(3) The problem of logistic regression is a convex optimization problem!
- Every local minimum is a global minimum, which follows from convexity (I'm ignoring details like strict convexity / unique solutions and co.)!
- Therefore you should always use something tuned for convex optimization, and never basin-hopping!
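To make point (3) a bit more concrete, here is a minimal sketch, not what sklearn does internally. It assumes synthetic data, an L2 penalty with a made-up strength `lam`, and plain gradient descent with a fixed step size (my choice for simplicity, not one of sklearn's solvers). It starts from several far-apart random points and checks that every run ends at the same objective value, which is what convexity promises.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function: 1 / (1 + exp(-z))
    return np.exp(-np.logaddexp(0.0, -z))

# Synthetic data (assumption: any real dataset would do as well)
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = (rng.random(n) < sigmoid(X @ rng.normal(size=d))).astype(float)  # labels in {0, 1}
lam = 1.0  # hypothetical L2 strength; makes the objective strictly convex

def objective(w):
    z = X @ w
    # sum of log(1 + exp(z)) - y * z, plus the L2 penalty
    return np.sum(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w)

def gradient(w):
    return X.T @ (sigmoid(X @ w) - y) + lam * w

# Step size 1/L, where L bounds the largest Hessian eigenvalue
# (the Hessian is X^T diag(p(1-p)) X + lam*I and p(1-p) <= 1/4)
L = 0.25 * np.linalg.norm(X, 2) ** 2 + lam

final_values = []
for seed in range(5):
    # Deliberately bad, widely scattered starting points
    w = np.random.default_rng(seed + 1).normal(scale=10.0, size=d)
    for _ in range(5000):
        w -= gradient(w) / L
    final_values.append(objective(w))

spread = max(final_values) - min(final_values)
print(final_values)
print("spread between runs:", spread)
```

A global optimizer like basin-hopping would spend its extra CPU time re-discovering this same minimum over and over.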
(4) There are different solvers, and each supports different variants of the problem (different regularization and co.). We don't know what exactly you are optimizing, but of course these solvers behave differently with regard to convergence:
- Take the following comments with a grain of salt:
- liblinear: according to sklearn's docs it uses a CD-based algorithm (coordinate descent), which means convergence is highly dependent on the data
  - Whether accurate convergence is achieved depends solely on the exact implementation (liblinear is high-quality)
  - As it's a first-order method, I would call the general accuracy medium
- sag/saga: these seem to have better convergence theory (I did not check it much), but again: convergence depends on your data, as mentioned in sklearn's docs, and whether the solutions are accurate depends highly on implementation details
  - As these are first-order methods: general accuracy medium
- newton-cg: an inexact Newton method
  - In general more robust in terms of convergence, as line searches replace heuristics or constant learning rates (line searches are costly in first-order optimization)
  - Second-order method with an inexact core: expected accuracy: medium-high
- lbfgs: a quasi-Newton method
  - Again, in general more robust in terms of convergence, like newton-cg
  - Second-order approximation method: expected accuracy: medium-high
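One way to check these solver comments on your own data is to fit the same problem with several solvers and compare the objective value each one reaches. The following is only a rough sketch: the synthetic dataset, the tight `tol`, the large `max_iter` and the hand-written `objective` helper (the L2-penalized form 0.5 * ||w||^2 + C * sum of log-losses, unpenalized intercept) are all my own assumptions. liblinear is left out because it also penalizes the intercept, so its objective is not directly comparable.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic, standardized data (sag/saga are sensitive to feature scaling)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)
C = 1.0

def objective(clf):
    # L2-penalized logistic regression objective, intercept not penalized
    w = clf.coef_.ravel()
    z = X @ w + clf.intercept_[0]
    s = 2 * y - 1  # map labels {0, 1} to {-1, +1}
    return 0.5 * (w @ w) + C * np.sum(np.logaddexp(0.0, -s * z))

results = {}
for solver in ["lbfgs", "newton-cg", "sag", "saga"]:
    clf = LogisticRegression(solver=solver, C=C, tol=1e-8,
                             max_iter=10000, random_state=0)
    clf.fit(X, y)
    results[solver] = objective(clf)
    print(solver, "iterations:", clf.n_iter_[0], "objective:", results[solver])

values = np.array(list(results.values()))
relative_spread = (values.max() - values.min()) / values.min()
print("relative spread:", relative_spread)
```

If the spread is tiny, all solvers found the same minimum and the remaining differences are about speed, not solution quality.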
Of course, second-order methods get hurt more by large-scale data (even complexity-wise) and, as mentioned, not all solvers support every logreg optimization problem supported in sklearn.
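To see where that cost comes from, here is a hedged sketch of a full Newton method with an exact Hessian and a simple backtracking line search. This is not what sklearn's newton-cg or lbfgs do (those avoid forming or factorizing the full Hessian); the function name `newton_logreg`, the penalty strength `lam` and the synthetic data are hypothetical. Forming the Hessian costs roughly O(n d^2) and solving the linear system O(d^3) per iteration, which is exactly what hurts with many samples or features.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function
    return np.exp(-np.logaddexp(0.0, -z))

def newton_logreg(X, y, lam=1.0, tol=1e-8, max_iter=50):
    """Exact Newton for L2-regularized logistic regression, y in {0, 1}."""
    d = X.shape[1]
    w = np.zeros(d)

    def objective(w):
        z = X @ w
        return np.sum(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w)

    def gradient(w):
        return X.T @ (sigmoid(X @ w) - y) + lam * w

    for _outer in range(max_iter):
        g = gradient(w)
        if np.linalg.norm(g) < tol:
            break
        p = sigmoid(X @ w)
        # Exact Hessian X^T diag(p(1-p)) X + lam*I: O(n d^2) to form
        H = (X * (p * (1.0 - p))[:, None]).T @ X + lam * np.eye(d)
        step = np.linalg.solve(H, g)  # O(d^3) to solve
        # Backtracking line search (Armijo condition), capped at 30 halvings
        t, f0 = 1.0, objective(w)
        for _inner in range(30):
            if objective(w - t * step) <= f0 - 1e-4 * t * (g @ step):
                break
            t *= 0.5
        w = w - t * step
    return w, np.linalg.norm(gradient(w))

# Small synthetic problem (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (rng.random(500) < sigmoid(X @ rng.normal(size=10))).astype(float)

w_hat, grad_norm = newton_logreg(X, y)
print("final gradient norm:", grad_norm)
```

On a small problem like this a handful of iterations is typically enough, but every iteration touches a d x d matrix, so this approach stops being practical as d grows.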
I hope you get the idea of how complex this question is (because of the highly complex solver internals).
The most important things:
- LogReg is convex -> use solvers tuned for unconstrained convex optimization
- If you want medium-high accuracy: use the second-order based methods available and do many iterations (it's a parameter)
- If you want high accuracy: use second-order based methods that are even more conservative/careful (no: Hessian approximation; inverse-Hessian approximation; truncating...)
- As expected: these methods get slow with large data.
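As for the "many iterations" parameter: in sklearn it is `max_iter`, and the stopping tolerance is `tol`. The sketch below (synthetic data, arbitrary values, and a hypothetical helper `fit_and_check`) shows one way to find out whether a fit stopped because it hit the iteration limit, by catching sklearn's `ConvergenceWarning`.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumption)
X, y = make_classification(n_samples=2000, n_features=50, n_informative=30,
                           random_state=0)

def fit_and_check(max_iter):
    """Fit with lbfgs and report whether a ConvergenceWarning was raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        clf = LogisticRegression(solver="lbfgs", max_iter=max_iter).fit(X, y)
    hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return clf, hit_limit

# Far too few iterations: the solver is cut off before meeting its tolerance
clf_short, hit_limit_short = fit_and_check(max_iter=3)
# Generous budget: the solver stops on its own once the tolerance is met
clf_long, hit_limit_long = fit_and_check(max_iter=10000)

print("max_iter=3     warned:", hit_limit_short, "n_iter_:", clf_short.n_iter_[0])
print("max_iter=10000 warned:", hit_limit_long, "n_iter_:", clf_long.n_iter_[0])
```

If no warning shows up and `n_iter_` is well below `max_iter`, a very fast fit is simply a converged fit, not a sign that the solver gave up early.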