TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.

from bertopic import BERTopic

docs = ["Hi i'm a doc", "i'm also a doc", "I'm a document", "this is an apple", "yet another topic"]

topic_model = BERTopic()
topics, _ = topic_model.fit_transform(docs)

Running BERTopic model…
C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\umap\umap_.py:2213: UserWarning: n_neighbors is larger than the dataset size; truncating to X.shape[0] - 1
  warn(
C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\scipy\sparse\linalg\eigen\arpack\arpack.py:1593: RuntimeWarning: k >= N for N * N square matrix. Attempting to use scipy.linalg.eigh instead.
  warnings.warn("k >= N for N * N square matrix. "

TypeError                                 Traceback (most recent call last)
<ipython-input-28-7dcfeabe3647> in <module>
      1 print('Running BERTopic model…')
      2 # topics, _ = tqdm(topic_model.fit_transform(docs))
----> 3 topics, _ = tqdm(topic_model.fit_transform(short_docs))

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\bertopic\_bertopic.py in fit_transform(self, documents, embeddings)
    279
    280         # Reduce dimensionality with UMAP
--> 281         umap_embeddings = self._reduce_dimensionality(embeddings)
    282
    283         # Cluster UMAP embeddings with HDBSCAN

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\bertopic\_bertopic.py in _reduce_dimensionality(self, embeddings)
   1101                 low_memory=self.low_memory).fit(embeddings)
   1102         else:
-> 1103             self.umap_model.fit(embeddings)
   1104         umap_embeddings = self.umap_model.transform(embeddings)
   1105         logger.info("Reduced dimensionality with UMAP")

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\umap\umap_.py in fit(self, X, y)
   2551
   2552         if self.transform_mode == "embedding":
-> 2553             self.embedding_, aux_data = self._fit_embed_data(
   2554                 self._raw_data[index], n_epochs, init, random_state,  # JH why raw data?
   2555             )

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\umap\umap_.py in _fit_embed_data(self, X, n_epochs, init, random_state)
   2578         replaced by subclasses.
   2579         """
-> 2580         return simplicial_set_embedding(
   2581             X,
   2582             self.graph,

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\umap\umap_.py in simplicial_set_embedding(data, graph, n_components, initial_alpha, a, b, gamma, negative_sample_rate, n_epochs, init, random_state, metric, metric_kwds, densmap, densmap_kwds, output_dens, output_metric, output_metric_kwds, euclidean_output, parallel, verbose)
   1052     elif isinstance(init, str) and init == "spectral":
   1053         # We add a little noise to avoid local minima for optimization to come
-> 1054         initialisation = spectral_layout(
   1055             data,
   1056             graph,

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\umap\spectral.py in spectral_layout(data, graph, dim, random_state, metric, metric_kwds)
    325     try:
    326         if L.shape[0] < 2000000:
--> 327             eigenvalues, eigenvectors = scipy.sparse.linalg.eigsh(
    328                 L,
    329                 k,

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\scipy\sparse\linalg\eigen\arpack\arpack.py in eigsh(A, k, M, sigma, which, v0, ncv, maxiter, tol, return_eigenvectors, Minv, OPinv, mode)
   1596
   1597         if issparse(A):
-> 1598             raise TypeError("Cannot use scipy.linalg.eigh for sparse A with "
   1599                             "k >= N. Use scipy.linalg.eigh(A.toarray()) or"
   1600                             " reduce k.")

TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.
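
The last frame of the traceback can be reproduced without BERTopic or UMAP. The sketch below is only an illustration: the helper name `eigsh_outcome` and the diagonal test matrix are made up for this example, and the link to the issue rests on the assumption (not stated in the thread) that UMAP's spectral initialisation asks `eigsh` for more eigenvectors than there are documents when the dataset is this small.

```python
import warnings

import numpy as np
import scipy.sparse
import scipy.sparse.linalg


def eigsh_outcome(n, k):
    """Call eigsh on a small sparse symmetric n x n matrix and report what happens.

    Hypothetical helper for illustration; the diagonal matrix merely stands in
    for the sparse graph Laplacian that UMAP passes to eigsh.
    """
    A = scipy.sparse.diags(np.arange(1.0, n + 1.0), format="csr")
    with warnings.catch_warnings():
        # Silence the "k >= N" RuntimeWarning that precedes the TypeError.
        warnings.simplefilter("ignore", RuntimeWarning)
        try:
            values, _ = scipy.sparse.linalg.eigsh(A, k=k)
        except TypeError as exc:
            return f"TypeError: {exc}"
    return f"ok: {len(values)} eigenvalues"


# Five "documents" but six requested eigenvectors: the same k >= N situation.
print(eigsh_outcome(n=5, k=6))
# Requesting fewer eigenvectors than rows succeeds.
print(eigsh_outcome(n=5, k=2))
```

If that assumption holds, either adding documents (larger N) or requesting fewer UMAP components (smaller k) should avoid the error.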

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

MaartenGr commented, Apr 18, 2022

Since duplicating the documents helped, you probably have too few documents for UMAP to create lower-dimensional representations. You could use the duplication trick to train your model and then divide all topic frequencies by 2 to get the correct values.
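
The duplication trick could look roughly like the sketch below. The helper names `duplicate_docs` and `corrected_topic_sizes` are hypothetical, not part of BERTopic, and the hard-coded topic list only stands in for the `topics` returned by `fit_transform`. The halving step also assumes both copies of a document end up in the same topic, which is plausible but not guaranteed.

```python
from collections import Counter


def duplicate_docs(docs, times=2):
    """Repeat the whole document list so UMAP sees more samples."""
    return list(docs) * times


def corrected_topic_sizes(topics, times=2):
    """Count documents per topic, then undo the duplication factor."""
    counts = Counter(topics)
    return {topic: count / times for topic, count in counts.items()}


docs = ["Hi i'm a doc", "i'm also a doc", "I'm a document",
        "this is an apple", "yet another topic"]
doubled = duplicate_docs(docs)

# With BERTopic installed, training on the doubled list would be:
#   topics, _ = BERTopic().fit_transform(doubled)
# A made-up assignment is used here so the sketch runs on its own.
topics = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
print(corrected_topic_sizes(topics))
```

Doubling only changes the counts by a constant factor, which is why dividing by the same factor recovers the sizes for the original documents.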

mararn1618 commented, Apr 16, 2022

Hey Maarten, to clarify a misunderstanding: Duplicating the documents did actually help.

I’m using:

  • bertopic 0.9.4
  • sentence-transformers 2.2.0
  • umap-learn 0.5.2
  • hdbscan 0.8.28
  • scikit-learn 1.0.2
  • numpy 1.21.6
